Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-08-01 Thread Mike Snitzer
On Wed, Jul 31 2013 at  5:53pm -0400,
Chris Murphy li...@colorremedies.com wrote:

 
 On Jul 31, 2013, at 12:38 PM, Eric Sandeen sand...@redhat.com wrote:
 
  
  i.e. if you only want the efficient snapshots, a way to fully-provision
  a thinp device.  I'm still not sure if this is possible…?
 
 […]
 
  
  I guess I'm pretty nervous about offering actual thin provisioned
  storage to average Fedora users.  I'm having nightmares about the bug
  reports already, just based on the likelihood of most users misunderstanding
  the feature and its requirements and expected behavior…
 
 So possibly the installer should be conservative about how thin the
 provisioning is;

We (David Lehman, myself and others on our respective teams) have
already decided some months ago that any thin LVs that anaconda
establishes will _not_ oversubscribe the thin-pool.

And in fact a reserve of free space will be kept in the thin-pool as
well as the parent VG.

 otherwise I'm imagining inadequately provisioned thinp LV, while also
 using the rollback feature [1].

Can you elaborate?  Rollback with LVM thin provisioning doesn't require
any additional space in the pool.  It is a simple matter of swapping the
internal device_ids that the thin-pool uses as an index to access the
corresponding thin volumes.  This is done when activating the thin
volumes.

LVM2's support for thinp snapshot merge (aka rollback) is still pending, but
RFC patches have been published via this BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=957881
 
 [1] https://fedoraproject.org/wiki/Changes/Rollback

The Rollback project authors have been having periodic concalls with
David Lehman, myself and others.  So we are relatively coordinated ;)

Mike
-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-08-01 Thread Reindl Harald


On 31.07.2013 21:24, Matthew Miller wrote:
 On Wed, Jul 31, 2013 at 08:18:52PM +0200, Reindl Harald wrote:
 you are aware how much 10% of 8 TB are?
 
 So why *not* keep more logs, at least while nothing else is using it?

to save space?

where i use thin provisioning, full backups of machines are
mandatory, and you do not want to have hundreds of gigabytes
of logs

 you need at least a lot more fuzzy logic
 * not more than XXX MB
 * or vary the percentage depending on the drive size
 * if /var/log is a dedicated partition *nothing* reserved
 
 That last, at least, seems reasonable

at least the second one too - nobody looks at TBs of logs, and the
few people who do are not the norm and can configure it




Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Florian Weimer

On 07/29/2013 08:38 PM, Ric Wheeler wrote:


If application A does a stat or statvfs() call, sees 1GB of space left
and then does a write, we could easily lose that race to any other
application.

If you want to reserve space, you need to grab the space yourself
(always works with a large write() but preallocation can also help
without dm-thin).


In order to have it work always, you'll have to write unpredictable 
data.  If you write just zeros, the reservation isn't guaranteed if the 
file system supports compression.


I'm pretty sure we want a crass layering violation for this one 
(probably a new mode flag for fallocate), to ensure proper storage 
reservation for things like VM images.
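To make the difference between checking free space and actually reserving it concrete, here is a minimal Python sketch. The helper names reserve_space and free_bytes are made up for illustration. It assumes a POSIX system where os.posix_fallocate is available. Success only means the filesystem allocated the blocks; as discussed in this thread, it says nothing about whether a thin pool underneath the filesystem has backing space.

```python
import errno
import os


def free_bytes(directory):
    """Advisory only: another writer can consume this space at any time."""
    st = os.statvfs(directory)
    return st.f_bavail * st.f_frsize


def reserve_space(path, size):
    """Try to reserve `size` bytes for `path`; return True on success.

    posix_fallocate() asks the filesystem to allocate the blocks now, so a
    later write into this range should not fail for lack of filesystem space.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
        return True
    except OSError as exc:
        # Out of space, over quota, or larger than the filesystem allows.
        if exc.errno in (errno.ENOSPC, errno.EDQUOT, errno.EFBIG):
            return False
        raise
    finally:
        os.close(fd)
```

The point of the design is that the allocation call itself is the check: free_bytes() can at best be used to avoid obviously hopeless requests.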


--
Florian Weimer / Red Hat Product Security Team

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Zdenek Kabelac

On 31.7.2013 10:39, Florian Weimer wrote:

On 07/29/2013 08:38 PM, Ric Wheeler wrote:


If application A does a stat or statvfs() call, sees 1GB of space left
and then does a write, we could easily lose that race to any other
application.

If you want to reserve space, you need to grab the space yourself
(always works with a large write() but preallocation can also help
without dm-thin).


In order to have it work always, you'll have to write unpredictable data.
If you write just zeros, the reservation isn't guaranteed if the file system
supports compression.

I'm pretty sure we want a crass layering violation for this one (probably a
new mode flag for fallocate), to ensure proper storage reservation for things
like VM images.



If someone wants to use preallocation, and thus always allocate the whole space,
then there is no reason to use thin provisioned devices unless they want to
use the snapshot feature (for this, something like creation of a fully
provisioned device could probably be introduced) - but then you end up
with the same problem once you start to use snapshots.

For me this rather looks like misuse of thin provisioning.

ThinP should be configured in such a way that the admin is able to extend the pool to
honour the promised space if really needed. It's not a good idea to provision 1EB
if you have at most a 1TB disk and then expect to have no
problems when someone fallocate()s 500TB.


I.e. if someone is using an iSCSI disc array with 'hw' thin provisioning
support, there is no SCSI command to provision space - it's the admin's job to
ensure there is enough disc space to keep up with user demands.


Maybe - just an idea - there could be a kernel bit-flag somewhere, which might
tell if the device used by the fs is 'fully provisioned' or 'thin provisioned'
(something like rotational/non-rotational). But there is no way to return
information about free disc space, since it's a highly subjective value and
moreover very expensive to calculate.


Zdenek


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Mike Snitzer
On Mon, Jul 29 2013 at  2:49pm -0400,
Daniel P. Berrange berrangeatredhat.com wrote:

 On Mon, Jul 29, 2013 at 02:38:23PM -0400, Ric Wheeler wrote:
  On 07/29/2013 10:18 AM, Daniel P. Berrange wrote:
  On Mon, Jul 29, 2013 at 08:01:23AM -0600, Chris Murphy wrote:
  On Jul 29, 2013, at 6:30 AM, Daniel P. Berrange berrange at 
  redhat.com wrote:
  
  Yep, we need to be able to report free space on filesystems, so that
  apps provisioning virtual machines can get an idea of how much storage
  they can provide to VMs without risk of overcommitting.
  
  I agree that we really want the kernel, or at least a reusable shared
  library, to provide some kind of interface to determine this, rather
  than requiring every userspace app which cares to re-invent the wheel.
  What does it mean for an app to use stat to get free space, and then
  proceed to create too big a VM image in a directory that has a quota
  set? I still think apps are asking an inappropriate/unqualified question
  by asking for volume free space, instead of what's available to them for
  a specified path.
   From an API POV, libvirt doesn't need/care about the free space on the
  volume underlying the filesystem. We actually only care about the free
  space in a given directory that we're using for disk images. It just
  happens that we implement this using statvfs() currently. So when I
  ask for an API above, don't take this to mean I want a statvfs() that
  knows about sparse volumes. An API or syscall that provides free space
  for individual directories is fine with me.
  
 
  Just another note, it is never safe to assume that storage under any
  file system is yours for the taking.
  
  If application A does a stat or statvfs() call, sees 1GB of space
  left and then does a write, we could easily lose that race to any
  other application.
 
 This race doesn't matter from libvirt's POV. It is just providing a
 mechanism via its API. It is up to the management application using
 libvirt to make use of the mechanism to provide a usage policy.
 Their usage scenario may well enable them to make certain assumptions
 about the storage that you could not otherwise do in a race free
 manner.
 
 In addition, even in more general purpose usage scenarios, it does
 not necessarily matter if there is a race, because there can be a
 second line of defence. For example, KVM can be set to pause the VM
 upon ENOSPC errors, giving management application or administrator
 the chance to expand the capacity of the underlying storage and then unpause
 the guest. In that case checking the free space is mostly just a
 sanity check which serves to avoid hitting the pause-on-ENOSPC scenario
 too frequently.

Running out of free space _should_ be extremely rare.  A properly
configured dm-thin pool will have adequate free space, with an
appropriate low water mark, that would give admins ample time to extend
(even if a human were to do it).  But lvm2 has support to autoextend the
thin-pool with free space in the parent volume group.
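As a rough illustration of the autoextend policy, the following Python sketch models a single check. To my knowledge lvm2 exposes this through the thin_pool_autoextend_threshold and thin_pool_autoextend_percent settings in lvm.conf; the function below is a simplified model written for this example, not lvm2's actual code, and the default values 70 and 20 are only sample numbers.

```python
def autoextend(pool_size, used, vg_free, threshold=70, percent=20):
    """Model one autoextend check; all sizes are in the same unit.

    Returns (new_pool_size, new_vg_free). The pool grows by `percent` of
    its current size once usage exceeds `threshold` percent, limited by
    the free space left in the parent volume group. A threshold of 100
    is treated as "autoextend disabled" in this model.
    """
    if threshold >= 100 or used * 100 <= pool_size * threshold:
        return pool_size, vg_free
    grow = min(pool_size * percent // 100, vg_free)
    return pool_size + grow, vg_free - grow


# A 100 GiB pool at 80% usage with 50 GiB free in the VG grows to 120 GiB.
print(autoextend(100, 80, 50))
```

The last case worth noting is an exhausted VG: the pool can then only grow by whatever is left, which is why a reserve in the parent VG matters.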

But I'm just talking about the not-really-chicken solution of leaning on
a properly configured system (either by admins in a data center or by
fedora developers with sane defaults).

As an aside, this extra free space checking that KVM is doing is really
broken by design (polling sucks -- especially if this polling is
happening in the host for each guest).  Would be much better to leverage
something like lvm2 with a custom dmeventd plugin that fires when it
receives the low watermark and/or -ENOSPC event.

Thinly provisioned volumes offer the prospect of doing away with this
polling -- as such proper dm-thin integration has been on the virt
roadmap for a while.  Just never seems to happen.

Mike

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Mike Snitzer
On Mon, Jul 29 2013 at  2:48pm -0400,
Eric Sandeen sandeenatredhat.com wrote:

 On 7/27/13 11:56 AM, Lennart Poettering wrote:
  On Fri, 26.07.13 22:13, Miloslav Trmač (mitr at volny.cz) wrote:
  
  Hello all,
  with thin provisioning available, the total and free space values
  reported by a filesystem do not necessarily mean that that much space
  is _actually_ available (the actual backing storage may be smaller, or
  shared with other filesystems).
 
  If your package reports disk space usage to users, and bases this on
  filesystem free space, please consider whether it might need to take
  LVM thin provisioning into account.
 
  The same applies if your package automatically allocates a certain
  proportion of the total or available space.
 
  A quick way to check whether your package is likely to be affected, is
  to look for statfs() or statvfs() calls in C, or the equivalent in
  your higher-level library / programming language.
  
  Well, I am pretty sure the burden must be on the file systems to report
  a useful estimated free blocks value in statfs()/statvfs(). Exporting that
  problem to userspace and expecting userspace to work around this is just
  wrong. In fact, this would be quite an API breakage if applications
  cannot rely that the value returned is at least a rough estimate on how
  much data can be stored on disk.
  
  journald will scale how much disk usage it will use of /var/log/journal
  based on the file system size and free level. It will also modulate the
  per-service rate limit levels based on the amount of free disk space. If
  you break the API of statfs()/statvfs(), then you will end up breaking this
  and all programs like it.
 
 Any program needs to be prepared for ENOSPC; as Ric mentioned elsewhere,
 until you successfully write to it, it's not yours! :)  (Ok, thinp
 running out of space won't generate ENOSPC today, either, but you see
 my general point...)
 
 And how much space are we really talking about here?  If you're running
 thin-provisioning on thin margins, especially w/o some way to automatically 
 hot-add storage, you're probably doing it wrong.
 
 (And if journald sees 100T free and decides it can use 50T of that,
 it's doing it wrong, too) ;)
 
 The truth is somewhere in the middle, but quibbling over whether this
 app or that can claim a bit of space behind a thin-provisioned volume
 probably isn't useful.

Right, so picking up on what we've discussed: adding the ability to have
fallocate propagate to the underlying storage via a new REQ_RESERVE bio
(if the storage opts-in, which dm-thinp could).  This bio would be the
reciprocal of discard -- thus enabling the caller to efficiently reserve
space in the underlying storage (e.g. dm-thin-pool).  So volumes or apps
(e.g. journald) that _expect_ to have fully-provisioned space from thinp
could.

This would also allow for a hybrid setup where the thin-pool is
configured to use a smaller block size to benefit taking many snapshots
-- but then allows select apps and/or volumes to reserve contiguous
space from the thin-pool.  It obviously also offers the other
traditional fallocate benefits too (reserving large contiguous space for
performance, etc).

I'll draft an RFC patch or 2 for LKML... may take some time for me to
get to it but I can make it a higher priority if others have serious
interest.

 The admin definitely needs tools to see the state of thinly provisioned
 storage, but that's the admin's job to worry about, not the app's, IMHO.

Yeah, in a data center the admin really should be all over these thinp
concerns, making them a non-issue.  But on the desktop the fedora
developers need to provide sane policy/defaults.

Mike

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Mike Snitzer
On Wed, Jul 31 2013 at  5:52am -0400,
Zdenek Kabelac zkabe...@redhat.com wrote:

 On 31.7.2013 10:39, Florian Weimer wrote:
 On 07/29/2013 08:38 PM, Ric Wheeler wrote:
 
 If application A does a stat or statvfs() call, sees 1GB of space left
 and then does a write, we could easily lose that race to any other
 application.
 
 If you want to reserve space, you need to grab the space yourself
 (always works with a large write() but preallocation can also help
 without dm-thin).
 
 In order to have it work always, you'll have to write unpredictable data.
 If you write just zeros, the reservation isn't guaranteed if the file system
 supports compression.
 
 I'm pretty sure we want a crass layering violation for this one (probably a
 new mode flag for fallocate), to ensure proper storage reservation for things
 like VM images.
 
 
 If someone wants to use preallocation, and thus always allocate the whole space,
 then there is no reason to use thin provisioned devices unless they
 want to use the snapshot feature (for this, something like creation
 of a fully provisioned device could probably be
 introduced) - but then you end up
 with the same problem once you start to use snapshots.
 
 For me this rather looks like misuse of thin provisioning.
 
 ThinP should be configured in such a way that the admin is able to extend
 the pool to honour the promised space if really needed. It's not a good
 idea to provision 1EB if you have at most a 1TB disk and then
 expect to have no problems when someone fallocate()s 500TB.

fallocate doesn't allow you to reserve more physical space than you have
(or than your quota allows).
 
 I.e. if someone is using an iSCSI disc array with 'hw' thin
 provisioning support, there is no SCSI command to provision space -
 it's the admin's job to ensure there is enough disc space to keep up
 with user demands.
 
 Maybe - just an idea - there could be a kernel bit-flag somewhere,
 which might tell if the device used by the fs is 'fully provisioned' or
 'thin provisioned' (something like rotational/non-rotational). But
 there is no way to return information about free disc space, since
 it's a highly subjective value and moreover very expensive to
 calculate.

If things like journald _need_ to have a sysfs flag that denotes that the
volume it is writing to is thinp, then I'd like to understand what it'd
do differently knowing that info.  Would it conditionally call
fallocate() -- assuming dm-thinp grows REQ_RESERVE support like I
mentioned in my previous post?

I see little value in exposing whether some portion of the storage stack
is thin or not.  What is an app to do with that info?  It'd have to do
things like: 1) determine the block device the FS is layered on, 2) look up
the sysfs file for that device... and a filesystem can span multiple
devices, sometimes, sometimes not.  It is just a rat's nest.
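For a sense of what those two steps look like from an application, here is a hedged Python sketch. No "thin" flag exists in sysfs, so the existing queue/rotational attribute stands in for it. The helper names are invented for this example, and the lookup is best-effort: it returns None when the filesystem has no single backing block device visible under /sys/dev/block (for example a multi-device or virtual filesystem).

```python
import os


def sysfs_dir_for_path(path):
    """Step 1: map a path to /sys/dev/block/MAJ:MIN via its st_dev."""
    st = os.stat(path)
    return "/sys/dev/block/%d:%d" % (os.major(st.st_dev), os.minor(st.st_dev))


def read_queue_attr(path, attr="rotational"):
    """Step 2: read a queue attribute of the device holding `path`.

    Returns the attribute value as a string, or None if it cannot be found.
    """
    base = sysfs_dir_for_path(path)
    # A partition has no queue/ directory of its own; its parent disk does.
    # The ".." must not be normalized away, because `base` is a symlink.
    for candidate in (base, os.path.join(base, "..")):
        try:
            with open(os.path.join(candidate, "queue", attr)) as f:
                return f.read().strip()
        except OSError:
            continue
    return None


print(read_queue_attr("/"))
```

Even this small helper already has to special-case partitions and give up on stacked or multi-device filesystems, which is the kind of complexity the paragraph above warns about.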

Thinly provisioned storage isn't exactly a new concept.  But the
Linux provided target obviously engages other parts of the OS to
properly support it (at a minimum the volume manager and the
installer).  If the fallocate() triggered REQ_RESERVE passdown to the
underlying storage provides a reasonable stop gap we can really explore
it -- at least we'd be piggybacking on an established interface that
returns success or failure.

Mike

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Ric Wheeler

On 07/31/2013 10:32 AM, Mike Snitzer wrote:

On Mon, Jul 29 2013 at  2:48pm -0400,
Eric Sandeen sandeenatredhat.com wrote:


On 7/27/13 11:56 AM, Lennart Poettering wrote:

On Fri, 26.07.13 22:13, Miloslav Trmač (mitr at volny.cz) wrote:


Hello all,
with thin provisioning available, the total and free space values
reported by a filesystem do not necessarily mean that that much space
is _actually_ available (the actual backing storage may be smaller, or
shared with other filesystems).

If your package reports disk space usage to users, and bases this on
filesystem free space, please consider whether it might need to take
LVM thin provisioning into account.

The same applies if your package automatically allocates a certain
proportion of the total or available space.

A quick way to check whether your package is likely to be affected, is
to look for statfs() or statvfs() calls in C, or the equivalent in
your higher-level library / programming language.

Well, I am pretty sure the burden must be on the file systems to report
a useful estimated free blocks value in statfs()/statvfs(). Exporting that
problem to userspace and expecting userspace to work around this is just
wrong. In fact, this would be quite an API breakage if applications
cannot rely that the value returned is at least a rough estimate on how
much data can be stored on disk.

journald will scale how much disk usage it will use of /var/log/journal
based on the file system size and free level. It will also modulate the
per-service rate limit levels based on the amount of free disk space. If
you break the API of statfs()/statvfs(), then you will end up breaking this
and all programs like it.

Any program needs to be prepared for ENOSPC; as Ric mentioned elsewhere,
until you successfully write to it, it's not yours! :)  (Ok, thinp
running out of space won't generate ENOSPC today, either, but you see
my general point...)

And how much space are we really talking about here?  If you're running
thin-provisioning on thin margins, especially w/o some way to automatically
hot-add storage, you're probably doing it wrong.

(And if journald sees 100T free and decides it can use 50T of that,
it's doing it wrong, too) ;)

The truth is somewhere in the middle, but quibbling over whether this
app or that can claim a bit of space behind a thin-provisioned volume
probably isn't useful.

Right, so picking up on what we've discussed: adding the ability to have
fallocate propagate to the underlying storage via a new REQ_RESERVE bio
(if the storage opts-in, which dm-thinp could).  This bio would be the
reciprocal of discard -- thus enabling the caller to efficiently reserve
space in the underlying storage (e.g. dm-thin-pool).  So volumes or apps
(e.g. journald) that _expect_ to have fully-provisioned space from thinp
could.


I think that this would be really useful and, as you mention, is the mirror 
image of our discard support


ric



This would also allow for a hybrid setup where the thin-pool is
configured to use a smaller block size to benefit taking many snapshots
-- but then allows select apps and/or volumes to reserve contiguous
space from the thin-pool.  It obviously also offers the other
traditional fallocate benefits too (reserving large contiguous space for
performance, etc).

I'll draft an RFC patch or 2 for LKML... may take some time for me to
get to it but I can make it a higher priority if others have serious
interest.


The admin definitely needs tools to see the state of thinly provisioned
storage, but that's the admin's job to worry about, not the app's, IMHO.

Yeah, in a data center the admin really should be all over these thinp
concerns, making them a non-issue.  But on the desktop the fedora
developers need to provide sane policy/defaults.

Mike



Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Chris Murphy

On Jul 31, 2013, at 8:32 AM, Mike Snitzer snit...@redhat.com wrote:

 But on the desktop the fedora
 developers need to provide sane policy/defaults.

Right. And the concern I have (other than a blatant bug), is the F20 feature 
for the installer to create thinp LVs; and to do that the installer needs to 
know what sane default parameters are. I think perhaps determining those 
defaults is non-obvious because of my experience in this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=984236

If I'm going to use thinP mostly for snapshots, then that suggests a smaller 
chunk size at thin pool creation time; whereas if I have no need for snapshots 
but a greater need for provisioning then a larger chunk size is better. And 
asking usage context in the installer, I think is a problem.


Chris Murphy


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Jul 29, 2013 at 05:34:05PM -0400, Simo Sorce wrote:
 On Mon, 2013-07-29 at 23:06 +0200, Lennart Poettering wrote:
  Well, the point I am making is that it is wrong to ask userspace to
  handle this. Get the APIs right you expose to userspace. 
 
 If user space assumes it can use 'all the space up to 15% from exhausting
 space' then it is user space that is wrong, to me.
 Even w/o thin provisioning you do not want any application to
 boundlessly consume terabytes of space just because it happens to sit on
 a *big* disk.
 Applications may use heuristics to better behave in 'common' situations,
 but should limit themselves also on the max space they are going to
 consume in general (or better have admin controllable knobs to do so
 coupled with reasonable defaults).
journald provides configuration knobs to exactly set the limits.
But forcing the admin to always configure this is something that
should be avoided, and reasonable values that work OK most of the
time should be used. Those defaults (15% of available /var/log, 10% free)
may not be perfect, but they give reasonable behaviour on various
systems, large and small. This is true even on btrfs with 50%
overestimate of free space.

  I mean, ultimately for me it doesn't matter I guess, since you say
  neither the fs/block layer nor userspace should care, but that this is
  the admin's problem, but that really sounds like chickening out to
  me... 
 
 Given the admin is the only one that knows for any given situation what
 is more important to him I do not think there is much more you can do.
 Sure you can set defaults or what not but there isn't any configuration
 that will ever be right short of something that can read minds.
 
 What you can do is give admin knobs to tweak in user space, as well as
 in the kernel. So that applications can limit themselves based on
 configuration, lacking those knobs you need to provide mechanisms a la
 cgroups that are used to hard limit misbehaving apps that think they are
 at an all-you-can-eat buffet.
Let's say that I'm downloading something in the browser, or creating an
ISO image in brasero. I think it would be really awful if those
applications couldn't tell me that I don't have enough space (without
actually exhausting it and hitting a limit), like they can now.
So it's not a question of misbehaving.

Zbyszek

-- 
they are not broken. they are refucktored
   -- alxchk

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Eric Sandeen
On 7/31/13 12:08 PM, Chris Murphy wrote:
 
 On Jul 31, 2013, at 8:32 AM, Mike Snitzer snit...@redhat.com
 wrote:
 
 But on the desktop the fedora developers need to provide sane
 policy/defaults.
 
 Right. And the concern I have (other than a blatant bug), is the F20
 feature for the installer to create thinp LVs; and to do that the
 installer needs to know what sane default parameters are. I think
 perhaps determining those defaults is non-obvious because of my
 experience in this bug: 
 https://bugzilla.redhat.com/show_bug.cgi?id=984236
 
 If I'm going to use thinP mostly for snapshots, then that suggests a
 smaller chunk size at thin pool creation time; whereas if I have no
 need for snapshots but a greater need for provisioning then a larger
 chunk size is better. And asking usage context in the installer, I
 think is a problem.

Quite some time ago I had asked whether we could get the allocation-tracking
snapshot niceties from dm-thinp, without actually needing it to be thin.

i.e. if you only want the efficient snapshots, a way to fully-provision
a thinp device.  I'm still not sure if this is possible...?

I guess I'm pretty nervous about offering actual thin provisioned
storage to average Fedora users.  I'm having nightmares about the bug
reports already, just based on the likelihood of most users misunderstanding
the feature and its requirements and expected behavior...

-Eric

 Chris Murphy
 


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Reindl Harald


On 31.07.2013 20:14, Zbigniew Jędrzejewski-Szmek wrote:
 journald provides configuration knobs to exactly set the limits.
 But forcing the admin to always configure this is something that
 should be avoided, and reasonable values that work OK most of the
 time should be used. Those defaults (15% of available /var/log, 10% free)
 may not be perfect, but they give reasonable behaviour on various
 systems, large and small. This is true even on btrfs with 50%
 overestimate of free space

you are aware how much 10% of 8 TB are?

this is fundamentally broken in the same way as the
5% reserved for root these days

you need at least a lot more fuzzy logic

* not more than XXX MB
* or vary the percentage depending on the drive size
* if /var/log is a dedicated partition *nothing* reserved
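A small Python sketch of how a percentage rule and a fixed cap could be combined. The percentages are the ones quoted in this thread (15% of the filesystem, keep 10% free); the function name log_budget and the cap parameter are made up for illustration, and this is not journald's actual algorithm.

```python
def log_budget(total, free, max_use_pct=15, keep_free_pct=10, cap=None):
    """Return how many bytes of logs to allow on a filesystem.

    total, free: filesystem size and currently free space, in bytes.
    cap: optional absolute upper bound ("not more than XXX MB").
    """
    by_size = total * max_use_pct // 100
    # Never eat into the share of the filesystem that must stay free.
    by_free = max(free - total * keep_free_pct // 100, 0)
    budget = min(by_size, by_free)
    if cap is not None:
        budget = min(budget, cap)
    return budget


TB = 10 ** 12
GiB = 2 ** 30
print(log_budget(8 * TB, 8 * TB))               # percentage rule alone
print(log_budget(8 * TB, 8 * TB, cap=4 * GiB))  # with an absolute cap
```

On an empty 8 TB filesystem the percentage rule alone allows 1.2 TB of logs; with the cap the budget stays at 4 GiB regardless of disk size, while small disks are still governed by the percentages.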





Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Matthew Miller
On Wed, Jul 31, 2013 at 08:18:52PM +0200, Reindl Harald wrote:
 you are aware how much 10% of 8 TB are?

So why *not* keep more logs, at least while nothing else is using it?

 you need at least a lot more fuzzy logic
 * not more than XXX MB
 * or vary the percentage depending on the drive size
 * if /var/log is a dedicated partition *nothing* reserved

That last, at least, seems reasonable.

-- 
Matthew Miller  ☁☁☁  Fedora Cloud Architect  ☁☁☁  mat...@fedoraproject.org

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Mike Snitzer
On Wed, Jul 31 2013 at  1:08pm -0400,
Chris Murphy li...@colorremedies.com wrote:

 
 On Jul 31, 2013, at 8:32 AM, Mike Snitzer snit...@redhat.com wrote:
 
  But on the desktop the fedora
  developers need to provide sane policy/defaults.
 
 Right. And the concern I have (other than a blatant bug), is the F20
 feature for the installer to create thinp LVs; and to do that the
 installer needs to know what sane default parameters are. I think
 perhaps determining those defaults is non-obvious because of my
 experience in this bug:
 https://bugzilla.redhat.com/show_bug.cgi?id=984236

Hmm, certainly a strange one.  But some bugs can be.  Did you ever look
to see if CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is enabled?  Wouldn't
explain any dmeventd memleak issues but could help explain slowness
associated with mkfs.btrfs on top of thinp.  Anyway, to be continued in
the BZ...

 If I'm going to use thinP mostly for snapshots, then that suggests a
 smaller chunk size at thin pool creation time; whereas if I have no
 need for snapshots but a greater need for provisioning then a larger
 chunk size is better. And asking usage context in the installer, I
 think is a problem.

It is certainly less than ideal but we haven't come up with an
alternative yet.  As Zdenek mentioned in comment#13 of the BZ you
referenced, what we're looking to do is establish default profiles for at
least these 2 use-cases you mentioned.  lvm2 has recently grown profile
support.  We just need to come to terms with what constitutes
sufficiently small and sufficiently large thinp block sizes.

We're doing work to zero in on the best defaults... so ultimately this
is still up in the air.

But my current thinking for these 2 profiles is:
* for performance, use data device's optimal_io_size if  64K.
  - this will yield a thinp block_size that is a full stripe on RAID[56]
* for snapshots, use data device's minimum_io_size if  64K.
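
As a rough illustration of the two profiles above, the sketch below reads a
device's I/O hints from sysfs and picks a thinp block size.  The comparison
operator in front of "64K" did not survive in the archived text, so treating
it as "at least 64K" is an assumption, and so are the function names, the
profile names and the fallback to 64K (the smallest data block size dm-thin
accepts).  A real tool would also have to round to whatever granularity the
kernel requires.

```python
from pathlib import Path

MIN_THINP_BLOCK = 64 * 1024  # smallest data block size dm-thin accepts


def read_io_hint(device, hint, sysfs_root="/sys/block"):
    """Return a queue hint (e.g. optimal_io_size) in bytes, or 0 if unavailable."""
    path = Path(sysfs_root) / device / "queue" / hint
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return 0


def pick_block_size(profile, optimal_io, minimum_io):
    """Pick a block size for a hypothetical 'performance' or 'snapshots' profile."""
    hint = optimal_io if profile == "performance" else minimum_io
    # Assumption: use the hint only when it is at least 64K, else fall back.
    return hint if hint >= MIN_THINP_BLOCK else MIN_THINP_BLOCK
```

For example, pick_block_size("performance", read_io_hint("sda",
"optimal_io_size"), read_io_hint("sda", "minimum_io_size")) would yield the
full-stripe size on a RAID device that reports one, and 64K otherwise.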

If/when we have the kernel REQ_RESERVE support to prealloc space in the
thin-pool it _could_ be that we make the snapshots profile the default;
and anything that wanted more performance could use fallocate().  But it
is a slippery slope because many apps could overcompensate to always use
fallocate()... we really don't want that.  So some form of quota might
need to be enforced on a cgroup level (once cgroup's reservation quota
is exceeded fallocate()'s REQ_RESERVE bio pass down will be skipped).
Grafting in cgroup-based policy into DM is a whole other can of worms,
but doable.

Open to other ideas...

Mike

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Mike Snitzer
On Wed, Jul 31 2013 at  2:38pm -0400,
Eric Sandeen sand...@redhat.com wrote:

 On 7/31/13 12:08 PM, Chris Murphy wrote:
  
  On Jul 31, 2013, at 8:32 AM, Mike Snitzer snit...@redhat.com
  wrote:
  
  But on the desktop the fedora developers need to provide sane
  policy/defaults.
  
  Right. And the concern I have (other than a blatant bug), is the F20
  feature for the installer to create thinp LVs; and to do that the
  installer needs to know what sane default parameters are. I think
  perhaps determining those defaults is non-obvious because of my
  experience in this bug: 
  https://bugzilla.redhat.com/show_bug.cgi?id=984236
  
  If I'm going to use thinP mostly for snapshots, then that suggests a
  smaller chunk size at thin pool creation time; whereas if I have no
  need for snapshots but a greater need for provisioning then a larger
  chunk size is better. And asking usage context in the installer, I
  think is a problem.
 
 Quite some time ago I had asked whether we could get the allocation-tracking
 snapshot niceties from dm-thinp, without actually needing it to be thin.
 
 i.e. if you only want the efficient snapshots, a way to fully-provision
 a thinp device.  I'm still not sure if this is possible...?

TBD, we could add a sandeen_makes_thinp_his_bitch param and if
specified (likely for entire pool, but we'll see) it would mean thin
volumes allocating from the pool would have their logical address space
reserved to be completely contiguous on creation (with all thin blocks
flagged in metadata as RESERVED).

The actual thin block allocation (zeroing of blocks on first write if
configured, etc.) transitions the metadata's block from RESERVED to
PROVISIONED.  Not yet clear to me that the DM thinp code can be easily
adapted to make the thin block allocation 2 staged.
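
To make the two-stage idea concrete, here is a toy model of the RESERVED ->
PROVISIONED transition.  It is only a sketch: the class and method names are
made up for illustration and say nothing about how the DM thinp metadata is
really laid out.

```python
from enum import Enum


class BlockState(Enum):
    UNMAPPED = "unmapped"
    RESERVED = "reserved"
    PROVISIONED = "provisioned"


class ToyThinPool:
    """Toy model only; not how dm-thin metadata actually works."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks
        self.blocks = {}  # (volume name, block index) -> BlockState

    def create_reserved_volume(self, name, nr_blocks):
        # Stage 1: account for every block up front, without writing data.
        if nr_blocks > self.free_blocks:
            raise OSError("pool too small to fully reserve this volume")
        self.free_blocks -= nr_blocks
        for i in range(nr_blocks):
            self.blocks[(name, i)] = BlockState.RESERVED

    def write(self, name, block):
        # Stage 2: the first write turns RESERVED (or UNMAPPED) into PROVISIONED.
        state = self.blocks.get((name, block), BlockState.UNMAPPED)
        if state is BlockState.UNMAPPED:
            if self.free_blocks == 0:
                raise OSError("pool exhausted")
            self.free_blocks -= 1  # only unreserved blocks consume free space here
        self.blocks[(name, block)] = BlockState.PROVISIONED
```

The point of the model is that a write to a RESERVED block can never fail for
lack of pool space, while a write to an unmapped block of an ordinary thin
volume can.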

But would seem to be a prereq for dm-thinp's REQ_RESERVE support.  I'll
check with Joe (cc'd) and come back with his dose of reality ;)

 I guess I'm pretty nervous about offering actual thin provisioned
 storage to average Fedora users.  I'm having nightmares about the bug
 reports already, just based on the likelihood of most users misunderstanding
 the  feature and it's requirements  expected behavior...

Heh, you shouldn't be nervous.  You can just punt said bugs over the
fence right? ;)

Mike

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Chris Murphy

On Jul 31, 2013, at 1:44 PM, Mike Snitzer snit...@redhat.com wrote:

 Did you ever look
 to see if CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is enabled?

It's not enabled in either the regular or debug kernels found in koji.

Chris Murphy


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Tom Coughlan
On Wed, 2013-07-31 at 11:52 +0200, Zdenek Kabelac wrote:
...
 ThinP should be configured in a way that admin is able to extend pool to 
 honour promised space if really needed. It's not a good idea, to provision 
 1EB 
 if you have at most just 1TB disk and then you expect you will have no 
 problems when someone fallocate() 500TB.
 
 I.e. if someone is using  iSCSI disc array with 'hw' thin provisioning 
 support, there is no scsi command to provision space - it's admin's work to 
 ensure there is enough disc space to keep up with user demands

Oops, Zdenek is likely repeating a mis-statement I made about the SCSI
Standard on a call yesterday. I just checked and I was wrong - the
latest draft of the Standard does provide a way to pre-provision space.
Sorry - I should have checked before speaking. 

In March 2010 the SCSI committee added the concept of anchored thin
provisioning to the (proposed) SCSI Block Commands – 3 (SBC-3)
Standard. This allows a logical block to be in one of three states:
mapped, deallocated, or anchored.  A write command that specifies an
anchored LBA does not require allocation of additional LBA mapping
resources for that LBA. A write command that specifies a deallocated LBA
may require allocation of LBA mapping resources.

This change was proposed by David Black from EMC. The justification
reflects our discussion:

There is extensive experience with this sort of resource preallocation
mechanism in filesystems, as most physical filesystems are effectively
thin provisioned courtesy of the way that file space allocation works.
In that domain, this sort of preallocation mechanism is useful and used
selectively (e.g., the fallocate() primitive in Linux and Unix systems).
In this context, SCSI anchored functionality can be viewed as extending
filesystem notions of preallocation down to include SCSI thin
provisioning.

So, 1) others have seen the need for pre-allocation in thinp
environments, 2) hardware will eventually show up that implements it, 3)
it appears as though the extension to fallocate that Mike suggested is
worth investigating, 4) if we do this, we will want to add the concept
to LVM thinp, and 5) to the plumbing in Linux SCSI so we can pass it to
capable hardware. 

Tom  


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Adam Williamson
On Wed, 2013-07-31 at 13:38 -0500, Eric Sandeen wrote:
 On 7/31/13 12:08 PM, Chris Murphy wrote:
  
  On Jul 31, 2013, at 8:32 AM, Mike Snitzer snit...@redhat.com
  wrote:
  
  But on the desktop the fedora developers need to provide sane
  policy/defaults.
  
  Right. And the concern I have (other than a blatant bug), is the F20
  feature for the installer to create thinp LVs; and to do that the
  installer needs to know what sane default parameters are. I think
  perhaps determining those defaults is non-obvious because of my
  experience in this bug: 
  https://bugzilla.redhat.com/show_bug.cgi?id=984236
  
  If I'm going to use thinP mostly for snapshots, then that suggests a
  smaller chunk size at thin pool creation time; whereas if I have no
  need for snapshots but a greater need for provisioning then a larger
  chunk size is better. And asking usage context in the installer, I
  think is a problem.
 
 Quite some time ago I had asked whether we could get the allocation-tracking
 snapshot niceties from dm-thinp, without actually needing it to be thin.
 
 i.e. if you only want the efficient snapshots, a way to fully-provision
 a thinp device.  I'm still not sure if this is possible...?
 
 I guess I'm pretty nervous about offering actual thin provisioned
 storage to average Fedora users.  I'm having nightmares about the bug
 reports already, just based on the likelihood of most users misunderstanding
 the  feature and it's requirements  expected behavior...

There was some discussion in #anaconda yesterday about making it
available only via custom partitioning rather than the Installation
Options dropdown, IIRC. That has the effect of denoting it as an
'advanced feature' and requiring more expertise on the part of the user
to set it up.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-31 Thread Chris Murphy

On Jul 31, 2013, at 12:38 PM, Eric Sandeen sand...@redhat.com wrote:

 
 i.e. if you only want the efficient snapshots, a way to fully-provision
 a thinp device.  I'm still not sure if this is possible…?

[…]

 
 I guess I'm pretty nervous about offering actual thin provisioned
 storage to average Fedora users.  I'm having nightmares about the bug
 reports already, just based on the likelihood of most users misunderstanding
 the  feature and it's requirements  expected behavior…

So possibly the installer should be conservative about how thin the 
provisioning is; otherwise I'm imagining inadequately provisioned thinp LV, 
while also using the rollback feature [1].


[1] https://fedoraproject.org/wiki/Changes/Rollback


Chris Murphy

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-30 Thread Zdenek Kabelac

Dne 29.7.2013 23:06, Lennart Poettering napsal(a):

On Mon, 29.07.13 16:52, Ric Wheeler (rwhee...@redhat.com) wrote:


Oh, we don't assume it's all ours. We recheck regularly, immediately
before appending to the journal files, of course assuming that we are
not the only writers.


With thinly provisioned storage (or things like btrfs, writeable
snapshots, etc), you will not really ever know how much space is
really there.


Yeah, and that's an API regression.




I guess there is a major misunderstanding about the whole purpose
of thin provisioning.

From this thread one could get the feeling that thinp just complicates
estimations of free space for filesystem :)

But the usage is quite different from the beginning.

- disk space is a costly resource
- resizing a filesystem (e.g. ext4) blocks its usage and could be risky.
- a lot of filesystems do not support native snapshots.

So thinp is here to help with these things.

- Instead of running multi-terabyte disk arrays when the user is only using gigs
of disk space - thinp allows storage to be added when needed (so you could
slowly extend your disk arrays with more hw, which needs more energy)

- Instead of resizing the fs all the time - you create a large fs from the beginning
and you let the block layer resolve the magic (at the price of disk
fragmentation)

- Instead of repeatedly writing snapshot code for every fs - again you let
the block layer handle it (at the price of less efficient disk space usage).



So the idea that the fs would return a different amount of free space when it's
being run on a thinly provisioned device is simply wrong on many points.


And there is no point in supporting this - since with LVM you could replace
a thinp device with a linear mirrored device online if that were needed.
Obviously this would give you wildly fluctuating results from any stats() functions
you think should be supported.


Secondly - the thin pool is designed to grow - so from which number would you
actually want to estimate the free size - from the current pool size?
from the free size in the whole volume group? from the size of all the disks
which could be attached to the volume group?
If you have multiple thin pools in a VG - then what do you think the resulting
value should be here?


And finally - the snapshot feature makes estimating free space a very
costly operation - if you run multiple snapshots - how do you estimate free
space? What would be the meaning of this value?



thinp should work the same. Of course, this requires that the block
layer has to pass more metadata up to the file systems than before, but
there's really nothing intrinsicly evil about that, I mean, it could be
as basic as just passing along a provisioning perentage or so which
the fs will simply multiply into the returned values... (Of
course it won't be that simple, but you get the concept...)


Sorry, but the only broken concept I can see here is allocating
50% of the free disk space just because it can be done - disk space is not
local RAM - if the FS tells you it has 1EB, that doesn't mean the program should
just allocate 500TB for nothing.


In this case the admin obviously must properly configure the provisioned disk space
for those users who are used to eating every single byte of their fs.  Thinp
can't resolve this.


Zdenek


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Miloslav Trmač
On Sat, Jul 27, 2013 at 6:56 PM, Lennart Poettering
mzerq...@0pointer.de wrote:
 On Fri, 26.07.13 22:13, Miloslav Trmač (m...@volny.cz) wrote:
 The same applies if your package automatically allocates a certain
 proportion of the total or available space.

 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.

 Well, I am pretty sure the burden must be on the file systems to report
 a useful estimate free blocks value in statfs()/statvfs(). Exporting that
 problem to userspace and expecting userspace to work around this is just
 wrong. In fact, this would be quite an API breakage if applications
 cannot rely that the value returned is at least a rough estimate on how
 much data can be stored on disk.

Well, we have two subsystems making quite reasonable assumptions, with
the composition being unreasonable.  We'll just have to figure a
solution out.  That's what distributions are for, after all.
Mirek.


P.S. WRT stat{v,}fs API breakage:
I've been (as can be expected) thinking about this a lot, and the
primary criteria for API breakage are IMHO:
1) When an old application is used in an old environment, the
results should not change (so that things keep working, especially
on system upgrades; OTOH it's OK for an old application not to be able
to handle a new environment/to be broken by it).
2) When faced with an environment change, an interface should usually
behave strictly as documented, not to preserve a specific use case and
break other use cases (because we can't know how various use cases are
prevalent, especially in binary-only applications).
3) This implies that introducing new features typically means
adding new interfaces and updating applications to be able to use
them.  That's, I think, quite reasonable.

So, I think that stat{v,}fs should continue to only report about the
file system:
1) the value of free space reported by stat{v,}fs() for a
SAN-located FS shouldn't change if Fedora 21 suddenly learns about SAN
thin provisioning
2) stat{v,}fs is explicitly documented to report about the file
system, not about the storage stack.
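
For packagers checking whether they are affected, this is roughly what such
a call looks like from a higher-level language.  A minimal sketch using
Python's os.statvfs(); the helper name is made up.  Note that the numbers
describe the file system only, exactly as argued above: on a thinly
provisioned LV they say nothing about how much room is left in the pool.

```python
import os


def fs_free_bytes(path):
    """Return (total, available) bytes as the file system itself reports them."""
    st = os.statvfs(path)
    total = st.f_frsize * st.f_blocks
    # f_bavail counts blocks available to unprivileged users.
    available = st.f_frsize * st.f_bavail
    return total, available
```

A package that sizes a cache or a log area as a fraction of the second value
is one that may need a closer look.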

To be clear, it's more important to have _a_ solution than to have
specifically a solution that follows these ideas.  That's why it's a
P.S.

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Daniel P. Berrange
On Fri, Jul 26, 2013 at 10:06:20PM +0100, Richard W.M. Jones wrote:
 On Fri, Jul 26, 2013 at 10:13:42PM +0200, Miloslav Trmač wrote:
  Hello all,
  with thin provisioning available, the total and free space values
  reported by a filesystem do not necessarily mean that that much space
  is _actually_ available (the actual backing storage may be smaller, or
  shared with other filesystems).
  
  If your package reports disk space usage to users, and bases this on
  filesystem free space, please consider whether it might need to take
  LVM thin provisioning into account.
  
  The same applies if your package automatically allocates a certain
  proportion of the total or available space.
  
  A quick way to check whether your package is likely to be affected, is
  to look for statfs() or statvfs() calls in C, or the equivalent in
  your higher-level library / programming language.
 
 Also libvirt has a whole set of APIs around storage and
 free space.

Yep, we need to be able to report free space on filesystems, so that
apps provisioning virtual machines can get an idea of how much storage
they can provide to VMs without risk of overcommitting.

I agree that we really want the kernel, or at least a reusable shared
library, to provide some kind of interface to determine this, rather
than requiring every userspace app which cares to re-invent the wheel.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Chris Murphy

On Jul 29, 2013, at 6:30 AM, Daniel P. Berrange berra...@redhat.com wrote:

 Yep, we need to be able to report free space on filesystems, so that
 apps provisioning virtual machines can get an idea of how much storage
 they can provide to VMs without risk of over comitting.
 
 I agree that we really want the kernel, or at least a reusable shared
 library, to provide some kind of interface to determine this, rather
 than requiring every userspace app which cares to re-invent the wheel.

What does it mean for an app to use stat to get free space, and then proceed 
to create too big a VM image in a directory that has a quota set? I still think 
apps are asking an inappropriate/unqualified question by asking for volume free 
space, instead of what's available to them for a specified path.


Chris Murphy

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Daniel P. Berrange
On Mon, Jul 29, 2013 at 08:01:23AM -0600, Chris Murphy wrote:
 
 On Jul 29, 2013, at 6:30 AM, Daniel P. Berrange berra...@redhat.com wrote:
 
  Yep, we need to be able to report free space on filesystems, so that
  apps provisioning virtual machines can get an idea of how much storage
  they can provide to VMs without risk of over comitting.
  
  I agree that we really want the kernel, or at least a reusable shared
  library, to provide some kind of interface to determine this, rather
  than requiring every userspace app which cares to re-invent the wheel.
 
 What does it mean for an app to use stat to get free space, and then
 proceeds to create too big a VM image in a directory that has a quota
 set? I still think apps are asking an inappropriate/unqualified question
 by asking for volume free space, instead of what's available to them for
 a specified path.

From an API POV, libvirt doesn't need/care about the free space on the
volume underlying the filesystem. We actually only care about the free
space in a given directory that we're using for disk images. It just
happens that we implement this using statvfs() currently. So when I
ask for an API above, don't take this to mean I want a statvfs() that
knows about sparse volumes. An API or syscall that provides free space
for individual directories is fine with me.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Richard W.M. Jones
On Mon, Jul 29, 2013 at 08:01:23AM -0600, Chris Murphy wrote:
 
 On Jul 29, 2013, at 6:30 AM, Daniel P. Berrange berra...@redhat.com wrote:
 
  Yep, we need to be able to report free space on filesystems, so that
  apps provisioning virtual machines can get an idea of how much storage
  they can provide to VMs without risk of over comitting.
  
  I agree that we really want the kernel, or at least a reusable shared
  library, to provide some kind of interface to determine this, rather
  than requiring every userspace app which cares to re-invent the wheel.

 What does it mean for an app to use stat to get free space, and then
 proceeds to create too big a VM image in a directory that has a
 quota set? I still think apps are asking an
 inappropriate/unqualified question by asking for volume free space,
 instead of what's available to them for a specified path.

libvirt only does what users (or applications) ask of it.

The problem is that the information is not available to give to users/
applications so they can make a good decision.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Chris Murphy

On Jul 29, 2013, at 8:18 AM, Daniel P. Berrange berra...@redhat.com wrote:
 
 From an API POV, libvirt doesn't need/care about the free space on the
 volume underlying the filesystem. We actually only care about the free
 space in a given directory that we're using for disk images. It just
 happens that we implement this using statvfs() currently. So when I
 ask for an API above, don't take this to mean I want a statvfs() that
 knows about sparse volumes. An API or syscall that provides free space
 for individual directories is fine with me.

Got it. So what's needed is an API that can return available space to 
user/application, while abstracting whether the limit is a function of btrfs 
raid peculiarities, thinp overcommitting, or file system quota? And then 
applications that need to know what this limit is, need to use this new API?


Chris Murphy

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Ric Wheeler

On 07/29/2013 10:01 AM, Chris Murphy wrote:

On Jul 29, 2013, at 6:30 AM, Daniel P. Berrange berra...@redhat.com wrote:


Yep, we need to be able to report free space on filesystems, so that
apps provisioning virtual machines can get an idea of how much storage
they can provide to VMs without risk of over comitting.

I agree that we really want the kernel, or at least a reusable shared
library, to provide some kind of interface to determine this, rather
than requiring every userspace app which cares to re-invent the wheel.

What does it mean for an app to use stat to get free space, and then proceeds 
to create too big a VM image in a directory that has a quota set? I still think 
apps are asking an inappropriate/unqualified question by asking for volume free 
space, instead of what's available to them for a specified path.


Chris Murphy


When you use thinly provisioned storage, the file system itself does not know 
how much physical storage is really backing it so stat, df and friends *really* 
have no way to tell.


Think of it as the equivalent of virtual memory backed by physical DRAM - the 
virtual storage is backed by physical disk.


It is up to the admin/installation tools to provision enough real storage to 
make this work. If you provision 10 file systems with a virtual 1TB each and 
only back it with 2TB of real disk, you will need to monitor the space (via 
device mapper tools) and dynamically throw in more disk when the physical pool 
runs low.
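
To put numbers on that example, a small sketch follows.  The helper names
and the 80% threshold are purely illustrative; the threshold is not a
recommendation from this thread.

```python
def overcommit_ratio(virtual_sizes_tb, physical_tb):
    """How many times the promised space exceeds the real backing storage."""
    return sum(virtual_sizes_tb) / physical_tb


def pool_needs_extension(used_tb, physical_tb, threshold=0.8):
    """True once the physical pool is at least `threshold` full (hypothetical policy)."""
    return used_tb / physical_tb >= threshold
```

Ten file systems of 1TB each on 2TB of real disk give a ratio of 5.0, so the
pool can run dry while every single file system still reports plenty of
free space.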


Regards,

Ric


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Ric Wheeler

On 07/29/2013 10:18 AM, Daniel P. Berrange wrote:

On Mon, Jul 29, 2013 at 08:01:23AM -0600, Chris Murphy wrote:

On Jul 29, 2013, at 6:30 AM, Daniel P. Berrange berra...@redhat.com wrote:


Yep, we need to be able to report free space on filesystems, so that
apps provisioning virtual machines can get an idea of how much storage
they can provide to VMs without risk of over comitting.

I agree that we really want the kernel, or at least a reusable shared
library, to provide some kind of interface to determine this, rather
than requiring every userspace app which cares to re-invent the wheel.

What does it mean for an app to use stat to get free space, and then
proceeds to create too big a VM image in a directory that has a quota
set? I still think apps are asking an inappropriate/unqualified question
by asking for volume free space, instead of what's available to them for
a specified path.

 From an API POV, libvirt doesn't need/care about the free space on the
volume underlying the filesystem. We actually only care about the free
space in a given directory that we're using for disk images. It just
happens that we implement this using statvfs() currently. So when I
ask for an API above, don't take this to mean I want a statvfs() that
knows about sparse volumes. An API or syscall that provides free space
for individual directories is fine with me.

Daniel


Just another note, it is never safe to assume that storage under any file system 
is yours for the taking.


If application A does a stat or statvfs() call, sees 1GB of space left and then 
does a write, we could easily lose that race to any other application.


If you want to reserve space, you need to grab the space yourself (always works 
with a large write() but preallocation can also help without dm-thin).


The difference with dm-thin is that all applications on all file systems backed 
by the same block pool compete for that space.


Another worry here is that preallocation is a file system concept; thinly 
provisioned storage (dm-thin or array backed) only sees proper writes, so you 
need to write to the space to really claim it.
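
A minimal sketch of "grab the space yourself" by actually writing it, since
a file-system-level preallocation may never reach the thin pool.  The
function name and chunk size are arbitrary, and this assumes nothing further
down the stack drops or deduplicates zero-filled blocks.

```python
import os


def claim_space(path, size, chunk=1 << 20):
    """Fill `path` with `size` bytes of zeros so every block is really written."""
    zeros = bytes(chunk)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])  # raises OSError if the write fails
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data down to the block layer
```

This is slow and wasteful compared to a real reservation mechanism, which is
why the missing pass-down of preallocation to the block layer matters.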


Ric






Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Eric Sandeen
On 7/27/13 11:56 AM, Lennart Poettering wrote:
 On Fri, 26.07.13 22:13, Miloslav Trmač (m...@volny.cz) wrote:
 
 Hello all,
 with thin provisioning available, the total and free space values
 reported by a filesystem do not necessarily mean that that much space
 is _actually_ available (the actual backing storage may be smaller, or
 shared with other filesystems).

 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.

 The same applies if your package automatically allocates a certain
 proportion of the total or available space.

 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.
 
 Well, I am pretty sure the burden must be on the file systems to report
 a useful estimate free blocks value in statfs()/statvfs(). Exporting that
 problem to userspace and expecting userspace to work around this is just
 wrong. In fact, this would be quite an API breakage if applications
 cannot rely that the value returned is at least a rough estimate on how
 much data can be stored on disk.
 
 journald will scale how much disk usage it will use of /var/log/journal
 based on the file system size and free level. It will also module the
 per-service rate limit levels based on the amount of free disk space. If
 you break the API of statfs()/statvfs(), then you will end up break this
 and all programs like it.

Any program needs to be prepared for ENOSPC; as Ric mentioned elsewhere,
until you successfully write to it, it's not yours! :)  (Ok, thinp
running out of space won't generate ENOSPC today, either, but you see
my general point...)
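
A minimal sketch of being prepared for ENOSPC on an ordinary file system;
the helper name is made up, and as noted above a thin pool running dry will
not necessarily show up as this error today.

```python
import errno


def append_record(f, data):
    """Write data; return False instead of crashing when the disk is full."""
    try:
        f.write(data)
        f.flush()
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False  # caller can rotate, drop or retry later
        raise  # anything else is a different problem
```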

And how much space are we really talking about here?  If you're running
thin-provisioning on thin margins, especially w/o some way to automatically 
hot-add storage, you're probably doing it wrong.

(And if journald sees 100T free and decides it can use 50T of that,
it's doing it wrong, too) ;)
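
One way to avoid that is to bound the proportional figure with an absolute
cap.  The sketch below is hypothetical: the function name, the 10% share and
the 4GiB cap are illustrative values, not a description of what journald
does.

```python
def log_budget(fs_free_bytes, percent=10, hard_cap=4 * 1024**3):
    """Use a share of the reported free space, but never more than hard_cap."""
    proportional = fs_free_bytes * percent // 100
    return min(proportional, hard_cap)
```

With 100T reported free this yields the 4GiB cap rather than terabytes, so a
wildly optimistic free-space figure from a thin volume does little harm.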

The truth is somewhere in the middle, but quibbling over whether this
app or that can claim a bit of space behind a thin-provisioned volume
probably isn't useful.

The admin definitely needs tools to see the state of thinly provisioned
storage, but that's the admin's job to worry about, not the app's, IMHO.

-Eric

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Daniel P. Berrange
On Mon, Jul 29, 2013 at 02:38:23PM -0400, Ric Wheeler wrote:
 On 07/29/2013 10:18 AM, Daniel P. Berrange wrote:
 On Mon, Jul 29, 2013 at 08:01:23AM -0600, Chris Murphy wrote:
 On Jul 29, 2013, at 6:30 AM, Daniel P. Berrange berra...@redhat.com 
 wrote:
 
 Yep, we need to be able to report free space on filesystems, so that
 apps provisioning virtual machines can get an idea of how much storage
 they can provide to VMs without risk of overcommitting.
 
 I agree that we really want the kernel, or at least a reusable shared
 library, to provide some kind of interface to determine this, rather
 than requiring every userspace app which cares to re-invent the wheel.
 What does it mean for an app to use stat to get free space, and then
 proceed to create too big a VM image in a directory that has a quota
 set? I still think apps are asking an inappropriate/unqualified question
 by asking for volume free space, instead of what's available to them for
 a specified path.
  From an API POV, libvirt doesn't need/care about the free space on the
 volume underlying the filesystem. We actually only care about the free
 space in a given directory that we're using for disk images. It just
 happens that we implement this using statvfs() currently. So when I
 ask for an API above, don't take this to mean I want a statvfs() that
 knows about sparse volumes. An API or syscall that provides free space
 for individual directories is fine with me.
 

 Just another note, it is never safe to assume that storage under any
 file system is yours for the taking.
 
 If application A does a stat or statvfs() call, sees 1GB of space
 left and then does a write, we could easily lose that race to any
 other application.

This race doesn't matter from libvirt's POV. It is just providing a
mechanism via its API. It is up to the management application using
libvirt to make use of the mechanism to provide a usage policy.
Their usage scenario may well enable them to make certain assumptions
about the storage that could not otherwise be made in a race-free
manner.

In addition, even in more general-purpose usage scenarios, it does
not necessarily matter if there is a race, because there can be a
second line of defence. For example, KVM can be set to pause the VM
upon ENOSPC errors, giving the management application or administrator
the chance to expand the capacity of the underlying storage and then
unpause the guest. In that case checking the free space is mostly just a
sanity check which serves to avoid hitting the pause-on-ENOSPC scenario
too frequently.
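
As an illustration of the two lines of defence described above, here is a
minimal Python sketch. It is not libvirt or KVM code; the names free_bytes,
write_image and min_free are hypothetical. The statvfs() check is a racy
sanity check, and the ENOSPC handler is the part that actually decides.

```python
import errno
import os


def free_bytes(path):
    """Bytes available to unprivileged writers on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def write_image(path, data, min_free=0):
    """Return 'ok', 'refused' (sanity check failed) or 'enospc' (write failed)."""
    directory = os.path.dirname(os.path.abspath(path))
    # First line of defence: a racy sanity check, only meant to reduce how
    # often the ENOSPC path below is hit.
    if free_bytes(directory) < len(data) + min_free:
        return "refused"
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except OSError as e:
        # Second line of defence: only the write itself gives a real answer.
        if e.errno == errno.ENOSPC:
            return "enospc"
        raise
    return "ok"
```

A caller would treat "refused" and "enospc" the same way: stop, let the
administrator add capacity, then retry.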

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Steve Grubb
On Friday, July 26, 2013 09:29:41 PM Eric Sandeen wrote:
 On 7/26/13 3:13 PM, Miloslav Trmač wrote:
  A quick way to check whether your package is likely to be affected, is
  to look for statfs() or statvfs() calls in C, or the equivalent in
  your higher-level library / programming language.
 
 statfs will still tell you how much space the filesystem has allocated,
 as well as how much space it thinks it has left, based on the total
 space the disk has *said* it has available, just like it always ever
 did.
 
 The difference, of course, is that you might actually run out of blocks
 before you fill the fs.  But I can't think offhand what apps would care.
 And again, it's something the admin shouldn't let happen.

The audit system also cares about space available. We tell people to create a
partition specifically for auditing so that we can keep close track of what's
left. But for people who depend on it, we have the requirement to take
away system access should the ability to record audit events fail. We also
need an accurate estimate before we run out so we can send an admin-defined
warning when the disk has filled to a certain point so that they can archive
files or make space available.

If we run out of disk space and were not able to warn admins and this was a
shop that really cares about auditing, the system will either be shut down or
sent to single-user mode for corrective action. So, having accurate space-left
numbers is really important so that systems don't get shut down unexpectedly.

-Steve

 For now, consider it completely transparent to the user (unless the
 admin doesn't keep up, in which case it will be anything *but*
 transparent).
 
 TBH, when the backing store runs out of space, things do get pretty
 ugly at this point.  It's work that needs to be done to make it more
 robust & graceful.
 
 -Eric
 
   Mirek

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Ric Wheeler

On 07/29/2013 03:05 PM, Steve Grubb wrote:

On Friday, July 26, 2013 09:29:41 PM Eric Sandeen wrote:

On 7/26/13 3:13 PM, Miloslav Trmač wrote:

A quick way to check whether your package is likely to be affected, is
to look for statfs() or statvfs() calls in C, or the equivalent in
your higher-level library / programming language.

statfs will still tell you how much space the filesystem has allocated,
as well as how much space it thinks it has left, based on the total
space the disk has *said* it has available, just like it always ever
did.

The difference, of course, is that you might actually run out of blocks
before you fill the fs.  But I can't think offhand what apps would care.
And again, it's something the admin shouldn't let happen.

The audit system also cares about space available. We tell people to create a
partition specifically for auditing so that we can keep close track of what's
left. But for people who depend on it, we have the requirement to take
away system access should the ability to record audit events fail. We also
need an accurate estimate before we run out so we can send an admin-defined
warning when the disk has filled to a certain point so that they can archive
files or make space available.

If we run out of disk space and were not able to warn admins and this was a
shop that really cares about auditing, the system will either be shut down or
sent to single-user mode for corrective action. So, having accurate space-left
numbers is really important so that systems don't get shut down unexpectedly.

-Steve


Of course, if you simply can never run out of space and have a special file 
system/device of your own, you should use fully provisioned storage.


dm-thin is not about solving *every* problem :)

ric




For now, consider it completely transparent to the user (unless the
admin doesn't keep up, in which case it will be anything *but*
transparent).

TBH, when the backing store runs out of space, things do get pretty
ugly at this point.  It's work that needs to be done to make it more
robust & graceful.

-Eric


  Mirek



Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Chris Murphy

On Jul 29, 2013, at 1:05 PM, Steve Grubb sgr...@redhat.com wrote:
 
 The audit system also cares about space available. We tell people to create a 
 partition specifically for auditing so that we can keep close track on what's 
 left.

How does the audit system determine space available? If it's using btrfs 
configured for raid1 or raid10, df and stat will report the total storage of 
all devices in the volume, unlike md raid (or even proprietary raid). Instead,
df reports log files as using twice as much space.


Chris Murphy


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Chris Adams
Once upon a time, Chris Murphy li...@colorremedies.com said:
 How does the audit system determine space available? If it's using btrfs 
 configured for raid1 or raid10, df and stat will report the total storage of 
 all devices in the volume, unlike md raid (or even proprietary raid). Instead,
 df reports log files as using twice as much space.

How is _anything_ (programs, users, admins) supposed to know how much
free space is left on btrfs?  This behavior seems like an admin's
nightmare.
-- 
Chris Adams li...@cmadams.net

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Ric Wheeler

On 07/29/2013 03:50 PM, Chris Adams wrote:

Once upon a time, Chris Murphy li...@colorremedies.com said:

How does the audit system determine space available? If it's using btrfs 
configured for raid1 or raid10, df and stat will report the total storage of 
all devices in the volume, unlike md raid (or even proprietary raid). Instead,
df reports log files as using twice as much space.

How is _anything_ (programs, users, admins) supposed to know how much
free space is left on btrfs?  This behavior seems like an admin's
nightmare.


With copy on write file systems like btrfs, you can consume space on writes to 
existing files when overwriting them. You can even consume space by removing 
files :)


Ric


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Lennart Poettering
On Mon, 29.07.13 13:48, Eric Sandeen (sand...@redhat.com) wrote:

  Well, I am pretty sure the burden must be on the file systems to report
  a useful estimate of the free blocks in statfs()/statvfs(). Exporting that
  problem to userspace and expecting userspace to work around this is just
  wrong. In fact, this would be quite an API breakage if applications
  cannot rely on the value returned being at least a rough estimate of how
  much data can be stored on disk.
  
  journald will scale how much disk space it uses in /var/log/journal
  based on the file system size and free level. It will also modulate the
  per-service rate limit levels based on the amount of free disk space. If
  you break the API of statfs()/statvfs(), then you will end up breaking this
  and all programs like it.
 
 Any program needs to be prepared for ENOSPC; as Ric mentioned elsewhere,
 until you successfully write to it, it's not yours! :)  (Ok, thinp
 running out of space won't generate ENOSPC today, either, but you see
 my general point...)

Oh, we don't assume it's all ours. We recheck regularly, immediately
before appending to the journal files, of course assuming that we are
not the only writers.

 And how much space are we really talking about here?  If you're running
 thin-provisioning on thin margins, especially w/o some way to automatically 
 hot-add storage, you're probably doing it wrong.

journald will by default allow the journal files to grow to 10% of the
filesystem size of /var/log/journal, but will make sure that 15% is
always kept free. 
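
The sizing rule just described can be sketched in a few lines of Python.
This is not journald's implementation; the function name, the parameter
names and the way the two limits are combined are assumptions made purely
for illustration. Only the 10% and 15% figures come from the text above.

```python
def journal_budget(fs_size, fs_avail, journal_use,
                   max_use_percent=10, keep_free_percent=15):
    """Upper bound, in bytes, on the total size of the journal files."""
    max_use = fs_size * max_use_percent // 100
    keep_free = fs_size * keep_free_percent // 100
    # Space the journal may occupy while still leaving keep_free bytes
    # available: what it already uses plus whatever is free beyond the reserve.
    room = max(0, journal_use + fs_avail - keep_free)
    return min(max_use, room)
```

On a thinly provisioned volume fs_size and fs_avail come from the file
system, so the result is only as good as those two numbers.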

This really is just about finding some useful parameters for sizing
things that are likely to just work. It's not at all about making any
algorithms depend on that, or about a way to avoid ENOSPC handling or
anything like that. It's just about finding some sensible default metrics.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Steve Grubb
On Monday, July 29, 2013 01:41:12 PM Chris Murphy wrote:
 On Jul 29, 2013, at 1:05 PM, Steve Grubb sgr...@redhat.com wrote:
  The audit system also cares about space available. We tell people to
  create a partition specifically for auditing so that we can keep close
  track on what's left.
 
 How does the audit system determine space available?

It uses fstatfs() on the descriptor currently opened for logging.
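
For readers unfamiliar with the call, a rough Python equivalent of querying
free space through an already-open descriptor is sketched below. It is not
the audit daemon's code: Python exposes os.fstatvfs() (a wrapper for
fstatvfs(3), a close relative of fstatfs()), and the names space_left_mb,
check_log_space and warn_below_mb are hypothetical.

```python
import os


def space_left_mb(fd):
    """Megabytes available on the filesystem behind an open descriptor."""
    st = os.fstatvfs(fd)
    return st.f_bavail * st.f_frsize // (1024 * 1024)


def check_log_space(fd, warn_below_mb):
    """Return 'warn' once free space drops below the admin-defined mark."""
    return "warn" if space_left_mb(fd) < warn_below_mb else "ok"
```

The number returned is what the file system believes; on thin provisioning
the backing pool may run out earlier.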

-Steve

 If it's using btrfs configured for raid1 or raid10, df and stat will report
 the total storage of all devices in the volume, unlike md raid (or even
 proprietary raid). Instead, df reports log files as using twice as much
 space.
 
 
 Chris Murphy

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Ric Wheeler

On 07/29/2013 04:35 PM, Lennart Poettering wrote:

On Mon, 29.07.13 13:48, Eric Sandeen (sand...@redhat.com) wrote:


Well, I am pretty sure the burden must be on the file systems to report
a useful estimate of the free blocks in statfs()/statvfs(). Exporting that
problem to userspace and expecting userspace to work around this is just
wrong. In fact, this would be quite an API breakage if applications
cannot rely on the value returned being at least a rough estimate of how
much data can be stored on disk.

journald will scale how much disk space it uses in /var/log/journal
based on the file system size and free level. It will also modulate the
per-service rate limit levels based on the amount of free disk space. If
you break the API of statfs()/statvfs(), then you will end up breaking this
and all programs like it.

Any program needs to be prepared for ENOSPC; as Ric mentioned elsewhere,
until you successfully write to it, it's not yours! :)  (Ok, thinp
running out of space won't generate ENOSPC today, either, but you see
my general point...)

Oh, we don't assume it's all ours. We recheck regularly, immediately
before appending to the journal files, of course assuming that we are
not the only writers.


With thinly provisioned storage (or things like btrfs, writeable snapshots, 
etc), you will not really ever know how much space is really there.





And how much space are we really talking about here?  If you're running
thin-provisioning on thin margins, especially w/o some way to automatically
hot-add storage, you're probably doing it wrong.

journald will by default allow the journal files to grow to 10% of the
filesystem size of /var/log/journal, but will make sure that 15% is
always kept free.

This really is just about finding some useful parameters for sizing
things that are likely to just work. It's not at all about making any
algorithms depend on that, or about a way to avoid ENOSPC handling or
anything like that. It's just about finding some sensible default metrics.

Lennart



I am starting to think that this is critical enough that we might want to always 
fully provision this - just like we would for audit logs


Checking won't hurt anything, but the storage stack will lie to you (and 
honestly, we always have in many cases :)).


There are some alerts that we can raise when you hit a low water mark for the
device mapper physical pool; it would be interesting to talk about how you might
leverage these.


Ric


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Lennart Poettering
On Mon, 29.07.13 16:52, Ric Wheeler (rwhee...@redhat.com) wrote:

 Oh, we don't assume it's all ours. We recheck regularly, immediately
 before appending to the journal files, of course assuming that we are
 not the only writers.
 
 With thinly provisioned storage (or things like btrfs, writeable
 snapshots, etc), you will not really ever know how much space is
 really there.

Yeah, and that's an API regression.

On btrfs you can just add/remove devices as you wish during runtime and
statvfs() does reflect this immediately.

thinp should work the same. Of course, this requires that the block
layer pass more metadata up to the file systems than before, but
there's really nothing intrinsically evil about that, I mean, it could be
as basic as just passing along a provisioning percentage or so which
the fs will simply multiply into the returned values... (Of
course it won't be that simple, but you get the concept...)
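
To make the concept concrete, here is a small Python sketch of "multiply a
provisioning percentage into the returned values". No such interface is
described anywhere in this thread; scaled_free and its parameters are
hypothetical, and the sketch ignores the fact that a pool is shared between
several volumes.

```python
from fractions import Fraction


def scaled_free(reported_free_blocks, backing_blocks, virtual_blocks):
    """Scale the fs-reported free blocks by the share that is really backed."""
    if virtual_blocks <= 0:
        raise ValueError("virtual_blocks must be positive")
    # Never report more than the file system itself believes it has.
    fraction = min(Fraction(1), Fraction(backing_blocks, virtual_blocks))
    return int(reported_free_blocks * fraction)
```

For example, a volume advertising 1000 free blocks with only a tenth of its
virtual size backed would report 100.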

 I am starting to think that this is critical enough that we might
 want to always fully provision this - just like we would for audit
 logs
 
 Checking won't hurt anything, but the storage stack will lie to you
 (and honestly, we always have in many cases :)).

Well, journald is totally fine if it is lied to in the sense that the
values returned by statfs()/statvfs() are just estimates, and not
precise. However, it is assumed that the values are not off by > 100% as
they might be on thinp...

That the values are not perfectly accurate has been known forever. Ever
since file systems have existed, developers have known that book-keeping
and such mean the returned values are slightly higher than practically
reachable. And since compressed file systems, they have also known that the
values might be lower than actually reachable. However, it's one thing to
return bad estimates, and it is another thing to be totally off in the
woods as is the case for thinp!

 There are some alerts that we can raise when you hit a low water
 mark for the device mapper physical pool, it would be interesting to
 talk about how you might leverage these.

Well, the point I am making is that it is wrong to ask userspace to
handle this. Get the APIs right you expose to userspace. 

I mean, ultimately for me it doesn't matter I guess, since you say
neither the fs/block layer nor userspace should care, but that this is
the admin's problem, but that really sounds like chickening out to
me... 

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Simo Sorce
On Mon, 2013-07-29 at 23:06 +0200, Lennart Poettering wrote:
 Well, the point I am making is that it is wrong to ask userspace to
 handle this. Get the APIs right you expose to userspace. 

If user space assumes it can use 'all the space up to 15% from exhausting
space' then it is user space that is wrong, to me.
Even w/o thin provisioning you do not want any application to
boundlessly consume terabytes of space just because it happens to sit on
a *big* disk.
Applications may use heuristics to behave better in 'common' situations,
but should also limit the max space they are going to
consume in general (or better, have admin-controllable knobs to do so
coupled with reasonable defaults).
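
A sketch of that idea in Python: the size-based heuristic is kept, but an
absolute ceiling always wins. The function name and both default values are
hypothetical and stand in for whatever knobs an application would expose.

```python
def usage_limit(fs_size, fraction_percent=10, max_bytes=4 * 1024**3):
    """Heuristic share of the file system, capped by an admin-set ceiling."""
    return min(fs_size * fraction_percent // 100, max_bytes)
```

On a 10 GiB file system the heuristic decides (1 GiB); on a 100 TiB one the
ceiling does (4 GiB).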

 I mean, ultimately for me it doesn't matter I guess, since you say
 neither the fs/block layer nor userspace should care, but that this is
 the admin's problem, but that really sounds like chickening out to
 me... 

Given the admin is the only one who knows, for any given situation, what
is more important to him, I do not think there is much more you can do.
Sure, you can set defaults or whatnot, but there isn't any configuration
that will ever be right short of something that can read minds.

What you can do is give the admin knobs to tweak in user space, as well as
in the kernel, so that applications can limit themselves based on
configuration. Lacking those knobs, you need to provide mechanisms a la
cgroups that are used to hard-limit misbehaving apps that think they are
at an all-you-can-eat buffet.

Simo.
 
-- 
Simo Sorce * Red Hat, Inc * New York


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Ric Wheeler

On 07/29/2013 05:06 PM, Lennart Poettering wrote:

On Mon, 29.07.13 16:52, Ric Wheeler (rwhee...@redhat.com) wrote:


Oh, we don't assume it's all ours. We recheck regularly, immediately
before appending to the journal files, of course assuming that we are
not the only writers.

With thinly provisioned storage (or things like btrfs, writeable
snapshots, etc), you will not really ever know how much space is
really there.

Yeah, and that's an API regression.


It is actually not an API regression; this is how file systems have always
operated on enterprise storage (including writeable snapshots) and, for all
practical purposes, whenever you are running in a multi-application environment.


In effect, there never was an API that gave you what you want outside of the 
write(2) system call :)


On btrfs you can just add/remove devices as you wish during runtime and
statvfs() does reflect this immediately.


btrfs consumes space on each write to the same block.

If you have a 10GB file system with a 5GB, existing log file and overwrite it 
twice in place, you will run out of space.




thinp should work the same. Of course, this requires that the block
layer pass more metadata up to the file systems than before, but
there's really nothing intrinsically evil about that, I mean, it could be
as basic as just passing along a provisioning percentage or so which
the fs will simply multiply into the returned values... (Of
course it won't be that simple, but you get the concept...)


I would argue that it is working how it should work. If you want fully 
provisioned storage and are a single application/single user file system, you 
can configure your box that way.


Thin provisioned storage - by design - has a pool of real storage that is shared 
across all file systems that sit on devices that it serves.  On SAN volumes, 
that exactly means you share the physical storage pool across multiple hosts and 
all of their file systems.


The way it works assumes:

* the system administrator understands thin provisioned storage and the system
workload to some rough level
* the sys admin sets the water marks appropriately so that when we hit a low
water mark, we can add physical storage to the pool
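
The water-mark assumption boils down to a threshold test on the pool's
remaining real storage. The Python sketch below only shows that arithmetic;
the function name and the 20% default are hypothetical, and obtaining
pool_size and pool_used from the device mapper is left out.

```python
def below_low_water_mark(pool_size, pool_used, low_water_free_percent=20):
    """True once the pool's free share has fallen to the low water mark."""
    free = pool_size - pool_used
    # Integer comparison of free/pool_size against the percentage threshold.
    return free * 100 <= pool_size * low_water_free_percent
```

When this turns true, the admin (or automation acting for the admin) adds
physical storage to the pool.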


There is no magic pony here for you - if you configure thin, you mean to use it 
to lie to the users and their file systems for a valid reason.


Applications can do whatever they want as long as the sys admin monitors the box 
properly and has a way to add storage when needed.


Think just in time storage provisioning.




I am starting to think that this is critical enough that we might
want to always fully provision this - just like we would for audit
logs

Checking won't hurt anything, but the storage stack will lie to you
(and honestly, we always have in many cases :)).

Well, journald is totally fine if it is lied to in the sense that the
values returned by statfs()/statvfs() are just estimates, and not
precise. However, it is assumed that the values are not off by > 100% as
they might be on thinp...


Or on btrfs or on copy on write LVM (not just ours, but hardware LVM) snapshots, 
etc.


Or if a large application is running that is about to do a pre-allocation of the 
rest of the free data.


The heuristic you assume does not work in any but the most constrained of all 
use cases.




That the values are not perfectly accurate has been known forever. Ever
since file systems have existed, developers have known that book-keeping
and such mean the returned values are slightly higher than practically
reachable. And since compressed file systems, they have also known that the
values might be lower than actually reachable. However, it's one thing to
return bad estimates, and it is another thing to be totally off in the
woods as is the case for thinp!


This is not new or unique to thinp.




There are some alerts that we can raise when you hit a low water
mark for the device mapper physical pool, it would be interesting to
talk about how you might leverage these.

Well, the point I am making is that it is wrong to ask userspace to
handle this. Get the APIs right you expose to userspace.

I mean, ultimately for me it doesn't matter I guess, since you say
neither the fs/block layer nor userspace should care, but that this is
the admin's problem, but that really sounds like chickening out to
me...



Not chickening out, just working as designed. If you don't like this, you need 
to use traditional, fully provisioned storage and not use copy on write 
technologies (like btrfs or LVM writeable snapshots).


Apparently we have lied to you so well over the years that you just never
noticed the reality of many other misleading IO stack configurations :)


Ric


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Chris Murphy

On Jul 29, 2013, at 3:34 PM, Simo Sorce s...@redhat.com wrote:

 
 btrfs consumes space on each write to the same block.
 
 If you have a 10GB file system with a 5GB, existing log file and overwrite it 
 twice in place, you will run out of space.

It's a sufficiently confusing example that I almost wish df (the typical
user-space tool to learn of free space) would lie for btrfs volumes by
subtracting 15% in the report. The raid1/10 case is even more confusing, but
only because users now expect to be lied to: to be told they have 1/2 the
storage space compared to what they purchased. Btrfs is telling the whole
truth about the total available storage and that data is taking twice the
allocation.

So again it's about managing user expectations. Maybe df should persist in
lying somehow (although that is difficult with mixed raid levels in a single
volume), and the lower-level stat and btrfs tools should tell the deeper story?


 There is no magic pony here for you - if you configure thin, you mean to use 
 it to lie to the users and their file systems for a valid reason.

And not new. Qcow has allowed the situation for some time.

A legitimate concern is how the file system behaves when its virtual storage 
suddenly runs out of backing storage space; that it can fail semi-gracefully 
(i.e. without file system corruption, even if there would be data loss).


Chris Murphy

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-29 Thread Chris Murphy

On Jul 29, 2013, at 3:06 PM, Lennart Poettering mzerq...@0pointer.de wrote:
 
 Well, journald is totally fine if it is lied to in the sense that the
 values returned by statfs()/statvfs() are just estimates, and not
 precise. However, it is assumed that the values are not off by > 100% as
 they might be on thinp...

e.g. A VM has a 1TB virtual device backed by a qcow2 file residing on a file 
system with 100GB free space, on a hard drive. That's  1000%.

It's not just thinp, this has been going on for a long time. So if journald 
wants to know something more real in the case of thinp, why not a passthrough 
from real file system to virtual file system in the case of qcow2?


 Well, the point I am making is that it is wrong to ask userspace to
 handle this. Get the APIs right you expose to userspace. 

Effectively it seems for a long time now there hasn't been an API exposing the 
information you think user space requires. Apps using stat are asking a 
question of an API that by design only knows about the most immediate file 
system, not anything beyond which may be backing it. The request for free space 
in the immediate file system seems vaguely reasonable, but needing information 
beyond the file system seems specious.

 I mean, ultimately for me it doesn't matter I guess, since you say
 neither the fs/block layer nor userspace should care, but that this is
 the admin's problem, but that really sounds like chickening out to
 me… 

It may be that admins are going to need better tools to assist them in 
monitoring before crisis events occur. But it does seem the admin shouldn't be 
creating 16TB qcow2 files on 1TB real backing, any more than they should do the 
same with thinp. They're just asking for trouble, the lie is too big.


Chris Murphy


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-28 Thread Dennis Jacobfeuerborn

On 27.07.2013 05:07, Chris Murphy wrote:


On Jul 26, 2013, at 4:53 PM, Pádraig Brady p...@draigbrady.com wrote:


On 07/26/2013 09:13 PM, Miloslav Trmač wrote:

Hello all,
with thin provisioning available, the total and free space values
reported by a filesystem do not necessarily mean that that much space
is _actually_ available (the actual backing storage may be smaller, or
shared with other filesystems).

If your package reports disk space usage to users, and bases this on
filesystem free space, please consider whether it might need to take
LVM thin provisioning into account.

The same applies if your package automatically allocates a certain
proportion of the total or available space.

A quick way to check whether your package is likely to be affected, is
to look for statfs() or statvfs() calls in C, or the equivalent in
your higher-level library / programming language.


Anything df(1) should do here?


Example: Creating a btrfs raid1 volume from two 2TB drives, df shows it as 
having 4TB available:

# parted -l

Error: /dev/sdb: unrecognised disk label
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sdb: 2199GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Error: /dev/sdc: unrecognised disk label
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sdc: 2199GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

# mkfs.btrfs -d raid1 -m raid1 /dev/sd[bc]

WARNING! - Btrfs v0.20-rc1 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/sdc id 2
fs created label (null) on /dev/sdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 4.00TB
Btrfs v0.20-rc1

# mount /dev/sdb /mnt
#  df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        79G  4.2G   71G   6% /
devtmpfs        1.5G     0  1.5G   0% /dev
tmpfs           1.5G     0  1.5G   0% /dev/shm
tmpfs           1.5G  680K  1.5G   1% /run
tmpfs           1.5G     0  1.5G   0% /sys/fs/cgroup
tmpfs           1.5G  4.0K  1.5G   1% /tmp
none            224G   87G  138G  39% /media/sf_chris
/dev/sdb        4.0T   56K  4.0T   1% /mnt


The explanation is that the file system isn't raid1, but rather the allocated 
chunks have this attribute. Presently a volume only allocates with one profile, 
but the future plan is per subvolume and even per file raid profiles. So 
establishing how much free space there is on a btrfs volume is absolutely less 
than clear.

Anyway, I think it will cause some confusion if by available an application 
thinks it can write out more than 2TB of data to this example volume.


I thought one of the benefits of combining the block layer and the
filesystem layer, as btrfs does, is that the filesystem can actually
know the state/topology of the block layer and work more efficiently.
Combined with the existing problem of getting out-of-disk-space
errors long before usage hits 100% (has this been fixed since?), this
makes any sort of capacity planning difficult, if not impossible.


Regards,
  Dennis

--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-27 Thread Lennart Poettering
On Fri, 26.07.13 22:13, Miloslav Trmač (m...@volny.cz) wrote:

 Hello all,
 with thin provisioning available, the total and free space values
 reported by a filesystem do not necessarily mean that that much space
 is _actually_ available (the actual backing storage may be smaller, or
 shared with other filesystems).
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.
 
 The same applies if your package automatically allocates a certain
 proportion of the total or available space.
 
 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.

Well, I am pretty sure the burden must be on the file systems to report
a useful estimate of free blocks in statfs()/statvfs(). Exporting that
problem to userspace and expecting userspace to work around it is just
wrong. In fact, it would be quite an API breakage if applications
could not rely on the returned value being at least a rough estimate of
how much data can be stored on disk.

journald scales how much disk space it uses for /var/log/journal
based on the file system size and free level. It also modulates the
per-service rate limit levels based on the amount of free disk space. If
you break the API of statfs()/statvfs(), then you will end up breaking
this and all programs like it.
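
As an illustration of this kind of policy (not journald's actual code), here
is a sketch that derives a log budget from statvfs() values. The function
names and the 10%/15% percentages are assumptions chosen for the example.

```python
import os

def compute_budget(fs_size, fs_avail, max_use_percent=10, keep_free_percent=15):
    """Bytes a logger may use: capped by a share of the filesystem size,
    and by whatever remains after keeping a reserve free."""
    cap = fs_size * max_use_percent // 100
    reserve = fs_size * keep_free_percent // 100
    headroom = max(0, fs_avail - reserve)
    return min(cap, headroom)

def budget_for_path(path):
    """Apply the policy to the values the filesystem reports for path."""
    st = os.statvfs(path)
    return compute_budget(st.f_blocks * st.f_frsize, st.f_bavail * st.f_frsize)
```

Both inputs come straight from the filesystem, so if they overstate what the
backing store can deliver, the resulting budget is too generous as well.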

Note that btrfs RAID is broken in a similar way: it returns the
amount of actual free blocks to the user. However, since with RAID
enabled each file requires twice (or some other factor) the number of
blocks, the value is completely bogus. The btrfs RAID userspace API is
simply broken.

The accepted way to get an estimate of how much disk space is still
available is statfs()/statvfs(); applications and admins rely on the
values it returns. You cannot just break that and think you can get away
with it.

The thin provisioning folks need to find a way to make this work, not
userspace programmers. 

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-27 Thread Lennart Poettering
On Fri, 26.07.13 21:29, Eric Sandeen (sand...@redhat.com) wrote:

 On 7/26/13 3:13 PM, Miloslav Trmač wrote:
  Hello all,
  with thin provisioning available, the total and free space values
  reported by a filesystem do not necessarily mean that that much space
  is _actually_ available (the actual backing storage may be smaller, or
  shared with other filesystems).
  
  If your package reports disk space usage to users, and bases this on
  filesystem free space, please consider whether it might need to take
  LVM thin provisioning into account.
 
 Short answer: it doesn't (it can't).
 
 Just like an application doesn't know if it's got a 2.5 or 3.5 drive
 behind it, or cloud behind it, or a usb stick behind it, it doesn't
 know if it's got thinly provisioned storage behind it.

Well, correct me if I am wrong, but don't RAID devices already
communicate certain metrics to the file systems on them? (stride size,
iirc?). It doesn't sound too difficult to communicate the thin
provisioning ratio as well, and then leave it to the file system to
scale the reported free disk space.

  The same applies if your package automatically allocates a certain
  proportion of the total or available space.
 
 I can't imagine that anything actually does that, does it?
 Good lord I hope not.  ;)

journald does that (see other mail).

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-27 Thread Richard W.M. Jones
On Sat, Jul 27, 2013 at 07:02:41PM +0200, Lennart Poettering wrote:
 On Fri, 26.07.13 21:29, Eric Sandeen (sand...@redhat.com) wrote:
  On 7/26/13 3:13 PM, Miloslav Trmač wrote:
   Hello all,
   with thin provisioning available, the total and free space values
   reported by a filesystem do not necessarily mean that that much space
   is _actually_ available (the actual backing storage may be smaller, or
   shared with other filesystems).
   
   If your package reports disk space usage to users, and bases this on
   filesystem free space, please consider whether it might need to take
   LVM thin provisioning into account.
  
  Short answer: it doesn't (it can't).
  
  Just like an application doesn't know if it's got a 2.5 or 3.5 drive
  behind it, or cloud behind it, or a usb stick behind it, it doesn't
  know if it's got thinly provisioned storage behind it.
 
 Well, correct me if I am wrong but don't RAID devices communicate
 certain metrics to the file systems on them already? (stride size
 iirc?). It doesn't sound too difficult to communicate the thin
 provisioning ratio as well, and then leave it to the file system to
 scale the report disk free space.

It's been like this since thin-provisioned SANs first came along, so
twenty or more years.  It's got a lot worse because of widespread
virtualization and the use of raw-sparse, VMDK and qcow2.

I agree with you that underlying storage really ought to communicate
this information up to the kernel (as has been recently done with
alignment information, see [1]).
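
For context, a minimal sketch of how such hints can be read on a Linux host,
assuming the usual sysfs layout under /sys/block/<device>/queue. The function
name io_hints and the sysfs_root parameter are made up for illustration. These
files describe I/O geometry only; nothing in them says how full a thinly
provisioned backing store is.

```python
import os

def io_hints(device, sysfs_root="/sys/block"):
    """Return the block-size and I/O-size hints exported for a block device.
    Values that are missing or unreadable are reported as None."""
    queue = os.path.join(sysfs_root, device, "queue")
    names = ("logical_block_size", "physical_block_size",
             "minimum_io_size", "optimal_io_size")
    hints = {}
    for name in names:
        try:
            with open(os.path.join(queue, name)) as f:
                hints[name] = int(f.read().strip())
        except (OSError, ValueError):
            hints[name] = None
    return hints
```

Calling io_hints("sda") on a host with that device would return a dict of the
four values in bytes.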

I also appreciate this will not be easy with the sheer variety of
underlying storage.  Also the problem is not well-defined: What
happens if the underlying storage is storing two guests, and either of
them could grow to the full size on their own, but if both together
did it, we'd run out of space?  What does thin provisioning ratio
mean for this?

Rich.

[1] 
http://libguestfs.org/virt-alignment-scan.1.html#linux-host-block-and-i-o-size


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-27 Thread Matthew Miller
On Sat, Jul 27, 2013 at 06:25:57PM +0100, Richard W.M. Jones wrote:
 I also appreciate this will not be easy with the sheer variety of
 underlying storage.  Also the problem is not well-defined: What
 happens if the underlying storage is storing two guests, and either of
 them could grow to the full size on their own, but if both together
 did it, we'd run out of space?  What does thin provisioning ratio
 mean for this?

And, yes -- overprovisioning is certainly a huge use case.

-- 
Matthew Miller  ☁☁☁  Fedora Cloud Architect  ☁☁☁  mat...@fedoraproject.org

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-27 Thread Chris Murphy

On Jul 27, 2013, at 10:56 AM, Lennart Poettering mzerq...@0pointer.de wrote:
 
 Well, I am pretty sure the burden must be on the file systems to report
 a useful estimate free blocks value in statfs()/statvfs().

tl;dr

4 VMs, each using one thinp LV. Each LV has a virtualsize of 1TB. The VG 
backing those LVs is 1TB. If each LV actually is using only 150GB, the real 
free space in the VG is 400GB. 

But how do you propose informing each VM of the real free space? Are they all 
informed there's 400GB of free space? Or do you just do a simple scaling and 
tell them 400GB/4 is free?

OK well what if 2 of those VMs actively make use of snapshotting? The scaling 
approach quickly isn't going to work out for any of the VMs.
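
The arithmetic in this example can be written out as a short sketch. The
helper names are made up for illustration, and 1TB is taken as 1000GB to match
the figures above.

```python
def pool_free_gb(vg_size_gb, used_gb_per_lv):
    """Real free space left in the shared pool."""
    return vg_size_gb - sum(used_gb_per_lv)

def naive_share_gb(vg_size_gb, used_gb_per_lv):
    """Simple scaling: split the real free space evenly between the LVs."""
    return pool_free_gb(vg_size_gb, used_gb_per_lv) / len(used_gb_per_lv)

vg_size = 1000                         # 1TB VG, taken as 1000GB
used = [150, 150, 150, 150]            # four thin LVs, 150GB written to each
print(pool_free_gb(vg_size, used))     # 400
print(naive_share_gb(vg_size, used))   # 100.0

# If one VM then writes another 300GB, the "share" reported to the
# other three shrinks although they have written nothing themselves.
used_later = [450, 150, 150, 150]
print(naive_share_gb(vg_size, used_later))  # 25.0
```

The scaled figure is only valid at the instant it is computed, which is the
weakness of the simple scaling approach.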

I think the burden is on the virtual storage layer designer/implementer. He 
shouldn't make 1TB virtualsize LVs, when only 150GB is needed. The idea isn't 
to use thinp to totally eliminate the need to ever grow an LV and the 
underlying fs, but to reduce the need (perhaps significantly).



 Note that btrfs RAID is broken in a similar way: it will return the
 amount of actual free blocks to the user. Since if RAID is enabled each
 file however requires twice (or some other factor) the number of blocks
 the value is completely bogus. The btrfs RAID userspace API is simply
 broken.

It's a problem. I'm unconvinced it's broken.

As I mentioned earlier, a btrfs volume as a whole doesn't have a raid profile 
set. It's the subvolume (or possibly a file). Because the work isn't done to 
enable per subvolume or per file raid profiles, this is done at mkfs time. But 
this actually only sets the profile for the default subvolume, not the whole 
file system. It just seems it is that way now. So you could argue that in the 
meantime, btrfs devs should punt, and report free space similar to md and lvm 
raid.

Long term fix seems to require the application making a more qualified inquiry. 
Asking free space for a whole volume that it may not even have write permission 
for seems unreasonable. It should instead ask for free space for a particular 
path. The actual write location might be a directory with a quota that must be 
honored; or a subvolume with a raid1 data profile set.

The program asking for volume free space is a totally ambiguous inquiry.


 The accepted way to get an estimate how much disk space is still
 available is statfs()/statvfs(), applications and admins rely on the
 values it returns. You cannot just break that and think you can get away
 with it.

Sorry, this is a half empty vs half full problem. A solution won't be found by 
disregarding the other perspective; as a consequence of calling it broken, 
you're saying that to avoid breaking it we can't have per subvolume or per file 
raid. And that's less acceptable than the original problem, which really is 
that some programs are making unacceptably vague and grandiose inquiries about 
free space availability.


 
 The thin provisioning folks need to find a way to make this work, not
 userspace programmers. 

99.9% of userspace programs are writing out pretty small files, at a rate 
that's fairly knowable. They are thus well behaved. A handful of applications 
want to know how much free space there is, as if the answer entitles them to 
use all or most of that free space, compared to some other program that asks at 
the same time?

I think the expectation that programs can get ballpark free space information 
for a volume was probably always unreasonable; it's just that thin provisioning 
is making this more clear.

Most of the burden is on the user/implementer who creates virtualsize LVs to 
not make them too big. And then I think there is some burden on programs to 
make more qualified inquiries about available free space.


Chris Murphy



Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Miloslav Trmač
Hello all,
with thin provisioning available, the total and free space values
reported by a filesystem do not necessarily mean that that much space
is _actually_ available (the actual backing storage may be smaller, or
shared with other filesystems).

If your package reports disk space usage to users, and bases this on
filesystem free space, please consider whether it might need to take
LVM thin provisioning into account.

The same applies if your package automatically allocates a certain
proportion of the total or available space.

A quick way to check whether your package is likely to be affected is
to look for statfs() or statvfs() calls in C, or the equivalent in
your higher-level library / programming language.
 Mirek
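
For reference, a minimal sketch of the kind of call in question, here through
Python's os.statvfs() wrapper. The function name free_bytes is made up for
illustration. On a thinly provisioned device the numbers it returns describe
what the filesystem believes it has, not what the backing pool can actually
supply.

```python
import os

def free_bytes(path="/"):
    """Return (total, available) bytes as reported by statvfs() for path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize       # size of the filesystem
    available = st.f_bavail * st.f_frsize   # space usable by unprivileged users
    return total, available

if __name__ == "__main__":
    total, available = free_bytes("/")
    print(f"total={total} available={available}")
```

Code that multiplies f_bavail or f_bfree by a block size in this way is what a
search for affected packages would be looking for.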

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread DJ Delorie

 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.

Perhaps you could include a small code snippet explaining *how* to do
this?  Is there an lvm_thin_statfs() we can use?

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Miloslav Trmač
On Fri, Jul 26, 2013 at 10:17 PM, DJ Delorie d...@redhat.com wrote:

 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.

 Perhaps you could include a small code snippet explaining *how* to do
 this?  Is there an lvm_thin_statfs() we can use?

I'd love to, but I don't know how.  David, could you suggest something, please?
Mirek

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread drago01
On Fri, Jul 26, 2013 at 10:59 PM, Miloslav Trmač m...@volny.cz wrote:
 On Fri, Jul 26, 2013 at 10:17 PM, DJ Delorie d...@redhat.com wrote:

 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.

 Perhaps you could include a small code snippet explaining *how* to do
 this?  Is there an lvm_thin_statfs() we can use?

 I'd love to, but I don't know how.  David, could you suggest something, 
 please?

The same issue exists for btrfs ... shouldn't we somehow try to get a
generic api for the exotic file systems?

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Richard W.M. Jones
On Fri, Jul 26, 2013 at 10:13:42PM +0200, Miloslav Trmač wrote:
 Hello all,
 with thin provisioning available, the total and free space values
 reported by a filesystem do not necessarily mean that that much space
 is _actually_ available (the actual backing storage may be smaller, or
 shared with other filesystems).
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.

I guess virt-df could be such a package.

 The same applies if your package automatically allocates a certain
 proportion of the total or available space.
 
 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.

What code needs to be used in addition/instead?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Richard W.M. Jones
On Fri, Jul 26, 2013 at 10:13:42PM +0200, Miloslav Trmač wrote:
 Hello all,
 with thin provisioning available, the total and free space values
 reported by a filesystem do not necessarily mean that that much space
 is _actually_ available (the actual backing storage may be smaller, or
 shared with other filesystems).
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.
 
 The same applies if your package automatically allocates a certain
 proportion of the total or available space.
 
 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.

Also libvirt has a whole set of APIs around storage and
free space.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Chris Murphy

On Jul 26, 2013, at 3:02 PM, drago01 drag...@gmail.com wrote:

 On Fri, Jul 26, 2013 at 10:59 PM, Miloslav Trmač m...@volny.cz wrote:
 On Fri, Jul 26, 2013 at 10:17 PM, DJ Delorie d...@redhat.com wrote:
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.
 
 Perhaps you could include a small code snippet explaining *how* to do
 this?  Is there an lvm_thin_statfs() we can use?
 
 I'd love to, but I don't know how.  David, could you suggest something, 
 please?
 
 The same issue exits for btrfs ... shouldn't we somehow try to get a
 generic api for the exotic file systems?

I don't think btrfs yet supports thin provisioning itself via quotas. The 
quotas code is still in flux in any case. I have had some problems with lvm 
thinp and btrfs that happen more rarely with xfs. The thinp LV is somewhat 
large (16TB) for the RAM available on the machine (4GB). But I don't have the 
same problems when using qcow2 to thin provision the same amount of space.

But isn't the idea that the file system itself isn't really aware of how much 
actual space is available? That's up to the manager of the thin provisioned 
space, in this case LVM. Not the file system.


Chris Murphy

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread David Lehman
On Fri, 2013-07-26 at 22:59 +0200, Miloslav Trmač wrote:
 On Fri, Jul 26, 2013 at 10:17 PM, DJ Delorie d...@redhat.com wrote:
 
  If your package reports disk space usage to users, and bases this on
  filesystem free space, please consider whether it might need to take
  LVM thin provisioning into account.
 
  Perhaps you could include a small code snippet explaining *how* to do
  this?  Is there an lvm_thin_statfs() we can use?
 
 I'd love to, but I don't know how.  David, could you suggest something, 
 please?

As noted by drago01, this is not exactly new or specific to thinp -- a
similar situation exists with btrfs. You would have to ask the
developers of lvm and btrfs for a way to decode the magic. I don't know
any manageable solution to this problem.

 Mirek



Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Chris Murphy

On Jul 26, 2013, at 3:34 PM, David Lehman dleh...@redhat.com wrote:

 On Fri, 2013-07-26 at 22:59 +0200, Miloslav Trmač wrote:
 On Fri, Jul 26, 2013 at 10:17 PM, DJ Delorie d...@redhat.com wrote:
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.
 
 Perhaps you could include a small code snippet explaining *how* to do
 this?  Is there an lvm_thin_statfs() we can use?
 
 I'd love to, but I don't know how.  David, could you suggest something, 
 please?
 
 As noted by drago01, this is not exactly new or specific to thinp -- a
 similar situation exists with btrfs. 

The RAID 1/10 problem where df reports free space as the size of the volume, 
not the actual amount of stuff that can be stored on the volume? Or is there 
another case?

It does seem to be a problem that df works this way with btrfs. It's one thing 
for 'btrfs df' to do its own thing, but for df to do this doesn't seem like a 
good idea.


Chris Murphy

Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Pádraig Brady
On 07/26/2013 09:13 PM, Miloslav Trmač wrote:
 Hello all,
 with thin provisioning available, the total and free space values
 reported by a filesystem do not necessarily mean that that much space
 is _actually_ available (the actual backing storage may be smaller, or
 shared with other filesystems).
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.
 
 The same applies if your package automatically allocates a certain
 proportion of the total or available space.
 
 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.

Anything df(1) should do here?



Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Eric Sandeen
On 7/26/13 3:13 PM, Miloslav Trmač wrote:
 Hello all,
 with thin provisioning available, the total and free space values
 reported by a filesystem do not necessarily mean that that much space
 is _actually_ available (the actual backing storage may be smaller, or
 shared with other filesystems).
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.

Short answer: it doesn't (it can't).

Just like an application doesn't know if it's got a 2.5" or 3.5" drive
behind it, or cloud behind it, or a usb stick behind it, it doesn't
know if it's got thinly provisioned storage behind it.

It's up to the administrator to make sure that the thinly provisioned
device doesn't fill up and run out of space, but if it does, there's
nothing an app can do about that.

There's also no standard interface to query how full your thinly
provisioned storage is; it depends on what's implementing the thin
provisioning.  So again, nothing an app can do/query/handle/change.
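
For LVM specifically, one possible way to see how full a thin pool is would be
to ask lvs for its data_percent and metadata_percent report fields. The sketch
below is an illustration under assumptions, not a recommended interface: it
assumes an LVM2 version that provides those fields, usually needs root, and
says nothing about other thin provisioning implementations. The function names
and the sample report are hypothetical.

```python
import subprocess

def parse_lvs_report(text):
    """Parse 'name:data%:metadata%' lines into {name: (data, metadata)}.
    LVs with an empty data column (not thin) are skipped."""
    usage = {}
    for line in text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 3 or not fields[1]:
            continue
        name, data, meta = fields
        usage[name] = (float(data), float(meta) if meta else None)
    return usage

def thin_pool_usage(vg):
    """Run lvs for one volume group and parse its report (assumed fields)."""
    out = subprocess.run(
        ["lvs", "--noheadings", "--separator", ":",
         "-o", "lv_name,data_percent,metadata_percent", vg],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_lvs_report(out)

# Hypothetical report text: a pool, a thin LV, and an ordinary LV.
sample = "  pool0:37.50:12.10\n  root:20.00:\n  swap::\n"
print(parse_lvs_report(sample))
```

Even where this works, it is an administrator's view of one implementation,
which is consistent with the point that an application has no general
interface to query.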

 The same applies if your package automatically allocates a certain
 proportion of the total or available space.

I can't imagine that anything actually does that, does it?
Good lord I hope not.  ;)
 
 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.

statfs will still tell you how much space the filesystem has allocated,
as well as how much space it thinks it has left, based on the total
space the disk has *said* it has available, just like it always did.

The difference, of course, is that you might actually run out of blocks
before you fill the fs.  But I can't think offhand what apps would care.
And again, it's something the admin shouldn't let happen.

For now, consider it completely transparent to the user (unless the
admin doesn't keep up, in which case it will be anything *but*
transparent).

TBH, when the backing store runs out of space, things do get pretty
ugly at this point.  It's work that needs to be done to make it more
robust and graceful.

-Eric

  Mirek
 


Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

2013-07-26 Thread Chris Murphy

On Jul 26, 2013, at 4:53 PM, Pádraig Brady p...@draigbrady.com wrote:

 On 07/26/2013 09:13 PM, Miloslav Trmač wrote:
 Hello all,
 with thin provisioning available, the total and free space values
 reported by a filesystem do not necessarily mean that that much space
 is _actually_ available (the actual backing storage may be smaller, or
 shared with other filesystems).
 
 If your package reports disk space usage to users, and bases this on
 filesystem free space, please consider whether it might need to take
 LVM thin provisioning into account.
 
 The same applies if your package automatically allocates a certain
 proportion of the total or available space.
 
 A quick way to check whether your package is likely to be affected, is
 to look for statfs() or statvfs() calls in C, or the equivalent in
 your higher-level library / programming language.
 
 Anything df(1) should do here?

Example: Creating a btrfs raid1 volume from two 2TB drives, df shows it as 
having 4TB available:

# parted -l

Error: /dev/sdb: unrecognised disk label
Model: ATA VBOX HARDDISK (scsi)   
Disk /dev/sdb: 2199GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 

Error: /dev/sdc: unrecognised disk label
Model: ATA VBOX HARDDISK (scsi)   
Disk /dev/sdc: 2199GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 

# mkfs.btrfs -d raid1 -m raid1 /dev/sd[bc]

WARNING! - Btrfs v0.20-rc1 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/sdc id 2
fs created label (null) on /dev/sdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 4.00TB
Btrfs v0.20-rc1

# mount /dev/sdb /mnt
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        79G  4.2G   71G   6% /
devtmpfs        1.5G     0  1.5G   0% /dev
tmpfs           1.5G     0  1.5G   0% /dev/shm
tmpfs           1.5G  680K  1.5G   1% /run
tmpfs           1.5G     0  1.5G   0% /sys/fs/cgroup
tmpfs           1.5G  4.0K  1.5G   1% /tmp
none            224G   87G  138G  39% /media/sf_chris
/dev/sdb        4.0T   56K  4.0T   1% /mnt


The explanation is that the file system isn't raid1, but rather the allocated 
chunks have this attribute. Presently a volume only allocates with one profile, 
but the future plan is per subvolume and even per file raid profiles. So 
establishing how much free space there is on a btrfs volume is absolutely less 
than clear.

Anyway, I think it will cause some confusion if by available an application 
thinks it can write out more than 2TB of data to this example volume.
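
The gap between the reported and the usable figure can be put as a rough
calculation. This is a sketch only: the function name is made up, two copies
per block is assumed for the raid1 data profile, and metadata overhead and
chunk allocation are ignored.

```python
TIB = 2 ** 40

def usable_estimate(raw_free_bytes, copies):
    """Rough upper bound on new file data when every block is stored
    `copies` times."""
    return raw_free_bytes // copies

raw_free = 4 * TIB   # roughly what df reports as "Avail" in the example
print(usable_estimate(raw_free, copies=2) // TIB)  # 2
```

With per subvolume or per file profiles, `copies` would no longer be a single
number for the whole volume, which is why the estimate gets harder, not easier.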


Chris Murphy