[zfs-code] statvfs change

Chris Kirby Wed, 29 Aug 2007 10:55:34 -0500

Don Cragun wrote:
>>From: Chris Kirby <chris.kirby at sun.com>

>>>When you cap f_files, f_ffree, and f_favail at UINT32_MAX when the
>>>correct values for these fields are larger; you are not returning valid
>>>information.
>>
>>
>>I think it's valid in the sense that you will be able to create at
>>least UINT32_MAX files.  Of course once you've done so,
>>we might still report that you can create UINT32_MAX
>>additional files.  :-)
> 
> 
> You may also find an app (A) that checks number of free files, removes
> a few and then crashes because it was supposed to be running on a quiet
> machine and has now detected that some other app (B) is creating files
> as fast as A can remove them.  (B doesn't really exist, but since the
> number of free files isn't rising, A has to assume that B is active.)


Remember that f_ffree is in no way under an application's
control.  That value usually represents an inode count, which
is not necessarily updated synchronously at unlink time.  The
fs itself might be doing background activity (garbage collection)
that could cause this count to fluctuate on an otherwise idle system.

Plus, the fs is free to report whatever number it wants here.  It can
even be the same number all the time.  QFS, which does dynamic
inode allocation, always reports f_ffree as -1.

None of {ZFS,QFS,UFS} even bother to grab a lock in their statvfs
handlers, so as Boyd Adamson correctly pointed out, all of the statvfs
values are rumors at best anyway.
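
For reference, capping the file counts (f_files, f_ffree, f_favail) at
UINT32_MAX amounts to something like the sketch below.  The names
clamp_to_u32 and clamp_self_check are made up for illustration; this
is not the actual ZFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate a 64-bit count at UINT32_MAX. */
static uint32_t
clamp_to_u32(uint64_t count)
{
    return (count > UINT32_MAX) ? UINT32_MAX : (uint32_t)count;
}

/* Small usage check: values that fit pass through, larger ones saturate. */
static void
clamp_self_check(void)
{
    assert(clamp_to_u32(12345) == 12345);
    assert(clamp_to_u32((uint64_t)UINT32_MAX + 1) == UINT32_MAX);
}
```

Once the real count exceeds UINT32_MAX the reported value stops moving,
which is the "still UINT32_MAX additional files" effect mentioned earlier.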


>>
>>But the setting of f_bsize is up to the underlying fs.  And at least
>>for QFS, UFS, and ZFS, its value is not scaled based on f_frsize.
>>That's also why I don't rescale f_bsize.
> 
> 
> Correct.  I'm not suggesting that statvfs() should scale f_bsize; I'm
> saying that if you scale f_frsize, some application may think its
> world has turned upside down because the relationship it thought
> existed between f_frsize and f_bsize is no longer true.


Given that we don't document any relationship between those two
fields, I'm not sure why that would be a valid assumption.

AFAICS, these values aren't even required to be constant for
consecutive calls to the same vfs.

> 
> I believe statvfs() should be returning an error condition with errno
> set to EOVERFLOW and that applications that run into the EOVERFLOW
> should be fixed to handle the brave new world of large filesystems.
> 
> By the logic you're using, we would not have needed to change the df
> utility to be large filesystem aware; we should have just let it
> truncate the number of blocks it said were available for all
> filesystems to 32-bit values. For a sysadmin that wants to know if the
> ZFS filesystem that was just created came out at the correct size, this
> clearly is not sufficient; but for "most" users who just want to know
> if there is room to create a file, it will meet their needs perfectly.

I'm not suggesting that we just truncate the number of blocks.
We're also adjusting the unit size (f_frsize) so that
(f_blocks * f_frsize) is the same before and after the rescaling.

Yes, there might be a rounding error of up to (f_frsize - 1) *bytes*.
So on a 1PB fs, we might be off by (256K - 1) bytes.  That's
0.000000026 percent.  It doesn't seem like it would generate
many support calls.
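
As a rough sketch of the rescaling idea (not the actual change; the
struct and function names here are hypothetical), the block count is
halved and the unit size doubled until the count fits in 32 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a block count that fits in 32 bits plus
 * the unit size it is expressed in. */
struct scaled_blocks {
    uint64_t frsize;   /* rescaled unit size, in bytes */
    uint32_t blocks;   /* rescaled block count */
};

/* Halve the count and double the unit until the count fits in 32 bits.
 * Each shift may drop a low-order bit, so the product can shrink by
 * less than one rescaled unit in total. */
static struct scaled_blocks
rescale_blocks(uint64_t blocks, uint64_t frsize)
{
    assert(frsize > 0);
    while (blocks > UINT32_MAX) {
        blocks >>= 1;
        frsize <<= 1;
    }
    struct scaled_blocks out = { frsize, (uint32_t)blocks };
    return out;
}
```

For example, a filesystem of (2^41 - 1) blocks of 512 bytes, just under
1PB, comes out as UINT32_MAX units of 256K, and the byte total shrinks
by less than one 256K unit.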

> 
> For any particular call to statvfs(), the system won't know whether the
> discarded low order bits of f_blocks, f_bfree, and f_bavail and the
> high order bits of f_files, f_ffree, and f_favail are important or
> not.  The only safe thing to do is report the overflows and fix the
> applications that get the resulting EOVERFLOW errors.

Even if an application decides that those bits were important (and
we're talking about the very last tiny bit of space on the fs),
there are currently no guarantees that those few bytes would be
available to a user anyway.  Growing a file by one byte could still
result in ENOSPC if the fs needed more than the remaining bytes for
its own data structures.

I agree that broken apps should be fixed.  But that's not always
possible.  And I guess I still don't see where we would be breaking
any well-behaved applications.
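
By "well-behaved" I mean roughly the following: an application that
computes sizes from f_frsize and the block counts in 64-bit arithmetic,
assumes nothing about f_bsize, and checks the return value.  This is
only an illustrative sketch; fs_space is a made-up name.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: report total and available bytes for the
 * filesystem containing path.  Returns 0 on success, or -1 with errno
 * set by statvfs() (for example ENOENT or EOVERFLOW). */
static int
fs_space(const char *path, uint64_t *total, uint64_t *avail)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;

    /* f_blocks and f_bavail are in units of f_frsize, not f_bsize. */
    *total = (uint64_t)sv.f_blocks * (uint64_t)sv.f_frsize;
    *avail = (uint64_t)sv.f_bavail * (uint64_t)sv.f_frsize;
    return 0;
}
```

An application written this way gets the same byte totals, to within
the rounding error above, whether or not f_frsize has been rescaled.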

-Chris
