On Wed, 29 Aug 2007, Chris Kirby wrote:

[ ... ]
> None of {ZFS,QFS,UFS} even bother to grab a lock in their statvfs
> handlers, so as Boyd Adamson correctly pointed out, all of the statvfs
> values are rumors at best anyway.

That is not correct for UFS. This piece:

         /*
          * Adjust the numbers based on things waiting to be deleted.
          * modifies f_bfree and f_ffree.  Afterwards, everything we
          * come up with will be self-consistent.  By definition, this
          * is a point-in-time snapshot, so the fact that the delete
          * thread's probably already invalidated the results is not a
          * problem.  Note that if the delete thread is ever extended to
          * non-logging ufs, this adjustment must always be made.
          */
         if (TRANS_ISTRANS(ufsvfsp))
                 ufs_delete_adjust_stats(ufsvfsp, sp);

in ufs_statvfs() is grabbing locks. And it was introduced because some 
"nitpickers" claimed that guaranteed point-in-time consistency of 
statvfs() is mandated by the standards, and the argument that "the values 
are out of date as soon as the syscall returns no matter what" didn't 
carry any weight in that context. For the gory details, see:

5012326 delay between unlink/close and space becoming available may be
        arbitrarily long

and then:

6251659 statvfs taking 30 second to complete

I still wonder sometimes whether it was the right thing to claim that 
"statvfs must be 100% accurate".

[ ... ]
> I'm not suggesting that we just truncate the number of blocks.
> We're also adjusting the unit size (f_frsize) so that
> (f_blocks * f_frsize) is the same before and after the rescaling.
>
> Yes, there might be a rounding error of up to (f_frsize - 1) *bytes*.
> So on a 1PB fs, we might be off by (256K - 1) bytes.  That's
> 0.000000026 percent.  It doesn't seem like it would generate
> many support calls.

Even so. As was mentioned about ZFS behaviour a few times, ZFS will 
"compactify" small files anyway (meaning that, even if the FS were full 
to that degree, extending _small_ files may still be possible), and since 
it optionally also does on-disk compression, the amount of free space can 
only ever be estimated.
The point can be made, in particular for ZFS, that the accuracy of all 
statvfs() return values is, by design, not down to the 'least significant 
bit'. Hence, as long as something like "space free" is a lower bound of 
what really is free, it's doing its job.

Don's statement about the standard requiring "X free - not X+1" is, hmm, 
to put it politely, "difficult to meet with a filesystem that does 
compression/tail-compaction".
The "X, not X-1" part is, on the other hand, to be taken very seriously: 
I cannot claim to have free space that I don't have, and that's what the 
UFS bugfixes mentioned above were about.

I agree with Don that we can't lie about having free space we don't 
have. I also agree that statvfs() and statvfs64() shouldn't return 
different values.

But I don't see why we couldn't round up the used space / round down the 
free space to some value/blocksize chosen by the filesystem itself. We 
just have to make this consistent. If:

        statvfs()
        statvfs64()
        64-bit statvfs()

all return the same values, and:

        - freeing one unit of f_frsize adds '1' to f_bfree
        - allocating one unit of f_frsize subtracts '1' from f_bfree
        - an allocation attempt when f_bfree is zero must fail

are met, we're standards compliant, aren't we?
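
A minimal sketch of what I mean (hypothetical code, not taken from ZFS 
or UFS; the helper name rescale_fs_counts and its parameters are made up 
purely for illustration, and the native f_frsize is assumed to be a 
power of two):

        #include <stdint.h>

        /*
         * Hypothetical sketch, not actual filesystem code: rescale byte
         * counts into a coarser f_frsize so that the 32-bit statvfs()
         * fields can't overflow, rounding total space up and free space
         * down so "space free" stays a lower bound of what's really free.
         */
        static void
        rescale_fs_counts(uint64_t total_bytes, uint64_t free_bytes,
            uint64_t native_frsize, uint64_t *frsizep,
            uint32_t *blocksp, uint32_t *bfreep)
        {
                uint64_t frsize = native_frsize;

                /* grow f_frsize until f_blocks fits into 32 bits */
                while (total_bytes / frsize > UINT32_MAX)
                        frsize <<= 1;

                *frsizep = frsize;
                /* total space rounds up: never understate used space */
                *blocksp = (uint32_t)((total_bytes + frsize - 1) / frsize);
                /* free space rounds down: never claim space we don't have */
                *bfreep = (uint32_t)(free_bytes / frsize);
        }

On a 1 PB (10^15 byte) filesystem this ends up with an f_frsize of 
256 KiB, so the worst-case rounding error is f_frsize - 1 bytes - which 
is exactly where Chris's 0.000000026 percent comes from.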

The standard, for statvfs(), doesn't make any statement about what 
happens on attempts to allocate/free amounts _less than_ f_frsize bytes, 
except that "if I release N bytes I must be able to allocate N bytes 
again - to the same file".

As I read the proposal, this block-granular behaviour is exactly what's 
being requested, isn't it?

We have to remember that statvfs() returns _BLOCK_ units. Space 
(non)availability is judged in these units. And actions like:

        - releasing f_frsize/N bytes from each of N different files
          ==> f_bfree++
        - with f_bfree == 1, allocating f_frsize/N bytes to each of N
          different files succeeds

are _NOT_ covered by the standard as "must (not) fail".
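
To put numbers on it (arbitrary ones, purely for illustration): with 
f_frsize = 8192 and N = 4,

        - four files each giving back 2048 bytes release 8192 bytes in
          total, yet nothing obliges the filesystem to turn those four
          fragments into one whole free block;
        - with f_bfree == 1, only a single full 8192-byte allocation is
          guaranteed; four separate 2048-byte extensions may or may not
          all succeed.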

If I'm wrong about this, then please provide pointers. I'm very 
interested.

Thanks,
FrankH.

>
>>
>> For any particular call to statvfs(), the system won't know whether the
>> discarded low order bits of f_blocks, f_bfree, and f_bavail and the
>> high order bits of f_files, f_ffree, and f_favail are important or
>> not.  The only safe thing to do is report the overflows and fix the
>> applications that get the resulting EOVERFLOW errors.
>
> Even if an application decides that those bits were important (and
> we're talking about the very last tiny bit of space on the fs),
> there are currently no guarantees that those few bytes would be
> available to a user anyway.  Growing a file by one byte could still
> result in ENOSPC if the fs needed more than the remaining bytes for
> its own data structures.
>
> I agree that broken apps should be fixed.  But that's not always
> possible.  And I guess I still don't see where we would be breaking
> any well-behaved applications.

A 32-bit application that uses statvfs() in a non-largefile compile 
environment - i.e. not getting statvfs64() - isn't "well behaved". It's 
been well over a decade since the LF64 interfaces were introduced; 
that's a while before anyone used the terms 'Java' or 'Netbeans 
Installer'.
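
For reference, a well-behaved 32-bit consumer doesn't even have to call 
statvfs64() by name. A quick sketch (a hypothetical dfree.c; building it 
in the large file compile environment - cc `getconf LFS_CFLAGS` ..., 
which on Solaris boils down to -D_FILE_OFFSET_BITS=64, see 
lfcompile(5)) gets the 64-bit counters through plain statvfs():

        /* cc `getconf LFS_CFLAGS` -o dfree dfree.c */
        #include <sys/statvfs.h>
        #include <stdio.h>

        int
        main(int argc, char **argv)
        {
                /* 64-bit counters in the LF64 compile environment */
                struct statvfs vfs;

                if (statvfs(argc > 1 ? argv[1] : "/", &vfs) != 0) {
                        perror("statvfs");
                        return (1);
                }
                (void) printf("f_frsize %lu f_blocks %llu f_bavail %llu\n",
                    (unsigned long)vfs.f_frsize,
                    (unsigned long long)vfs.f_blocks,
                    (unsigned long long)vfs.f_bavail);
                return (0);
        }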



>
> -Chris
>
