Chris,
        There are several issues here.  Please find comments in-line
below...

        Cheers,
        Don

>Date: Mon, 27 Aug 2007 16:04:42 -0500
>From: Chris Kirby <chris.kirby at sun.com>
>Subject: Re: [zfs-code] statvfs change
>To: johansen-osdev at sun.com
>Cc: ufs-discuss at opensolaris.org, don.cragun at sun.com, zfs-code at 
>opensolaris.org
>
>johansen-osdev at sun.com wrote:
>> Can you explain in a bit more detail why we're doing this?  I probably
>> don't understand the issue in sufficient detail.  It seems like the
>> large file compilation environment, lfcompile(5), exists to solve this
>> exact problem.  Isn't it the application's responsibility to properly
>> handle EOVERFLOW or choose an interface that can handle file offsets
>> that are greater than 2Gbytes?  Is there something obvious here that I'm
>> missing?
>> 
>
>It's not a large file issue, it's a large *filesystem* issue
>that revolves around f_frsize unit reporting via the cstatvfs32
>interface.  f_blocks, f_bfree, and f_bavail are all reported in
>units of f_frsize.

ZFS has large file and large filesystem issues.  But those of us who
participated in the Large File Summit (the vendors and consumers who
jointly produced the Large File Summit Specification and did the work
to get the non-transitional interfaces integrated into X/Open's (now
The Open Group's) X/Open Portability Guide, Issue 4, Version 2)
remember the discussions that led to the creation of the EOVERFLOW
errno value.  The basic point is that any time you lie to applications,
some application software will make wrong decisions based on the lie.

>
>For ZFS, we report f_frsize as 512 regardless of the size of
>the fs.  ...

Why?  Why shouldn't you always set f_frsize to the actual size of an
allocation unit on the filesystem?  Is it still true that we don't
support disks formatted with 1024 byte sectors?

>   ...   This means we can only express vfs size up to
>UINT32_MAX * 512 bytes.  That's not a terribly large fs
>by today's standards. Anything larger will result in EOVERFLOW
>from statvfs.
>
>You're entirely correct that it's the application's responsibility
>to deal with EOVERFLOW, perhaps by using statvfs64.  But if we can
>return valid information instead of an error, that seems like a
>good thing.
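
For reference, pinning f_frsize at 512 puts the cstatvfs32 ceiling
just under 2 TiB.  A trivial sketch of the arithmetic (the helper name
is mine, for illustration only):

```c
#include <stdint.h>

/*
 * Largest filesystem size expressible through cstatvfs32 when
 * f_frsize is pinned at 512 bytes: UINT32_MAX blocks of 512 bytes.
 */
static inline uint64_t
cstatvfs32_max_bytes(void)
{
	return ((uint64_t)UINT32_MAX * 512);
}
/* That works out to 2,199,023,255,040 bytes -- 512 short of 2 TiB. */
```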

When you cap f_files, f_ffree, and f_favail at UINT32_MAX when the
correct values for these fields are larger, you are not returning
valid information.

You may be returning "valid" values for f_frsize, f_blocks, f_bfree,
and f_bavail, but you aren't checking whether that is actually true.
(If shifting f_blocks, f_bfree, or f_bavail right throws away a bit
that was not a zero bit, the scaled values being returned are not
valid.)
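
To make the concern concrete, here is a rough sketch of the kind of
check I have in mind.  The helper name and the shift-based scaling are
my illustration, not the actual ZFS code:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Scale a 64-bit block count down by `shift` so it fits in a 32-bit
 * statvfs field.  Return EOVERFLOW if the shift would discard a
 * non-zero bit (the scaled value would no longer be valid), or if the
 * result still doesn't fit in 32 bits.  (Hypothetical helper.)
 */
static int
scale_blocks(uint64_t blocks, unsigned shift, uint32_t *out)
{
	/* Any non-zero low-order bits would be silently lost. */
	if (blocks & ((UINT64_C(1) << shift) - 1))
		return (EOVERFLOW);
	if ((blocks >> shift) > UINT32_MAX)
		return (EOVERFLOW);
	*out = (uint32_t)(blocks >> shift);
	return (0);
}
```

The point is simply that the scaling path must fail loudly (EOVERFLOW)
rather than quietly return a rounded count.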

Since the statvfs(2) and statvfs.h(3HEAD) man pages don't state any
relationship between f_bsize and f_frsize, applications may well have
made their own assumptions.  Is there documentation somewhere that
specifies how many bytes should be written at a time (on boundaries
that are multiples of that value) to get the most efficiency out of
the underlying hardware?  I would hope that f_bsize is that value.
If it is, it seems that f_bsize should be an integral multiple of
f_frsize.
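
A quick way to see what a given filesystem actually advertises, and
whether the relationship I'd hope for holds.  The function name and
the invariant being tested are my own; the man pages promise neither:

```c
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Print what a filesystem reports via statvfs(2).  Return 0 if
 * f_bsize is an integral multiple of f_frsize (the relationship I'd
 * hope for), non-zero otherwise.  Illustrative sketch only.
 */
static int
check_frsize(const char *path)
{
	struct statvfs vfs;

	if (statvfs(path, &vfs) != 0) {
		perror("statvfs");
		return (-1);
	}
	printf("%s: f_bsize=%lu f_frsize=%lu capacity=%llu bytes\n",
	    path, (unsigned long)vfs.f_bsize,
	    (unsigned long)vfs.f_frsize,
	    (unsigned long long)vfs.f_blocks * vfs.f_frsize);
	if (vfs.f_frsize == 0 || vfs.f_bsize % vfs.f_frsize != 0)
		return (1);
	return (0);
}
```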

>
>-Chris
