Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-13 Thread Stefan Esser
Am 10.10.21 um 05:52 schrieb Alan Somers:
> On Sat, Oct 9, 2021 at 7:13 PM Rick Macklem wrote:
>>
>> This leads me to a couple of questions:
>> - Is there a good reason for not using vop_stdallocate() for ZFS?
> 
> Yes.  posix_fallocate is supposed to guarantee that subsequent writes
> to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
> file system, cannot possibly guarantee that.  See SVN r325320.

This is not entirely true: ZFS supports reservations, and it could
thus support the pre-allocation of space that is later "filled".
These reservations would be subtracted from the free space total,
and it would be guaranteed that this free space is available for
the file for which the pre-allocation has been requested.

This would require that the allocate() call record the block
range for which an allocation is requested (and for which no
disk blocks are currently allocated) without assigning any
backing blocks at that time.

Later writes to that range would allocate disk blocks and at the
same time reduce the amount that is reserved and remove that range
(that is now allocated) from the recorded pre-allocation range.

This would of course require adding to the znode the block ranges
that are reserved but not yet backed by disk blocks, and keeping the
total count of blocks reserved for this purpose in a separate
variable, alongside the other types of reservations.
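The accounting described above can be sketched as a toy model in C. Everything here is hypothetical (struct pool, struct file_model, model_allocate(), model_write()) and is not ZFS code: a per-file bitmap stands in for the reserved ranges recorded in the znode, prealloc_resv is the separate pool-wide counter, and old copies are freed immediately, which real ZFS does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pool: block counts only. prealloc_resv counts blocks promised to
 * preallocated ranges; it is a subset of free_blocks. */
struct pool {
    uint64_t free_blocks;
    uint64_t prealloc_resv;
};

/* Toy file of at most 64 blocks; bit i describes block i. */
struct file_model {
    uint64_t backed;    /* block has a disk block */
    uint64_t reserved;  /* block is preallocated but not yet backed */
};

/* Free space that ordinary (non-preallocated) writes may use. */
static uint64_t pool_avail(const struct pool *p)
{
    return p->free_blocks - p->prealloc_resv;
}

static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    for (; x != 0; x &= x - 1)
        n++;
    return n;
}

/* Mask for blocks [start, start + len); requires start + len <= 64. */
static uint64_t range_mask(unsigned start, unsigned len)
{
    if (len == 0)
        return 0;
    uint64_t m = (len == 64) ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
    return m << start;
}

/* allocate(): record the unbacked part of the range and reserve space
 * for it, without assigning any disk blocks yet. */
static bool model_allocate(struct pool *p, struct file_model *f,
    unsigned start, unsigned len)
{
    if (start > 64 || len > 64 - start)
        return false;                   /* out of range for the toy file */
    uint64_t want = range_mask(start, len) & ~f->backed & ~f->reserved;
    uint64_t n = popcount64(want);
    if (pool_avail(p) < n)
        return false;                   /* ENOSPC now, not at write time */
    f->reserved |= want;
    p->prealloc_resv += n;
    return true;
}

/* Write one block of the file. */
static bool model_write(struct pool *p, struct file_model *f, unsigned blk)
{
    assert(blk < 64);
    uint64_t bit = UINT64_C(1) << blk;

    if (f->reserved & bit) {
        /* First write into a preallocated range: consume the reservation
         * and remove the block from the recorded range. */
        f->reserved &= ~bit;
        p->prealloc_resv--;
        p->free_blocks--;
        f->backed |= bit;
        return true;
    }
    /* New block, or CoW rewrite of a backed block: needs unreserved space. */
    if (pool_avail(p) == 0)
        return false;                   /* ENOSPC */
    p->free_blocks--;                   /* new copy */
    if (f->backed & bit)
        p->free_blocks++;               /* old copy (freed at once in this toy) */
    f->backed |= bit;
    return true;
}
```

In this model the first write into a preallocated block is guaranteed, but a later rewrite of the same block needs ordinary free space again, so the copy-on-write concern would still apply to rewrites.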

>> - Should I try and support both file system types via vop_stdallocate()
>>   or not support Allocate at all?
> 
> Since you can't possibly support it for ZFS (not to mention other file
> systems like fusefs) you'll have to not support it at all.

While I do think that an allocate() operation could be implemented
in ZFS, it is obvious that this does not apply to all possible
fusefs filesystems (which do not even need to support the concept
of an allocation of blocks or ranges).

Regards, Stefan




Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-10 Thread Willem Jan Withagen via freebsd-current

On 10-10-2021 07:57, Rick Macklem wrote:
>>> This leads me to a couple of questions:
>>> - Is there a good reason for not using vop_stdallocate() for ZFS?
>>
>> Yes.  posix_fallocate is supposed to guarantee that subsequent writes
>> to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
>> file system, cannot possibly guarantee that.  See SVN r325320.
>
> However, vop_stdallocate() just does VOP_WRITE()s to the area (with
> bytes of data all zeros). Wouldn't that satisfy the criteria?

I had the same problem in Ceph, where guaranteed writable space is
required for keeping a log of modifications to the system. Not having
this space might cause loss of data.


Writing all zeros is probably even worse on file systems that have
compression enabled.

Almost nothing is allocated, so there is no guarantee at all.
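A toy C model of the compression point, assuming (as the OpenZFS documentation for the compression property describes) that with compression enabled an all-zero block is stored as a hole rather than allocated. TOY_BLKSZ and blocks_allocated() are hypothetical names for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLKSZ 4096   /* hypothetical record size for this model */

static bool block_is_zero(const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (b[i] != 0)
            return false;
    return true;
}

/* Number of blocks that get disk space for a write of buf[0..len).
 * With compression enabled, an all-zero block is stored as a hole.
 * A compressed non-zero block is counted as one whole block here,
 * even though it may occupy less. */
static size_t blocks_allocated(const unsigned char *buf, size_t len,
    bool compression)
{
    size_t used = 0;
    for (size_t off = 0; off < len; off += TOY_BLKSZ) {
        size_t n = (len - off < TOY_BLKSZ) ? len - off : TOY_BLKSZ;
        if (compression && block_is_zero(buf + off, n))
            continue;          /* hole: nothing is held for later writes */
        used++;
    }
    return used;
}
```

So a zero-filled "preallocation" of four blocks consumes four blocks without compression and none with it.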
The next trick was to write random data, but then you run into the problem
pointed out by Alan and Warner: because of the CoW nature, new writes will
still need free space.

The solution was to create a dedicated zpool just for this.
But that will not help you with NFSv4.2, I guess.

--WjW




Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-10 Thread Rick Macklem
Alan Somers wrote:
[stuff snipped]
>Rick Macklem wrote:
> >Alan Somers wrote:
> > >Yes.  posix_fallocate is supposed to guarantee that subsequent writes
> > >to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
> > >file system, cannot possibly guarantee that.  See SVN r325320.
> > However, vop_stdallocate() just does VOP_WRITE()s to the area (with
> > bytes of data all zeros). Wouldn't that satisfy the criteria?
>
> No.  It works for UFS, which is an overwriting file system.  But for
> ZFS, when the user comes back later to rewrite those same offsets, ZFS
> will actually allocate new LBAs for them.
Eighto. I get it now.

Looks like I must disable it in the server, unless there is a way to enable
it on a per file system basis (which I do not believe is the case for NFSv4.2,
although that isn't completely clear from the RFC, which says each operation
is optional, but does not mention "per file system").

Thanks everyone, for your replies, rick

>
> >> - Should I try and support both file system types via vop_stdallocate()
> >>   or not support Allocate at all?
> >
> >Since you can't possibly support it for ZFS (not to mention other file
> >systems like fusefs) you'll have to not support it at all.
> It does sound like not supporting it is the best alternative.
>
> rick
>
> >
> > Btw, as a bit of an aside, "cc" uses posix_fallocate() and in weird ways,
> > such as offset=0, len=1. Why, I have no idea?
> >
> > Thanks in advance for any comments, rick
> >



Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-10 Thread Alan Somers
On Sat, Oct 9, 2021 at 11:57 PM Rick Macklem  wrote:
>
> Alan Somers wrote:
> >On Sat, Oct 9, 2021 at 7:13 PM Rick Macklem  wrote:
> >>
> >> Hi,
> >>
> >> I ran into an issue this week during the nf...@ietf.org's testing event.
> >> UFS - supports VOP_ALLOCATE() by using vop_stdallocate().
> >> ZFS - just return EINVAL for VOP_ALLOCATE().
> >>
> >> An NFSv4.2 server can either support Allocate or not, but it has to be
> >> for all exported file systems.
> >
> >That seems like a protocol bug to me.  Could this be fixed in a future
> >NFS revision?
> Who knows. I don't see any interest in a 4.3. 4.2 is extensible, but I think
> this is now "cast in stone".
>
> >>
> >> This leads me to a couple of questions:
> >> - Is there a good reason for not using vop_stdallocate() for ZFS?
> >
> >Yes.  posix_fallocate is supposed to guarantee that subsequent writes
> >to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
> >file system, cannot possibly guarantee that.  See SVN r325320.
> However, vop_stdallocate() just does VOP_WRITE()s to the area (with
> bytes of data all zeros). Wouldn't that satisfy the criteria?

No.  It works for UFS, which is an overwriting file system.  But for
ZFS, when the user comes back later to rewrite those same offsets, ZFS
will actually allocate new LBAs for them.
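A toy C model of the difference described above, under stated assumptions: fs_kind and rewrite_prefilled() are hypothetical, the file has already been zero-filled the way vop_stdallocate() does it, and on the copy-on-write side an old LBA is not reused before the rewrite completes (Warner makes a similar point about blocks not being reclaimed instantly).

```c
#include <stdbool.h>
#include <stdint.h>

enum fs_kind { OVERWRITE_IN_PLACE, COPY_ON_WRITE };

/* Rewrite nrewrite blocks that were already zero-filled, starting with
 * free_blocks free blocks in the pool. Returns false if a rewrite would
 * fail with ENOSPC. */
static bool rewrite_prefilled(enum fs_kind kind, uint64_t free_blocks,
    uint64_t nrewrite)
{
    for (uint64_t i = 0; i < nrewrite; i++) {
        if (kind == OVERWRITE_IN_PLACE)
            continue;       /* new data lands on the same LBA */
        if (free_blocks == 0)
            return false;   /* ENOSPC despite the "preallocation" */
        free_blocks--;      /* new LBA; the old one is not reused yet */
    }
    return true;
}
```

With zero free blocks left after the zero-fill, the overwriting case still succeeds, while the copy-on-write case fails on the first rewrite.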

>
> >> - Should I try and support both file system types via vop_stdallocate()
> >>   or not support Allocate at all?
> >
> >Since you can't possibly support it for ZFS (not to mention other file
> >systems like fusefs) you'll have to not support it at all.
> It does sound like not supporting it is the best alternative.
>
> rick
>
> >
> > Btw, as a bit of an aside, "cc" uses posix_fallocate() and in weird ways,
> > such as offset=0, len=1. Why, I have no idea?
> >
> > Thanks in advance for any comments, rick
> >



Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-10 Thread Warner Losh
On Sat, Oct 9, 2021, 11:58 PM Rick Macklem  wrote:

> Alan Somers wrote:
> >On Sat, Oct 9, 2021 at 7:13 PM Rick Macklem  wrote:
> >>
> >> Hi,
> >>
> >> I ran into an issue this week during the nf...@ietf.org's testing event.
> >> UFS - supports VOP_ALLOCATE() by using vop_stdallocate().
> >> ZFS - just return EINVAL for VOP_ALLOCATE().
> >>
> >> An NFSv4.2 server can either support Allocate or not, but it has to be
> >> for all exported file systems.
> >
> >That seems like a protocol bug to me.  Could this be fixed in a future
> >NFS revision?
> Who knows. I don't see any interest in a 4.3. 4.2 is extensible, but I think
> this is now "cast in stone".
>
> >>
> >> This leads me to a couple of questions:
> >> - Is there a good reason for not using vop_stdallocate() for ZFS?
> >
> >Yes.  posix_fallocate is supposed to guarantee that subsequent writes
> >to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
> >file system, cannot possibly guarantee that.  See SVN r325320.
> However, vop_stdallocate() just does VOP_WRITE()s to the area (with
> bytes of data all zeros). Wouldn't that satisfy the criteria?
>

Since it is log-based, that would make it worse. The blocks aren't
instantly reclaimed when marked invalid, so you'd need storage for both, and
the zeroed blocks could cause a resource shortage when the real writes come in.
ZFS doesn't have a reservation system to reserve blocks in the log for a
given file...

Warner

> >> - Should I try and support both file system types via vop_stdallocate()
> >>   or not support Allocate at all?
> >
> >Since you can't possibly support it for ZFS (not to mention other file
> >systems like fusefs) you'll have to not support it at all.
> It does sound like not supporting it is the best alternative.
>
> rick
>
> >
> > Btw, as a bit of an aside, "cc" uses posix_fallocate() and in weird ways,
> > such as offset=0, len=1. Why, I have no idea?
> >
> > Thanks in advance for any comments, rick
> >
>
>


Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-09 Thread Rick Macklem
Alan Somers wrote:
>On Sat, Oct 9, 2021 at 7:13 PM Rick Macklem  wrote:
>>
>> Hi,
>>
>> I ran into an issue this week during the nf...@ietf.org's testing event.
>> UFS - supports VOP_ALLOCATE() by using vop_stdallocate().
>> ZFS - just return EINVAL for VOP_ALLOCATE().
>>
>> An NFSv4.2 server can either support Allocate or not, but it has to be
>> for all exported file systems.
>
>That seems like a protocol bug to me.  Could this be fixed in a future
>NFS revision?
Who knows. I don't see any interest in a 4.3. 4.2 is extensible, but I think
this is now "cast in stone".

>>
>> This leads me to a couple of questions:
>> - Is there a good reason for not using vop_stdallocate() for ZFS?
>
>Yes.  posix_fallocate is supposed to guarantee that subsequent writes
>to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
>file system, cannot possibly guarantee that.  See SVN r325320.
However, vop_stdallocate() just does VOP_WRITE()s to the area (with
bytes of data all zeros). Wouldn't that satisfy the criteria?

>> - Should I try and support both file system types via vop_stdallocate()
>>   or not support Allocate at all?
>
>Since you can't possibly support it for ZFS (not to mention other file
>systems like fusefs) you'll have to not support it at all.
It does sound like not supporting it is the best alternative.

rick

>
> Btw, as a bit of an aside, "cc" uses posix_fallocate() and in weird ways,
> such as offset=0, len=1. Why, I have no idea?
>
> Thanks in advance for any comments, rick
>



Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-09 Thread Rick Macklem



From: Alan Somers 
Sent: Saturday, October 9, 2021 11:52 PM
To: Rick Macklem
Cc: FreeBSD Current
Subject: Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd


On Sat, Oct 9, 2021 at 7:13 PM Rick Macklem  wrote:
>
> Hi,
>
> I ran into an issue this week during the nf...@ietf.org's testing event.
> UFS - supports VOP_ALLOCATE() by using vop_stdallocate().
> ZFS - just return EINVAL for VOP_ALLOCATE().
>
> An NFSv4.2 server can either support Allocate or not, but it has to be
> for all exported file systems.

That seems like a protocol bug to me.  Could this be fixed in a future
NFS revision?

>
> This leads me to a couple of questions:
> - Is there a good reason for not using vop_stdallocate() for ZFS?

Yes.  posix_fallocate is supposed to guarantee that subsequent writes
to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
file system, cannot possibly guarantee that.  See SVN r325320.

> - Should I try and support both file system types via vop_stdallocate()
>   or not support Allocate at all?

Since you can't possibly support it for ZFS (not to mention other file
systems like fusefs) you'll have to not support it at all.

>
> Btw, as a bit of an aside, "cc" uses posix_fallocate() and in weird ways,
> such as offset=0, len=1. Why, I have no idea?
>
> Thanks in advance for any comments, rick
>



Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-09 Thread Alan Somers
On Sat, Oct 9, 2021 at 7:13 PM Rick Macklem  wrote:
>
> Hi,
>
> I ran into an issue this week during the nf...@ietf.org's testing event.
> UFS - supports VOP_ALLOCATE() by using vop_stdallocate().
> ZFS - just return EINVAL for VOP_ALLOCATE().
>
> An NFSv4.2 server can either support Allocate or not, but it has to be
> for all exported file systems.

That seems like a protocol bug to me.  Could this be fixed in a future
NFS revision?

>
> This leads me to a couple of questions:
> - Is there a good reason for not using vop_stdallocate() for ZFS?

Yes.  posix_fallocate is supposed to guarantee that subsequent writes
to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
file system, cannot possibly guarantee that.  See SVN r325320.

> - Should I try and support both file system types via vop_stdallocate()
>   or not support Allocate at all?

Since you can't possibly support it for ZFS (not to mention other file
systems like fusefs) you'll have to not support it at all.

>
> Btw, as a bit of an aside, "cc" uses posix_fallocate() and in weird ways,
> such as offset=0, len=1. Why, I have no idea?
>
> Thanks in advance for any comments, rick
>



RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-09 Thread Rick Macklem
Hi,

I ran into an issue this week during the nf...@ietf.org's testing event.
UFS - supports VOP_ALLOCATE() by using vop_stdallocate().
ZFS - just returns EINVAL for VOP_ALLOCATE().

An NFSv4.2 server can either support Allocate or not, but it has to be
for all exported file systems.
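A minimal sketch of what that all-or-nothing rule implies, using a hypothetical export table (struct export_info and server_can_enable_allocate() are illustrative, not the real nfsd structures): a single ZFS export means Allocate cannot be offered at all, even for UFS exports.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-export record; not the real nfsd data structures. */
struct export_info {
    const char *path;
    bool supports_allocate;  /* UFS via vop_stdallocate(): yes; ZFS: no */
};

/* Allocate support is server-wide, so it can only be enabled if every
 * exported file system supports it. */
static bool server_can_enable_allocate(const struct export_info *ex, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!ex[i].supports_allocate)
            return false;
    return true;
}
```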

This leads me to a couple of questions:
- Is there a good reason for not using vop_stdallocate() for ZFS?
- Should I try and support both file system types via vop_stdallocate()
  or not support Allocate at all?

Btw, as a bit of an aside, "cc" uses posix_fallocate() and in weird ways,
such as offset=0, len=1. Why, I have no idea?
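For reference, a minimal C sketch of making the same offset=0, len=1 call and checking the result. try_preallocate() and probe_preallocate() are hypothetical helpers. posix_fallocate() returns the error number rather than setting errno; the sketch treats EINVAL or EOPNOTSUPP as "not supported", with the caveat that EINVAL is also returned for bad arguments.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 or the error number from posix_fallocate(). EINVAL (what
 * ZFS returns, per this thread) and EOPNOTSUPP are reported as
 * "preallocation not available". */
static int try_preallocate(int fd, off_t offset, off_t len)
{
    int error = posix_fallocate(fd, offset, len);
    if (error == EINVAL || error == EOPNOTSUPP)
        fprintf(stderr, "posix_fallocate: %s\n", strerror(error));
    return error;
}

/* Repeat the offset=0, len=1 call on a scratch file created in dir. */
static int probe_preallocate(const char *dir)
{
    char path[1024];
    int fd, error;

    snprintf(path, sizeof(path), "%s/fallocXXXXXX", dir);
    fd = mkstemp(path);
    if (fd == -1)
        return errno;
    error = try_preallocate(fd, 0, 1);
    unlink(path);
    close(fd);
    return error;
}
```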

Thanks in advance for any comments, rick