Re: Hole-punching, TRIM, etc

2018-11-13 Thread Poul-Henning Kamp

Warner Losh writes:

>On a raw device it would be translated into a BIO_DELETE command directly,
>correct?

We already have ioctl(DIOCGDELETE) for that.  newfs(8) uses it.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Hole-punching, TRIM, etc

2018-11-13 Thread Conrad Meyer
On Tue, Nov 13, 2018 at 2:59 PM Alan Somers  wrote:
>
> On Tue, Nov 13, 2018 at 3:51 PM Conrad Meyer  wrote:
>>
>> On Tue, Nov 13, 2018 at 2:10 PM Alan Somers  wrote:
>> > ...
>> > 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>>
>> Why not just add DIOCGDELETE support to various VOP_IOCTL
>> implementations?  The file objects forward correctly through vn_ioctl
>> to VOP_IOCTL for both regular files and devfs VCHR nodes.
>>
>> We can emulate the Linux API if we want to be compatible there, but I
>> wouldn't bother with Solaris.
>
> The only reason that I prefer the Solaris API is because it doesn't require 
> adding another syscall, and because Linux's fallocate(2) does a whole bunch 
> of other things besides hole-punching.

I am imagining that if we went this route, we would implement Linux
fallocate as a library shim around the native FreeBSD ioctl (or
whatever) rather than an independent system call.  This would be for
API compatibility, not ABI compatibility.  But Linux compat can be set
aside for now, I think; it's a secondary concern.
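
A rough C sketch of what such a library shim could look like, for
illustration only.  Everything named here is hypothetical: the SHIM_* flag
values, shim_fallocate(), and the native_punch_fn callback that stands in
for whatever native primitive ends up existing.  The only Linux behaviour
it mirrors is that hole-punching must be requested together with
"keep size"; the other fallocate modes are simply rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flag values for this sketch; not taken from any header. */
#define SHIM_FALLOC_KEEP_SIZE	0x01
#define SHIM_FALLOC_PUNCH_HOLE	0x02

/* Hypothetical native primitive: deallocate [off, off + len) on fd. */
typedef int (*native_punch_fn)(int fd, off_t off, off_t len);

/* Stand-in for the native call so the sketch can be exercised anywhere. */
static off_t last_off, last_len;

int
record_punch(int fd, off_t off, off_t len)
{
	(void)fd;
	last_off = off;
	last_len = len;
	return (0);
}

/*
 * Library-level shim: translate a fallocate-style request into the native
 * hole-punch primitive.  Only "punch hole, keep size" is handled.
 */
int
shim_fallocate(native_punch_fn punch, int fd, int mode, off_t off, off_t len)
{
	assert(punch != NULL);
	if (off < 0 || len <= 0) {
		errno = EINVAL;
		return (-1);
	}
	if (mode == (SHIM_FALLOC_PUNCH_HOLE | SHIM_FALLOC_KEEP_SIZE))
		return (punch(fd, off, len));
	errno = EOPNOTSUPP;	/* preallocation etc. not covered here */
	return (-1);
}
```

Because the shim lives in a library, only source compatibility is offered;
Linux binaries would still need their own syscall translation.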

> What about an asynchronous version?  ioctl(2) is still synchronous.  Do you 
> see any better way to hole-punch/TRIM asynchronously than with aio?

Yeah, this is a good consideration.  No, I don't have any better
suggestion for an asynchronous API.  In general our VOPs tend to be
synchronous.  Aio does seem like the logical home for a new
asynchronous API.

Best regards,
Conrad


Re: Hole-punching, TRIM, etc

2018-11-13 Thread Warner Losh
On Tue, Nov 13, 2018 at 3:52 PM Conrad Meyer  wrote:

> Geom devices have the DIOCGDELETE ioctl, which translates into
> BIO_DELETE (which is TRIM, as I understand it).
>

Correct. TRIM is both the catch-all term people use, as well as the name of
a specific DSM (data set management) command in the ATA command set. All
FLASH technologies have it (though what it means under the covers varies a
bit). Thin provisioned resources like in VMs also have it.

Warner


Re: Hole-punching, TRIM, etc

2018-11-13 Thread Alan Somers
On Tue, Nov 13, 2018 at 3:51 PM Warner Losh  wrote:

>
>
> On Tue, Nov 13, 2018 at 3:10 PM Alan Somers  wrote:
>
>> Hole-punching has been discussed on these lists before[1].  It basically
>> means to turn a dense file into a sparse file by deallocating storage for
>> some of the blocks in the middle.  There's no standard API for it.  Linux
>> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>>
>> A related concept is telling a block device that some blocks are no longer
>> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
>> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
>> basically the same thing, and it's analogous to hole-punching for regular
>> files.  They are also all inaccessible from FreeBSD's userland except by
>> using pass(4), which is inconvenient and protocol-specific.
>>
>> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
>> but it's totally undocumented and doesn't work on regular files.
>>
>> I propose adding support for all of these things using the fcntl(2) API.
>> Using the same syntax that Solaris defined, you would be able to punch a
>> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
>> (though FreeBSD's port never did, and the code was deleted in r303763).
>> Here's what I would do:
>>
>> 1) Add the F_FREESP command to fcntl(2).
>> 2) Add a .fo_space field for struct fileops
>> 3) Add a devfs_space method that implements .fo_space
>> 4) Add a .d_space field to struct cdevsw
>> 5) Add a g_dev_space method for GEOM that implements .d_space using
>> BIO_DELETE.
>> 6) Add a VOP_SPACE vop
>> 7) Implement VOP_SPACE for tmpfs
>> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>>
>> The greatest beneficiaries of this work would be type 2 hypervisors like
>> QEMU and VirtualBox with guests that use TRIM, and userland filesystems
>> such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
>> using SPDK would also benefit.  The last item, aio_freesp(2), may seem
>> unnecessary but it would really benefit my application.
>>
>> Questions, objections, flames?
>>
>
> So the fcntl would deallocate blocks from a filesystem only. The
> filesystem may issue BIO_DELETE as a result, but that's up to the
> filesystem, correct?
>

Correct.


>
> On a raw device it would be translated into a BIO_DELETE command directly,
> correct?
>

Correct, modulo edge cases.


>
> Warner
>


Re: Hole-punching, TRIM, etc

2018-11-13 Thread Alan Somers
On Tue, Nov 13, 2018 at 3:51 PM Conrad Meyer  wrote:

> Hi Alan,
>
> On Tue, Nov 13, 2018 at 2:10 PM Alan Somers  wrote:
> >
> > Hole-punching has been discussed on these lists before[1].  It basically
> > means to turn a dense file into a sparse file by deallocating storage for
> > some of the blocks in the middle.  There's no standard API for it.  Linux
> > uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
> >
> > A related concept is telling a block device that some blocks are no longer
> > used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
> > "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
> > basically the same thing, and it's analogous to hole-punching for regular
> > files.  They are also all inaccessible from FreeBSD's userland except by
> > using pass(4), which is inconvenient and protocol-specific.
>
> Geom devices have the DIOCGDELETE ioctl, which translates into
> BIO_DELETE (which is TRIM, as I understand it).  It's available in
> libgeom as g_delete() and used by hastd, newfs_nandfs, and nandtool.
>

Ahh, I thought there must be such a thing, but I couldn't find it.


>
> > Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
> > but it's totally undocumented and doesn't work on regular files.
> >
> > I propose adding support for all of these things using the fcntl(2) API.
> > Using the same syntax that Solaris defined, you would be able to punch a
> > hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
> > (though FreeBSD's port never did, and the code was deleted in r303763).
> > Here's what I would do:
> >
> > 1) Add the F_FREESP command to fcntl(2).
> > 2) Add a .fo_space field for struct fileops
> > 3) Add a devfs_space method that implements .fo_space
> > 4) Add a .d_space field to struct cdevsw
> > 5) Add a g_dev_space method for GEOM that implements .d_space using
> > BIO_DELETE.
> > 6) Add a VOP_SPACE vop
> > 7) Implement VOP_SPACE for tmpfs
> > 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>
> Why not just add DIOCGDELETE support to various VOP_IOCTL
> implementations?  The file objects forward correctly through vn_ioctl
> to VOP_IOCTL for both regular files and devfs VCHR nodes.
>
> We can emulate the Linux API if we want to be compatible there, but I
> wouldn't bother with Solaris.
>

The only reason that I prefer the Solaris API is because it doesn't require
adding another syscall, and because Linux's fallocate(2) does a whole bunch
of other things besides hole-punching.

What about an asynchronous version?  ioctl(2) is still synchronous.  Do you
see any better way to hole-punch/TRIM asynchronously than with aio?


>
> Best,
> Conrad
>


Re: Hole-punching, TRIM, etc

2018-11-13 Thread Conrad Meyer
Hi Alan,

On Tue, Nov 13, 2018 at 2:10 PM Alan Somers  wrote:
>
> Hole-punching has been discussed on these lists before[1].  It basically
> means to turn a dense file into a sparse file by deallocating storage for
> some of the blocks in the middle.  There's no standard API for it.  Linux
> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>
> A related concept is telling a block device that some blocks are no longer
> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
> basically the same thing, and it's analogous to hole-punching for regular
> files.  They are also all inaccessible from FreeBSD's userland except by
> using pass(4), which is inconvenient and protocol-specific.

Geom devices have the DIOCGDELETE ioctl, which translates into
BIO_DELETE (which is TRIM, as I understand it).  It's available in
libgeom as g_delete() and used by hastd, newfs_nandfs, and nandtool.
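
For illustration, a minimal C sketch of driving DIOCGDELETE directly from
userland.  It assumes the ioctl argument is an off_t[2] holding
{offset, length} in bytes, as declared in <sys/disk.h>, and that both values
have to be multiples of the provider's sector size.  align_range() and
discard_range() are made-up helper names, not libgeom functions, and on
anything other than FreeBSD the call is stubbed out with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

#ifdef __FreeBSD__
#include <sys/ioctl.h>
#include <sys/disk.h>
#endif

/*
 * Shrink [off, off + len) to the largest sector-aligned range inside it.
 * Returns 0 and fills *aoff and *alen, or -1 if no whole sector is covered.
 */
int
align_range(off_t off, off_t len, off_t secsize, off_t *aoff, off_t *alen)
{
	off_t start, end;

	assert(off >= 0 && len >= 0 && secsize > 0);
	start = (off + secsize - 1) / secsize * secsize;	/* round up */
	end = (off + len) / secsize * secsize;			/* round down */
	if (end <= start)
		return (-1);
	*aoff = start;
	*alen = end - start;
	return (0);
}

/*
 * Ask the device behind fd to discard [off, off + len).  The range is
 * expected to be sector-aligned already (see align_range() above).
 */
int
discard_range(int fd, off_t off, off_t len)
{
#ifdef __FreeBSD__
	off_t arg[2] = { off, len };

	return (ioctl(fd, DIOCGDELETE, arg));
#else
	(void)fd;
	(void)off;
	(void)len;
	errno = ENOTSUP;	/* no DIOCGDELETE outside FreeBSD */
	return (-1);
#endif
}
```

Rounding inward rather than outward is deliberate: a discard should never
touch bytes the caller did not ask to give up.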

> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
> but it's totally undocumented and doesn't work on regular files.
>
> I propose adding support for all of these things using the fcntl(2) API.
> Using the same syntax that Solaris defined, you would be able to punch a
> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
> (though FreeBSD's port never did, and the code was deleted in r303763).
> Here's what I would do:
>
> 1) Add the F_FREESP command to fcntl(2).
> 2) Add a .fo_space field for struct fileops
> 3) Add a devfs_space method that implements .fo_space
> 4) Add a .d_space field to struct cdevsw
> 5) Add a g_dev_space method for GEOM that implements .d_space using
> BIO_DELETE.
> 6) Add a VOP_SPACE vop
> 7) Implement VOP_SPACE for tmpfs
> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).

Why not just add DIOCGDELETE support to various VOP_IOCTL
implementations?  The file objects forward correctly through vn_ioctl
to VOP_IOCTL for both regular files and devfs VCHR nodes.

We can emulate the Linux API if we want to be compatible there, but I
wouldn't bother with Solaris.

Best,
Conrad


Re: Hole-punching, TRIM, etc

2018-11-13 Thread Warner Losh
On Tue, Nov 13, 2018 at 3:10 PM Alan Somers  wrote:

> Hole-punching has been discussed on these lists before[1].  It basically
> means to turn a dense file into a sparse file by deallocating storage for
> some of the blocks in the middle.  There's no standard API for it.  Linux
> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>
> A related concept is telling a block device that some blocks are no longer
> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
> basically the same thing, and it's analogous to hole-punching for regular
> files.  They are also all inaccessible from FreeBSD's userland except by
> using pass(4), which is inconvenient and protocol-specific.
>
> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
> but it's totally undocumented and doesn't work on regular files.
>
> I propose adding support for all of these things using the fcntl(2) API.
> Using the same syntax that Solaris defined, you would be able to punch a
> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
> (though FreeBSD's port never did, and the code was deleted in r303763).
> Here's what I would do:
>
> 1) Add the F_FREESP command to fcntl(2).
> 2) Add a .fo_space field for struct fileops
> 3) Add a devfs_space method that implements .fo_space
> 4) Add a .d_space field to struct cdevsw
> 5) Add a g_dev_space method for GEOM that implements .d_space using
> BIO_DELETE.
> 6) Add a VOP_SPACE vop
> 7) Implement VOP_SPACE for tmpfs
> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>
> The greatest beneficiaries of this work would be type 2 hypervisors like
> QEMU and VirtualBox with guests that use TRIM, and userland filesystems
> such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
> using SPDK would also benefit.  The last item, aio_freesp(2), may seem
> unnecessary but it would really benefit my application.
>
> Questions, objections, flames?
>

So the fcntl would deallocate blocks from a filesystem only. The filesystem
may issue BIO_DELETE as a result, but that's up to the filesystem, correct?

On a raw device it would be translated into a BIO_DELETE command directly,
correct?

Warner


Hole-punching, TRIM, etc

2018-11-13 Thread Alan Somers
Hole-punching has been discussed on these lists before[1].  It basically
means to turn a dense file into a sparse file by deallocating storage for
some of the blocks in the middle.  There's no standard API for it.  Linux
uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
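
To make the dense/sparse distinction concrete, here is a small C sketch that
compares a file's logical size with the space actually allocated to it.  It
assumes st_blocks counts 512-byte units, which holds on FreeBSD and Linux
but is not guaranteed by POSIX.  make_sparse_temp() and is_sparse() are
illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an unlinked temporary file that is `size` bytes long but has had
 * no data written to it, so most filesystems allocate little or nothing.
 * Returns the open descriptor, or -1 on error.
 */
int
make_sparse_temp(off_t size)
{
	char path[] = "/tmp/sparseXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return (-1);
	unlink(path);
	if (ftruncate(fd, size) != 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/*
 * Return 1 if fewer bytes are allocated than the file's size, 0 if not,
 * and -1 on error.  Assumes st_blocks is in 512-byte units.
 */
int
is_sparse(int fd)
{
	struct stat sb;

	assert(fd >= 0);
	if (fstat(fd, &sb) != 0)
		return (-1);
	return ((off_t)sb.st_blocks * 512 < sb.st_size);
}
```

Hole-punching is the operation that moves an already-written file from the
"0" case back toward the "1" case without changing its size.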

A related concept is telling a block device that some blocks are no longer
used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
basically the same thing, and it's analogous to hole-punching for regular
files.  They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.

Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.

I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
Here's what I would do:

1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field for struct fileops
3) Add a devfs_space method that implements .fo_space
4) Add a .d_space field to struct cdevsw
5) Add a g_dev_space method for GEOM that implements .d_space using
BIO_DELETE.
6) Add a VOP_SPACE vop
7) Implement VOP_SPACE for tmpfs
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
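
As a sketch of what step 1 would look like to a caller, assuming the
Solaris convention that F_FREESP takes a struct flock describing the range
to free.  freesp_range() and punch_hole_freesp() are hypothetical helper
names; on systems whose <fcntl.h> does not define F_FREESP (FreeBSD today,
Linux) the wrapper just fails with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Describe the byte range [off, off + len) as a struct flock. */
struct flock
freesp_range(off_t off, off_t len)
{
	struct flock fl;

	assert(off >= 0 && len >= 0);
	memset(&fl, 0, sizeof(fl));
	fl.l_whence = SEEK_SET;	/* l_start is an absolute offset */
	fl.l_start = off;
	fl.l_len = len;		/* on Solaris, 0 means "to end of file",
				 * as I understand it */
	return (fl);
}

/* Free the storage backing [off, off + len) where F_FREESP exists. */
int
punch_hole_freesp(int fd, off_t off, off_t len)
{
	struct flock fl = freesp_range(off, len);

#ifdef F_FREESP
	return (fcntl(fd, F_FREESP, &fl));
#else
	(void)fd;
	(void)fl;
	errno = ENOTSUP;	/* no F_FREESP on this platform */
	return (-1);
#endif
}
```

Under the proposal the same call would work whether fd refers to a regular
file or to a disk device; only the layer that services it would differ.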

The greatest beneficiaries of this work would be type 2 hypervisors like
QEMU and VirtualBox with guests that use TRIM, and userland filesystems
such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
using SPDK would also benefit.  The last item, aio_freesp(2), may seem
unnecessary but it would really benefit my application.

Questions, objections, flames?

-Alan

[1] https://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010881.html