On 31/03/2016 16:27, Alex Bligh wrote:
> > > IE why not always permit trimming PROVIDED the data always reads back
> > > as zero? This would be far simpler.
> >
> > Because trimming can make future operations more expensive and cause
> > fragmentation (which may not be as bad as it used to be at the media
> > level, but it is still somewhat bad at the filesystem level).
> >
> > So if you want a fully-provisioned file, the simplest way to do so is to
> > write zeroes to it, and trimming is undesirable.
> But isn't the server in a better position to know this than the
> client?

There are at least three possible states for a sector:

- hole (thin-provisioned)

- allocated as data (disk contains actual zeroes)

- allocated as unwritten (blocks reserved on backing storage, reads as
  zeroes but the disk may not contain actual zeroes)

It's always okay for the backend to convert a zero block to an unwritten
extent; it's generally not okay for a backend to take a request to
create an unwritten extent and instead create a hole.

It's all an "as if" situation. The server must provide the semantics
requested by the client. For example, writing to a hole could cause
ENOSPC, whereas writing to an unwritten extent could not.

The server might know better, in that it certainly is in a better
position to know how to fulfill the client's request. But even if it's
just a hint, it makes sense for NBD to provide it. It's not a
coincidence that this hint exists at all levels: SCSI has an UNMAP bit
that can be set in the WRITE SAME command (and it has the UNMAP command,
which matches NBD's TRIM); the fallocate system call has
FALLOC_FL_ZERO_RANGE and FALLOC_FL_PUNCH_HOLE (plus Linux has the
BLKDISCARD ioctl, which again matches NBD's TRIM for block devices).

> EG if the server has a back end implementation (as I suspect
> Ceph on qemu-nbd does)

Ceph doesn't, but gluster does.

> which never actually stores all zero blocks,
> it won't make a difference, and conceivably you're generating a whole
> pile of I/O to avoid sparseness when sparseness might be faster. Take
> for example a persistent memory interface, where fragmentation is
> irrelevant, and writing piles of zeroes to memory is a waste of time.

It certainly isn't a waste of time if your intention is to scrub data
belonging to a previous tenant, before giving access to someone else!

If you have a metadata layer above then you can handle the command there
(that's why we're adding it); if you haven't, you do have to write the
zeroes.

Paolo
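
For illustration only, here is a toy model of the three sector states
and the "as if" rule described above. It is a sketch under stated
assumptions, not NBD or server code: the names SectorState,
write_may_hit_enospc and may_substitute are made up for this example,
and the only client-visible difference it models is whether a later
write can fail with ENOSPC (all three states read back as zeroes).
Treating a hole request as a pure hint that the backend may ignore is
also an assumption of the sketch.

```python
from enum import Enum


class SectorState(Enum):
    HOLE = "hole"            # thin-provisioned, no blocks reserved
    DATA = "data"            # allocated, disk contains actual zeroes
    UNWRITTEN = "unwritten"  # allocated, reads as zeroes, disk may not


def write_may_hit_enospc(state: SectorState) -> bool:
    # Only a hole has to allocate blocks at write time, so in this
    # model only a hole can run out of space on a later write.
    return state is SectorState.HOLE


def may_substitute(requested: SectorState, actual: SectorState) -> bool:
    # "As if" rule in this model: the backend may choose a different
    # state as long as the client never sees a weaker guarantee than
    # the one it asked for. Assumption: a request for a hole is only
    # a hint, so any state satisfies it.
    return write_may_hit_enospc(requested) or not write_may_hit_enospc(actual)


if __name__ == "__main__":
    # Print the full substitution table for the three states.
    for req in SectorState:
        for act in SectorState:
            verdict = "ok" if may_substitute(req, act) else "not ok"
            print(f"{req.value:>9} -> {act.value:<9} {verdict}")
```

In this model, turning a zero block into an unwritten extent is
accepted, while answering a request for an unwritten extent with a hole
is rejected, which matches the two cases spelled out above.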