Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Mon, Aug 14, 2017 at 02:40:59PM +0200, Jan Kara wrote:
> Hum, this proposal (and the problems you are trying to deal with) seem very similar to Peter Zijlstra's mpin() proposal from 2014 [1], just moved to the DAX area (and so additionally complicated by the fact that filesystems now have to care). The patch set was not merged due to lack of interest I think but it looked sensible and the proposed API would make sense for more stuff than just DAX so maybe it would be better than a MAP_DIRECT flag?
>
> [1] https://lwn.net/Articles/600502/

Thanks for thinking of that. The main sticking point was that I never got it working for RDMA; I got hopelessly lost in that code. Also I felt (and still do) that mpin() would be very useful for CMA: mpin() would be a good moment to migrate/compact the pages and get out of the way.
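[Editor's note: the "migrate/compact at pin time" ordering above can be modeled in a few lines. This is a hypothetical sketch of the never-merged mpin() semantics, not kernel code; the placement enum and helper names are invented for illustration. The invariant is that a page must be moved out of CMA/movable memory *before* it is marked pinned, because a pinned page can no longer be migrated.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposed mpin() ordering; all names are invented. */
enum placement { PLACE_CMA, PLACE_MOVABLE, PLACE_UNMOVABLE };

struct page_state {
    enum placement where;
    bool pinned;
};

/*
 * mpin() is the last moment a page can be migrated: once pinned it is
 * stuck, so move it out of CMA/movable regions first ("get out of the
 * way"), then pin.
 */
static void model_mpin(struct page_state *pg)
{
    if (pg->where != PLACE_UNMOVABLE)
        pg->where = PLACE_UNMOVABLE;   /* stand-in for page migration */
    pg->pinned = true;                 /* migration is now forbidden */
}
```

[A pin that skipped the migration step would leave an immovable page stranded in a CMA region, defeating contiguous allocation; doing the move at pin time is the whole point of the proposal.]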
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Tue 15-08-17 16:50:55, Dan Williams wrote:
> On Tue, Aug 15, 2017 at 1:37 AM, Jan Kara wrote:
> > On Mon 14-08-17 09:14:42, Dan Williams wrote:
> >> On Mon, Aug 14, 2017 at 5:40 AM, Jan Kara wrote:
> >> > On Sun 13-08-17 13:31:45, Dan Williams wrote:
> >> >> On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig wrote:
> >> >> > That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.
> >> >>
> >> >> Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:
> >> >>
> >> >> 1/ only succeed if the fault can be satisfied without page cache
> >> >>
> >> >> 2/ only install a pte for the fault if it can do so without triggering block map updates
> >> >>
> >> >> So, I think it would still end up setting an inode flag to make xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping active. However, it would not record that state in the on-disk metadata and it would automatically clear at munmap time. That should be enough to support the host-persistent-memory, and NVMe-persistent-memory use cases (provided we have struct page for NVMe). Although, we need more safety infrastructure in the NVMe case where we would need to software manage I/O coherence.
> >> >
> >> > Hum, this proposal (and the problems you are trying to deal with) seem very similar to Peter Zijlstra's mpin() proposal from 2014 [1], just moved to the DAX area (and so additionally complicated by the fact that filesystems now have to care). The patch set was not merged due to lack of interest I think but it looked sensible and the proposed API would make sense for more stuff than just DAX so maybe it would be better than a MAP_DIRECT flag?
> >>
> >> Interesting, but I'm not sure I see the correlation. mm_mpin() makes a "no-fault" guarantee and fixes the accounting of locked System RAM. MAP_DIRECT still allows faults, and DAX mappings don't consume System RAM so the accounting problem is not there for DAX. mm_mpin() also does not appear to have a relationship to file-backed memory like mmap allows.
> >
> > So the accounting part is probably non-interesting for DAX purposes and I agree there are other differences as well. But mm_mpin() prevented page migrations, which is parallel to your requirement of "offset->block mapping is permanent". Furthermore the mm_mpin() work was there for RDMA so that it has a saner interface to pin pages than get_user_pages(), and you mention RDMA and similar technologies as a usecase for your work for similar reasons. So my thought was that possibly we should have the same API for pinning "storage" for RDMA transfers regardless of whether the backing is page cache or pmem, and the API should be usable for in-kernel users as well? An mmap flag seems a bit clumsy in this regard so maybe a form of a separate syscall - be it mpin(start, len) or some other name - might be more suitable?
>
> Can you say more about why an mmap flag for this feels awkward to you? I think there's symmetry between O_SYNC / O_DIRECT setting up synchronous / page-cache-bypass file descriptors and MAP_SYNC / MAP_DIRECT setting up synchronous and page-cache bypass mappings.

So my thinking was that for in-kernel users it might be a bit more difficult to use an mmap flag directly as they generally won't need to set up the mapping. But that can certainly be dealt with by proper helpers for in-kernel users.

> "Pinning" also feels like the wrong mechanism when you consider hardware is moving toward eliminating the pinning requirement over time. SVM "Shared Virtual Memory" hardware will just operate on cpu virtual addresses directly and generate typical faults. On such hardware MAP_DIRECT would be a nop relative to MAP_SYNC, so you wouldn't want your application to be stuck with the legacy concept that pages need to be explicitly "pinned".
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Tue, Aug 15, 2017 at 1:37 AM, Jan Kara wrote:
> On Mon 14-08-17 09:14:42, Dan Williams wrote:
>> On Mon, Aug 14, 2017 at 5:40 AM, Jan Kara wrote:
>> > On Sun 13-08-17 13:31:45, Dan Williams wrote:
>> >> On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig wrote:
>> >> > That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.
>> >>
>> >> Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:
>> >>
>> >> 1/ only succeed if the fault can be satisfied without page cache
>> >>
>> >> 2/ only install a pte for the fault if it can do so without triggering block map updates
>> >>
>> >> So, I think it would still end up setting an inode flag to make xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping active. However, it would not record that state in the on-disk metadata and it would automatically clear at munmap time. That should be enough to support the host-persistent-memory, and NVMe-persistent-memory use cases (provided we have struct page for NVMe). Although, we need more safety infrastructure in the NVMe case where we would need to software manage I/O coherence.
>> >
>> > Hum, this proposal (and the problems you are trying to deal with) seem very similar to Peter Zijlstra's mpin() proposal from 2014 [1], just moved to the DAX area (and so additionally complicated by the fact that filesystems now have to care). The patch set was not merged due to lack of interest I think but it looked sensible and the proposed API would make sense for more stuff than just DAX so maybe it would be better than a MAP_DIRECT flag?
>>
>> Interesting, but I'm not sure I see the correlation. mm_mpin() makes a "no-fault" guarantee and fixes the accounting of locked System RAM. MAP_DIRECT still allows faults, and DAX mappings don't consume System RAM so the accounting problem is not there for DAX. mm_mpin() also does not appear to have a relationship to file-backed memory like mmap allows.
>
> So the accounting part is probably non-interesting for DAX purposes and I agree there are other differences as well. But mm_mpin() prevented page migrations, which is parallel to your requirement of "offset->block mapping is permanent". Furthermore the mm_mpin() work was there for RDMA so that it has a saner interface to pin pages than get_user_pages(), and you mention RDMA and similar technologies as a usecase for your work for similar reasons. So my thought was that possibly we should have the same API for pinning "storage" for RDMA transfers regardless of whether the backing is page cache or pmem, and the API should be usable for in-kernel users as well? An mmap flag seems a bit clumsy in this regard so maybe a form of a separate syscall - be it mpin(start, len) or some other name - might be more suitable?

Can you say more about why an mmap flag for this feels awkward to you? I think there's symmetry between O_SYNC / O_DIRECT setting up synchronous / page-cache-bypass file descriptors and MAP_SYNC / MAP_DIRECT setting up synchronous and page-cache bypass mappings.

"Pinning" also feels like the wrong mechanism when you consider hardware is moving toward eliminating the pinning requirement over time. SVM "Shared Virtual Memory" hardware will just operate on cpu virtual addresses directly and generate typical faults. On such hardware MAP_DIRECT would be a nop relative to MAP_SYNC, so you wouldn't want your application to be stuck with the legacy concept that pages need to be explicitly "pinned".
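[Editor's note: the mmap-flag approach defended above has one practical wrinkle: the kernel historically ignored unknown mmap flags, which is exactly why MAP_SHARED_VALIDATE was later added alongside MAP_SYNC. Below is a sketch of how an application could probe for stronger mapping semantics and fall back; MAP_DIRECT itself was never merged, so MAP_SYNC stands in, and the numeric fallback defines mirror linux/mman.h on x86.]

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03     /* from linux/mman.h, kernel >= 4.15 */
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000             /* from linux/mman.h, kernel >= 4.15 */
#endif

/*
 * Ask for a synchronous DAX mapping.  MAP_SHARED_VALIDATE makes the
 * kernel reject flags it does not understand instead of silently
 * ignoring them, so the caller learns whether the stronger semantics
 * were actually granted.
 */
static void *map_sync_or_fallback(int fd, size_t len, int *got_sync)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *got_sync = 1;
        return p;
    }
    *got_sync = 0;                    /* non-DAX file or old kernel */
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

static int demo(void)
{
    char path[] = "/tmp/mapsyncXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        return -1;
    }
    int got_sync = -1;
    void *p = map_sync_or_fallback(fd, 4096, &got_sync);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    munmap(p, 4096);
    close(fd);
    return got_sync;                  /* 1 only on a DAX-capable fs */
}
```

[On a DAX-capable filesystem `got_sync` would be 1; everywhere else the fallback path runs, which is the property the thread wants: an application can never be silently granted weaker semantics than it asked for.]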
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Mon 14-08-17 09:14:42, Dan Williams wrote:
> On Mon, Aug 14, 2017 at 5:40 AM, Jan Kara wrote:
> > On Sun 13-08-17 13:31:45, Dan Williams wrote:
> >> On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig wrote:
> >> > That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.
> >>
> >> Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:
> >>
> >> 1/ only succeed if the fault can be satisfied without page cache
> >>
> >> 2/ only install a pte for the fault if it can do so without triggering block map updates
> >>
> >> So, I think it would still end up setting an inode flag to make xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping active. However, it would not record that state in the on-disk metadata and it would automatically clear at munmap time. That should be enough to support the host-persistent-memory, and NVMe-persistent-memory use cases (provided we have struct page for NVMe). Although, we need more safety infrastructure in the NVMe case where we would need to software manage I/O coherence.
> >
> > Hum, this proposal (and the problems you are trying to deal with) seem very similar to Peter Zijlstra's mpin() proposal from 2014 [1], just moved to the DAX area (and so additionally complicated by the fact that filesystems now have to care). The patch set was not merged due to lack of interest I think but it looked sensible and the proposed API would make sense for more stuff than just DAX so maybe it would be better than a MAP_DIRECT flag?
>
> Interesting, but I'm not sure I see the correlation. mm_mpin() makes a "no-fault" guarantee and fixes the accounting of locked System RAM. MAP_DIRECT still allows faults, and DAX mappings don't consume System RAM so the accounting problem is not there for DAX. mm_mpin() also does not appear to have a relationship to file-backed memory like mmap allows.

So the accounting part is probably non-interesting for DAX purposes and I agree there are other differences as well. But mm_mpin() prevented page migrations, which is parallel to your requirement of "offset->block mapping is permanent". Furthermore the mm_mpin() work was there for RDMA so that it has a saner interface to pin pages than get_user_pages(), and you mention RDMA and similar technologies as a usecase for your work for similar reasons. So my thought was that possibly we should have the same API for pinning "storage" for RDMA transfers regardless of whether the backing is page cache or pmem, and the API should be usable for in-kernel users as well? An mmap flag seems a bit clumsy in this regard so maybe a form of a separate syscall - be it mpin(start, len) or some other name - might be more suitable?

Honza
--
Jan Kara
SUSE Labs, CR
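[Editor's note: the mpin(start, len) shape suggested above would enter the kernel the same way its closest merged relative, mlock2(2), does. As a sketch, this is how such a pinning syscall is typically exercised from userspace before libc grows a wrapper; SYS_mlock2 is real, whereas an mpin syscall number never existed.]

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw-syscall wrapper, the way early adopters would call a new mpin(). */
static int raw_mlock2(const void *start, size_t len, int flags)
{
    return (int)syscall(SYS_mlock2, start, len, flags);
}

/*
 * Returns 1 if the syscall exists (whether or not the lock itself
 * succeeds, e.g. under a tight RLIMIT_MEMLOCK), 0 if the kernel
 * lacks it, -1 on setup failure.
 */
static int probe_mlock2(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    int rc = raw_mlock2(buf, (size_t)page, 0);
    int exists = (rc == 0 || errno != ENOSYS);
    if (rc == 0)
        munlock(buf, (size_t)page);
    munmap(buf, (size_t)page);
    return exists;
}
```

[The design question in the thread is exactly this one: a syscall like the above works on any already-mapped range and is easy for in-kernel helpers to mirror, while an mmap flag ties the pin to mapping creation.]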
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sun, Aug 13, 2017 at 01:31:45PM -0700, Dan Williams wrote:
> On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig wrote:
> > On Sat, Aug 12, 2017 at 12:19:50PM -0700, Dan Williams wrote:
> >> The application does not need to know the storage address, it needs to know that the storage address to file offset is fixed. With this information it can make assumptions about the permanence of results it gets from the kernel.
> >
> > Only if we clearly document that fact - and documenting the permanence is different from saying the block map won't change.
>
> I can get on board with that.
>
> >> For example get_user_pages() today makes no guarantees outside of "page will not be freed",
> >
> > It also makes the extremely important guarantee that the page won't _move_ - e.g. that we won't do a memory migration for compaction or other reasons. That's why for example RDMA can use it to register memory and then we can later set up memory windows that point to this registration from userspace and implement userspace RDMA.
> >
> >> but with immutable files and dax you now have a mechanism for userspace to coordinate direct access to storage addresses. Those raw storage addresses need not be exposed to the application, as you say it doesn't need to know that detail. MAP_SYNC does not fully satisfy this case because it requires agents that can generate MMU faults to coordinate with the filesystem.
> >
> > The file system is always in the fault path, can you explain what other agents you are talking about?
>
> Exactly the ones you mention below. SVM hardware can just use a MAP_SYNC mapping and be sure that its metadata dirtying writes are synchronized with the filesystem through the fault path. Hardware that does not have SVM, or hypervisors like Xen that want to attach their own static metadata about the file offset to physical block mapping, need a mechanism to make sure the block map is sealed while they have it mapped.
>
> >> All I know is that SMB Direct for persistent memory seems like a potential consumer. I know they're not going to use a userspace filesystem or put an SMB server in the kernel.
> >
> > Last I talked to the Samba folks they didn't expect a userspace SMB direct implementation to work anyway due to the fact that libibverbs memory registrations interact badly with their fork()ing daemon model. That being said during the recent submission of the RDMA client code some comments were made about userspace versions of it, so I'm not sure if that opinion has changed in one way or another.
>
> Ok.
>
> > That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.
>
> Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:
>
> 1/ only succeed if the fault can be satisfied without page cache
>
> 2/ only install a pte for the fault if it can do so without triggering block map updates
>
> So, I think it would still end up setting an inode flag to make xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping active. However, it would not record that state in the on-disk metadata and it would automatically clear at munmap time. That should

TBH even after the last round of 'do we need this on-disk flag?' I still wasn't 100% convinced that we really needed a permanent flag vs. requiring apps to ask for a sealed iomap mmap like what you just described, so I'm glad this conversation has continued. :)

--D

> be enough to support the host-persistent-memory, and NVMe-persistent-memory use cases (provided we have struct page for NVMe). Although, we need more safety infrastructure in the NVMe case where we would need to software manage I/O coherence.
>
> > Last but not least we have an interesting additional case for modern Mellanox hardware - On Demand Paging where we don't actually do a get_user_pages but the hardware implements SVM and thus gets fed virtual addresses directly.
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Mon, Aug 14, 2017 at 5:40 AM, Jan Kara wrote:
> On Sun 13-08-17 13:31:45, Dan Williams wrote:
>> On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig wrote:
>> > That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.
>>
>> Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:
>>
>> 1/ only succeed if the fault can be satisfied without page cache
>>
>> 2/ only install a pte for the fault if it can do so without triggering block map updates
>>
>> So, I think it would still end up setting an inode flag to make xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping active. However, it would not record that state in the on-disk metadata and it would automatically clear at munmap time. That should be enough to support the host-persistent-memory, and NVMe-persistent-memory use cases (provided we have struct page for NVMe). Although, we need more safety infrastructure in the NVMe case where we would need to software manage I/O coherence.
>
> Hum, this proposal (and the problems you are trying to deal with) seem very similar to Peter Zijlstra's mpin() proposal from 2014 [1], just moved to the DAX area (and so additionally complicated by the fact that filesystems now have to care). The patch set was not merged due to lack of interest I think but it looked sensible and the proposed API would make sense for more stuff than just DAX so maybe it would be better than a MAP_DIRECT flag?

Interesting, but I'm not sure I see the correlation. mm_mpin() makes a "no-fault" guarantee and fixes the accounting of locked System RAM. MAP_DIRECT still allows faults, and DAX mappings don't consume System RAM so the accounting problem is not there for DAX. mm_mpin() also does not appear to have a relationship to file-backed memory like mmap allows.
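[Editor's note: the "accounting of locked System RAM" mentioned above is the per-process RLIMIT_MEMLOCK budget that mlock()/pinned pages are charged against. A small sketch of inspecting that budget; the point of the argument is that DAX mappings are device memory, so MAP_DIRECT would have nothing to charge here.]

```c
#include <sys/resource.h>

/*
 * Locked/pinned System RAM is charged against RLIMIT_MEMLOCK.  DAX
 * mappings point at device memory rather than System RAM, which is why
 * the mm_mpin() accounting fixes have no equivalent problem to solve
 * for MAP_DIRECT.
 */
static long memlock_limit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;                    /* getrlimit failed */
    if (rl.rlim_cur == RLIM_INFINITY)
        return -2;                    /* no limit configured */
    return (long)rl.rlim_cur;
}
```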
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sun 13-08-17 13:31:45, Dan Williams wrote:
> On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig wrote:
> > That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.
>
> Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:
>
> 1/ only succeed if the fault can be satisfied without page cache
>
> 2/ only install a pte for the fault if it can do so without triggering block map updates
>
> So, I think it would still end up setting an inode flag to make xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping active. However, it would not record that state in the on-disk metadata and it would automatically clear at munmap time. That should be enough to support the host-persistent-memory, and NVMe-persistent-memory use cases (provided we have struct page for NVMe). Although, we need more safety infrastructure in the NVMe case where we would need to software manage I/O coherence.

Hum, this proposal (and the problems you are trying to deal with) seem very similar to Peter Zijlstra's mpin() proposal from 2014 [1], just moved to the DAX area (and so additionally complicated by the fact that filesystems now have to care). The patch set was not merged due to lack of interest I think but it looked sensible and the proposed API would make sense for more stuff than just DAX so maybe it would be better than a MAP_DIRECT flag?

[1] https://lwn.net/Articles/600502/

Honza
--
Jan Kara
SUSE Labs, CR
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sun, Aug 13, 2017 at 11:24:36AM +0200, Christoph Hellwig wrote:
> And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly.

That seems reasonable to me. Personally I don't need persistent state, and I'd only intended persistence to be so that we didn't get arbitrary processes whacking holes in the file when the DAX app wasn't running, which would then cause problems for userspace data sync. Seeing as the interface is morphing away from a "fill holes and persist" interface to just a "seal the existing map" interface, it'll be up to the app/library to prep/check the file layout for sanity every time it is sealed.

> It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.

How would that work? IIUC, we'd need userspace to take out a file lease so that it gets notified when the seal is going to be broken by the filesystem via the break_layouts() interface, and the break then blocks until the app releases the lease? So the seal lifetime is bounded by the lease?

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sun, Aug 13, 2017 at 2:24 AM, Christoph Hellwig wrote:
> On Sat, Aug 12, 2017 at 12:19:50PM -0700, Dan Williams wrote:
>> The application does not need to know the storage address, it needs to know that the storage address to file offset is fixed. With this information it can make assumptions about the permanence of results it gets from the kernel.
>
> Only if we clearly document that fact - and documenting the permanence is different from saying the block map won't change.

I can get on board with that.

>> For example get_user_pages() today makes no guarantees outside of "page will not be freed",
>
> It also makes the extremely important guarantee that the page won't _move_ - e.g. that we won't do a memory migration for compaction or other reasons. That's why for example RDMA can use it to register memory and then we can later set up memory windows that point to this registration from userspace and implement userspace RDMA.
>
>> but with immutable files and dax you now have a mechanism for userspace to coordinate direct access to storage addresses. Those raw storage addresses need not be exposed to the application, as you say it doesn't need to know that detail. MAP_SYNC does not fully satisfy this case because it requires agents that can generate MMU faults to coordinate with the filesystem.
>
> The file system is always in the fault path, can you explain what other agents you are talking about?

Exactly the ones you mention below. SVM hardware can just use a MAP_SYNC mapping and be sure that its metadata dirtying writes are synchronized with the filesystem through the fault path. Hardware that does not have SVM, or hypervisors like Xen that want to attach their own static metadata about the file offset to physical block mapping, need a mechanism to make sure the block map is sealed while they have it mapped.

>> All I know is that SMB Direct for persistent memory seems like a potential consumer. I know they're not going to use a userspace filesystem or put an SMB server in the kernel.
>
> Last I talked to the Samba folks they didn't expect a userspace SMB direct implementation to work anyway due to the fact that libibverbs memory registrations interact badly with their fork()ing daemon model. That being said during the recent submission of the RDMA client code some comments were made about userspace versions of it, so I'm not sure if that opinion has changed in one way or another.

Ok.

> That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics.

Ok, how about a MAP_DIRECT flag that arranges for faults to that range to:

1/ only succeed if the fault can be satisfied without page cache

2/ only install a pte for the fault if it can do so without triggering block map updates

So, I think it would still end up setting an inode flag to make xfs_bmapi_write() fail while any process has a MAP_DIRECT mapping active. However, it would not record that state in the on-disk metadata and it would automatically clear at munmap time. That should be enough to support the host-persistent-memory, and NVMe-persistent-memory use cases (provided we have struct page for NVMe). Although, we need more safety infrastructure in the NVMe case where we would need to software manage I/O coherence.

> Last but not least we have an interesting additional case for modern Mellanox hardware - On Demand Paging where we don't actually do a get_user_pages but the hardware implements SVM and thus gets fed virtual addresses directly. My head spins when talking about the implications for DAX mappings on that, so I'm just throwing that in for now instead of trying to come up with a solution.

Yeah, DAX + SVM needs more thought.
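[Editor's note: the two MAP_DIRECT fault rules above can be captured as a toy decision function. The flag is hypothetical (it was never merged in this form) and the booleans are invented stand-ins: "needs a block-map update" covers faulting into a hole or converting an unwritten extent, either of which would change the offset-to-block mapping the flag is supposed to freeze.]

```c
#include <assert.h>
#include <stdbool.h>

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/*
 * Toy model of the proposed MAP_DIRECT fault path: a fault may only be
 * satisfied when no page cache is involved (rule 1) and when installing
 * the pte requires no block-map update, i.e. the extent is already
 * allocated and written (rule 2).  Anything else fails the fault.
 */
static enum fault_result map_direct_fault(bool backed_by_page_cache,
                                          bool needs_block_map_update)
{
    if (backed_by_page_cache)         /* rule 1: DAX only, no buffered path */
        return FAULT_SIGBUS;
    if (needs_block_map_update)       /* rule 2: no allocation/conversion */
        return FAULT_SIGBUS;
    return FAULT_OK;                  /* pte can be installed directly */
}
```

[The interesting property is that the seal is enforced lazily per fault rather than persisted: drop the mapping and both rules vanish, matching "automatically clear at munmap time".]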
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sat, Aug 12, 2017 at 12:19:50PM -0700, Dan Williams wrote: > The application does not need to know the storage address, it needs to > know that the storage address to file offset is fixed. With this > information it can make assumptions about the permanence of results it > gets from the kernel. Only if we clearly document that fact - and documenting the permanence is different from saying the block map won't change. > For example get_user_pages() today makes no guarantees outside of > "page will not be freed", It also makes the extremely important guarantee that the page won't _move_ - e.g. that we won't do a memory migration for compaction or other reasons. That's why for example RDMA can use it to register memory and then we can later set up memory windows that point to this registration from userspace and implement userspace RDMA. > but with immutable files and dax you now > have a mechanism for userspace to coordinate direct access to storage > addresses. Those raw storage addresses need not be exposed to the > application, as you say it doesn't need to know that detail. MAP_SYNC > does not fully satisfy this case because it requires agents that can > generate MMU faults to coordinate with the filesystem. The file system is always in the fault path, can you explain what other agents you are talking about? > All I know is that SMB Direct for persistent memory seems like a > potential consumer. I know they're not going to use a userspace > filesystem or put an SMB server in the kernel. Last I talked to the Samba folks they didn't expect a userspace SMB direct implementation to work anyway due to the fact that libibverbs memory registrations interact badly with their fork()ing daemon model. That being said, during the recent submission of the RDMA client code some comments were made about userspace versions of it, so I'm not sure if that opinion has changed in one way or another. 
That being said I think we absolutely should support RDMA memory registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all the blocks are populated and all ptes are set up. Second we need to make sure get_user_pages works, which for now means we'll need a struct page mapping for the region (which will be really annoying for PCIe mappings, like the upcoming NVMe persistent memory region), and we need to guarantee that the extent mapping won't change while the get_user_pages holds the pages inside it. I think that is true due to side effects even with the current DAX code, but we'll need to make it explicit. And maybe that's where we need to converge - "sealing" the extent map makes sense as such a temporary measure that is not persisted on disk, which automatically gets released when the holding process exits, because we sort of already do this implicitly. It might also make sense to have explicitly breakable seals similar to what I do for the pNFS blocks kernel server, as any userspace RDMA file server would also need those semantics. Last but not least we have an interesting additional case for modern Mellanox hardware - On Demand Paging where we don't actually do a get_user_pages but the hardware implements SVM and thus gets fed virtual addresses directly. My head spins when talking about the implications for DAX mappings on that, so I'm just throwing that in for now instead of trying to come up with a solution.
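As a footnote, the MAP_SYNC | MAP_POPULATE combination suggested above was eventually merged (Linux 4.15), gated behind MAP_SHARED_VALIDATE precisely so that older kernels reject the request instead of silently ignoring the flag. A sketch with the values as merged (defined locally in case libc headers predate them); on a non-DAX filesystem the call fails with EOPNOTSUPP:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Values as merged in Linux 4.15, in case the libc headers lack them. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif
#ifndef MAP_POPULATE
#define MAP_POPULATE 0x8000
#endif

/*
 * Map a file with synchronous faults: a write fault does not complete
 * until any block-map change it caused is durable, so CPU stores
 * through the pte are guaranteed to reach persistent media.
 * MAP_POPULATE pre-faults the range so all ptes are set up front.
 * Fails with EOPNOTSUPP unless the file is on a DAX-capable mount.
 */
void *map_sync_populate(int fd, size_t len)
{
        return mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED_VALIDATE | MAP_SYNC | MAP_POPULATE, fd, 0);
}
```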
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sat, Aug 12, 2017 at 12:33 AM, Christoph Hellwig wrote: > On Fri, Aug 11, 2017 at 03:26:05PM -0700, Dan Williams wrote: >> Right, but they let userspace make inferences about the state of >> metadata relative to I/O to a given storage address. In this regard >> S_IOMAP_IMMUTABLE is no different than MAP_SYNC, but 'immutable' goes >> a step further to let an application infer that the storage address is >> stable. This enables applications that MAP_SYNC does not, see below. > > But the application must not know (and cannot know) the storage address, > so it doesn't matter. > >> > What is the observable behavior of an extent map change? How can you >> > describe your immutable extent map behavior so that when I violate >> > them by e.g. moving one extent to a different place on disk you can >> > observe that in userspace? >> >> The violation is blocked, it's immutable. Using this feature means the >> application is taking away some of the kernel's freedom. That is a >> valid / safe tradeoff for the set of applications that would otherwise >> resort to raw device access. > > What can the application do with it safely that it can't otherwise do? > Short answer: nothing. The application does not need to know the storage address, it needs to know that the storage address to file offset is fixed. With this information it can make assumptions about the permanence of results it gets from the kernel. For example get_user_pages() today makes no guarantees outside of "page will not be freed", but with immutable files and dax you now have a mechanism for userspace to coordinate direct access to storage addresses. Those raw storage addresses need not be exposed to the application, as you say it doesn't need to know that detail. MAP_SYNC does not fully satisfy this case because it requires agents that can generate MMU faults to coordinate with the filesystem. >> > >> > Please explain how this interface allows for any sort of safe userspace >> > DMA. 
>> >> So this is where I continue to see S_IOMAP_IMMUTABLE being able to >> support applications that MAP_SYNC does not. Dave mentioned userspace >> pNFS4 servers, but there's also Samba and other protocols that want to >> negotiate a direct path to pmem outside the kernel. > > Userspace pNFS servers must use a userspace file system. Everything > else is just braindead stupid due to the amount of communication they > need to do. Also note that the only pNFS layouts that would even cause > direct block access are pNFS block/scsi and for those the > S_IOMAP_IMMUTABLE semantics are not very useful (background: I wrote > the Linux implementation for those, and authored the scsi layout spec) > Understood. All I know is that SMB Direct for persistent memory seems like a potential consumer. I know they're not going to use a userspace filesystem or put an SMB server in the kernel. > >> Applications that just want flush from userspace can use MAP_SYNC, >> those that need to temporarily pin the block for RDMA can use the >> in-kernel pNFS server, and those that need to coordinate both from >> userspace can use S_IOMAP_IMMUTABLE. It's a continuum, not a >> competition. > > Again - how does your application even know that I moved your block > around with your S_IOMAP_IMMUTABLE? We should never add interfaces > that mandate implementations - we should base interfaces on > user-observable behavior - and debug tools like fiemap don't count. I'm still not grokking this "I moved your block" example. What agent is moving blocks while the file is immutable? > Before going any further please write a man page that describes your > intended semantics in a way that an application programmer understands. Sure, I'll try to write this up in terms of the use cases I know about that can immediately consume it and switch away from device-dax.
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Fri, Aug 11, 2017 at 08:57:18PM -0700, Andy Lutomirski wrote: > One thing that makes me quite nervous about S_IOMAP_IMMUTABLE is the > degree to which things go badly if one program relies on it while > another program clears the flag: you risk corrupting unrelated > filesystem metadata. I think a userspace interface to pin the extent > mapping of a file really wants a way to reliably keep it pinned (or to > reliably zap the userspace application if it gets unpinned). The nice thing is that no application can rely on it anyway..
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Fri, Aug 11, 2017 at 03:26:05PM -0700, Dan Williams wrote: > Right, but they let userspace make inferences about the state of > metadata relative to I/O to a given storage address. In this regard > S_IOMAP_IMMUTABLE is no different than MAP_SYNC, but 'immutable' goes > a step further to let an application infer that the storage address is > stable. This enables applications that MAP_SYNC does not, see below. But the application must not know (and cannot know) the storage address, so it doesn't matter. > > What is the observable behavior of an extent map change? How can you > > describe your immutable extent map behavior so that when I violate > > them by e.g. moving one extent to a different place on disk you can > > observe that in userspace? > > The violation is blocked, it's immutable. Using this feature means the > application is taking away some of the kernel's freedom. That is a > valid / safe tradeoff for the set of applications that would otherwise > resort to raw device access. What can the application do with it safely that it can't otherwise do? Short answer: nothing. > > > > Please explain how this interface allows for any sort of safe userspace > > DMA. > > So this is where I continue to see S_IOMAP_IMMUTABLE being able to > support applications that MAP_SYNC does not. Dave mentioned userspace > pNFS4 servers, but there's also Samba and other protocols that want to > negotiate a direct path to pmem outside the kernel. Userspace pNFS servers must use a userspace file system. Everything else is just braindead stupid due to the amount of communication they need to do. 
Also note that the only pNFS layouts that would even cause direct block access are pNFS block/scsi and for those the S_IOMAP_IMMUTABLE semantics are not very useful (background: I wrote the Linux implementation for those, and authored the scsi layout spec) > Applications that just want flush from userspace can use MAP_SYNC, > those that need to temporarily pin the block for RDMA can use the > in-kernel pNFS server, and those that need to coordinate both from > userspace can use S_IOMAP_IMMUTABLE. It's a continuum, not a > competition. Again - how does your application even know that I moved your block around with your S_IOMAP_IMMUTABLE? We should never add interfaces that mandate implementations - we should base interfaces on user-observable behavior - and debug tools like fiemap don't count. Before going any further please write a man page that describes your intended semantics in a way that an application programmer understands.
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Fri, Aug 11, 2017 at 8:57 PM, Andy Lutomirski wrote: > On Fri, Aug 11, 2017 at 3:26 PM, Dan Williams > wrote: >> On Fri, Aug 11, 2017 at 3:44 AM, Christoph Hellwig wrote: >>> Please explain how this interface allows for any sort of safe userspace >>> DMA. >> >> So this is where I continue to see S_IOMAP_IMMUTABLE being able to >> support applications that MAP_SYNC does not. Dave mentioned userspace >> pNFS4 servers, but there's also Samba and other protocols that want to >> negotiate a direct path to pmem outside the kernel. Xen support has >> thus far not been able to follow in the footsteps of KVM enabling due >> to a dependence on static M2P tables that assume a static >> guest-physical to host-physical relationship [1]. Immutable files >> would allow Xen to follow the same "mmap a file" semantic as KVM. > > One thing that makes me quite nervous about S_IOMAP_IMMUTABLE is the > degree to which things go badly if one program relies on it while > another program clears the flag: you risk corrupting unrelated > filesystem metadata. I think a userspace interface to pin the extent > mapping of a file really wants a way to reliably keep it pinned (or to > reliably zap the userspace application if it gets unpinned). In the current patches, mapping_mapped() pins the immutable state.
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Fri, Aug 11, 2017 at 3:26 PM, Dan Williams wrote: > On Fri, Aug 11, 2017 at 3:44 AM, Christoph Hellwig wrote: >> Please explain how this interface allows for any sort of safe userspace >> DMA. > > So this is where I continue to see S_IOMAP_IMMUTABLE being able to > support applications that MAP_SYNC does not. Dave mentioned userspace > pNFS4 servers, but there's also Samba and other protocols that want to > negotiate a direct path to pmem outside the kernel. Xen support has > thus far not been able to follow in the footsteps of KVM enabling due > to a dependence on static M2P tables that assume a static > guest-physical to host-physical relationship [1]. Immutable files > would allow Xen to follow the same "mmap a file" semantic as KVM. One thing that makes me quite nervous about S_IOMAP_IMMUTABLE is the degree to which things go badly if one program relies on it while another program clears the flag: you risk corrupting unrelated filesystem metadata. I think a userspace interface to pin the extent mapping of a file really wants a way to reliably keep it pinned (or to reliably zap the userspace application if it gets unpinned).
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Fri, Aug 11, 2017 at 3:44 AM, Christoph Hellwig wrote: > On Sun, Aug 06, 2017 at 11:51:50AM -0700, Dan Williams wrote: >> Of course it's a useful API. An application already needs to worry >> about the block map, that's why we have fallocate, msync, fiemap >> and... > > Fallocate and msync do not expose the block map in any way. Proof: > they work just fine over say nfs. Right, but they let userspace make inferences about the state of metadata relative to I/O to a given storage address. In this regard S_IOMAP_IMMUTABLE is no different than MAP_SYNC, but 'immutable' goes a step further to let an application infer that the storage address is stable. This enables applications that MAP_SYNC does not, see below. > fiemap does indeed expose the block map, which is the whole point. > But it's a debug tool that we don't even have a man page for. And > it's not usable for anything else, if only for the fact that it doesn't > tell you what device your returned extents are relative to. True, one couldn't just use immutable + fiemap and expect to have the right storage device. > >> > We've been through this a few times but let me repeat it: The only >> > sensible API guarantee is one that is observable and usable. >> >> I'm missing how block-map immutable files violate this observable and >> usable constraint? > > What is the observable behavior of an extent map change? How can you > describe your immutable extent map behavior so that when I violate > them by e.g. moving one extent to a different place on disk you can > observe that in userspace? The violation is blocked, it's immutable. Using this feature means the application is taking away some of the kernel's freedom. That is a valid / safe tradeoff for the set of applications that would otherwise resort to raw device access. 
> >> This immutable approach should also go in, it solves the same problem >> without the latency drawback, > > How is your latency going to be any different from MAP_SYNC on > a fully allocated and pre-zeroed file? So, I went back and read Jan's patches, and in the pre-allocated case I don't think we can get stuck behind a backlog of dirty metadata flushing since the implementation only seems to take the synchronous fault path if the fault dirtied the block map. >> Beyond flush from userspace it also >> can be used to solve the swapfile problems you highlighted > > Which swapfile problem? The TOCTOU problem of enabling swap vs reflink that you mentioned in your criticism of the daxctl syscall, but now that I look your comments were based on the *general* case use of bmap(). However, xfs in particular as of commits: eb5e248d502b xfs: don't allow bmap on rt files db1327b16c2b xfs: report shared extent mappings to userspace correctly ...doesn't appear to have this problem. That said, Dave's idea to use immutable + unwritten extents for swap makes sense to me. That's a feature, not a bug fix, but I went ahead and appended a proof-of-concept implementation to the v3 posting. >> and it >> allows safe ongoing dma to a filesystem-dax mapping beyond what we can >> already do with direct-I/O. > > Please explain how this interface allows for any sort of safe userspace > DMA. So this is where I continue to see S_IOMAP_IMMUTABLE being able to support applications that MAP_SYNC does not. Dave mentioned userspace pNFS4 servers, but there's also Samba and other protocols that want to negotiate a direct path to pmem outside the kernel. Xen support has thus far not been able to follow in the footsteps of KVM enabling due to a dependence on static M2P tables that assume a static guest-physical to host-physical relationship [1]. Immutable files would allow Xen to follow the same "mmap a file" semantic as KVM. 
Applications that just want flush from userspace can use MAP_SYNC, those that need to temporarily pin the block for RDMA can use the in-kernel pNFS server, and those that need to coordinate both from userspace can use S_IOMAP_IMMUTABLE. It's a continuum, not a competition. [1]: https://lists.xen.org/archives/html/xen-devel/2017-04/msg00427.html
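As an aside, the "fully allocated and pre-zeroed file" case that the MAP_SYNC latency argument hinges on is set up from userspace with a plain fallocate(). A minimal sketch (the trailing fsync to make the allocation durable is my addition):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Allocate every block of [0, len) up front and make the allocation
 * durable.  After this, write faults on a mapping of the file can no
 * longer dirty the block map, so even a synchronous-fault scheme
 * (MAP_SYNC) has no allocation metadata to flush on the fault path.
 */
int preallocate_all(int fd, off_t len)
{
        /* mode 0: allocate zeroed blocks and extend i_size to len */
        if (fallocate(fd, 0, 0, len) < 0)
                return -1;
        return fsync(fd);       /* commit the allocation transaction */
}
```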
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sun, Aug 06, 2017 at 11:51:50AM -0700, Dan Williams wrote: > Of course it's a useful API. An application already needs to worry > about the block map, that's why we have fallocate, msync, fiemap > and... Fallocate and msync do not expose the block map in any way. Proof: they work just fine over say nfs. fiemap does indeed expose the block map, which is the whole point. But it's a debug tool that we don't even have a man page for. And it's not usable for anything else, if only for the fact that it doesn't tell you what device your returned extents are relative to. > > We've been through this a few times but let me repeat it: The only > > sensible API guarantee is one that is observable and usable. > > I'm missing how block-map immutable files violate this observable and > usable constraint? What is the observable behavior of an extent map change? How can you describe your immutable extent map behavior so that when I violate them by e.g. moving one extent to a different place on disk you can observe that in userspace? > This immutable approach should also go in, it solves the same problem > without the latency drawback, How is your latency going to be any different from MAP_SYNC on a fully allocated and pre-zeroed file? > Beyond flush from userspace it also > can be used to solve the swapfile problems you highlighted Which swapfile problem? > and it > allows safe ongoing dma to a filesystem-dax mapping beyond what we can > already do with direct-I/O. Please explain how this interface allows for any sort of safe userspace DMA.
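For concreteness, the fiemap interface under discussion is the FS_IOC_FIEMAP ioctl. A minimal extent-count query looks like this; note the returned physical offsets (when an extent array is fetched) carry no device identification, exactly the objection raised above:

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Count the extents backing a file with FS_IOC_FIEMAP.  Passing
 * fm_extent_count == 0 asks the kernel for the count only, without
 * filling an extent array.  When extents are fetched, fe_physical is
 * a byte offset on an *unnamed* device -- the ioctl never says which.
 * Returns the extent count, or -1 on error (e.g. EOPNOTSUPP on
 * filesystems without fiemap support).
 */
long count_extents(int fd)
{
        struct fiemap fm;

        memset(&fm, 0, sizeof(fm));
        fm.fm_length = FIEMAP_MAX_OFFSET;       /* whole file */
        fm.fm_flags = FIEMAP_FLAG_SYNC;         /* flush before mapping */
        fm.fm_extent_count = 0;                 /* count only */
        if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
                return -1;
        return fm.fm_mapped_extents;
}
```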
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Sat, Aug 5, 2017 at 2:50 AM, Christoph Hellwig wrote: > On Thu, Aug 03, 2017 at 07:38:11PM -0700, Dan Williams wrote: >> [ adding linux-api to the cover letter for notification, will send the >> full set to linux-api for v3 ] > > Just don't send this crap ever again. All the so called use cases in the > earlier thread were incorrect and highly dangerous. I usually end up coming around to your position on these types of debates because you almost always put forward unassailable technical arguments. So far, you have not in this case. > Promising that the block map is stable is not a useful userspace API, > as the block map is a complete internal implementation detail. Of course it's a useful API. An application already needs to worry about the block map, that's why we have fallocate, msync, fiemap and... > We've been through this a few times but let me repeat it: The only > sensible API guarantee is one that is observable and usable. I'm missing how block-map immutable files violate this observable and usable constraint? > So Jan's synchronous page fault flag in one form or another makes > perfect sense as it is a clear recipe for the user: you don't > have to call msync to persist your mmap writes. This API is not, > it guarantees that the block map does not change, but the application > has absolutely no business even knowing about the block map. Jan's approach is great, it should go in, it solves a long standing problem with dax with the only drawback being potentially unpredictable latency spikes. This immutable approach should also go in, it solves the same problem without the latency drawback, but yes, with the administrative overhead of CAP_LINUX_IMMUTABLE. Beyond flush from userspace it also can be used to solve the swapfile problems you highlighted and it allows safe ongoing dma to a filesystem-dax mapping beyond what we can already do with direct-I/O. 
There is demand for these capabilities that cannot be satisfied by just hand waving them away as invalid. The magnitude of opposition to this approach is out of step with the actual risk.
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
On Thu, Aug 03, 2017 at 07:38:11PM -0700, Dan Williams wrote: > [ adding linux-api to the cover letter for notification, will send the > full set to linux-api for v3 ] Just don't send this crap ever again. All the so called use cases in the earlier thread were incorrect and highly dangerous. Promising that the block map is stable is not a useful userspace API, as the block map is a complete internal implementation detail. We've been through this a few times but let me repeat it: The only sensible API guarantee is one that is observable and usable. So Jan's synchronous page fault flag in one form or another makes perfect sense as it is a clear recipe for the user: you don't have to call msync to persist your mmap writes. This API is not, it guarantees that the block map does not change, but the application has absolutely no business even knowing about the block map.
Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
[ adding linux-api to the cover letter for notification, will send the full set to linux-api for v3 ]

On Thu, Aug 3, 2017 at 7:28 PM, Dan Williams wrote:
> Changes since v1 [1]:
> * Add IS_IOMAP_IMMUTABLE() checks to xfs ioctl paths that perform block
>   map changes (xfs_alloc_file_space and xfs_free_file_space) (Darrick)
>
> * Rather than complete a partial write, fail all writes that would
>   attempt to extend the file size (Darrick)
>
> * Introduce FALLOC_FL_UNSEAL_BLOCK_MAP as an explicit operation type for
>   clearing S_IOMAP_IMMUTABLE (Dave)
>
> * Rework xfs_seal_file_space() to first complete hole-fill and unshare
>   operations and then check the file for suitability under
>   XFS_ILOCK_EXCL. (Darrick)
>
> * Add an FS_XFLAG_IOMAP_IMMUTABLE flag so the immutable state can be
>   seen by xfs_io. (Dave)
>
> * Move the setting of S_IOMAP_IMMUTABLE to be atomic with respect to the
>   successful transaction that records XFS_DIFLAG2_IOMAP_IMMUTABLE.
>   (Darrick, Dave)
>
> * Switch to a 'goto out_unlock' style in xfs_seal_file_space() to
>   cleanup 'if / else' tree, and use the mapping_mapped() helper. (Dave)
>
> * Rely on XFS_MMAPLOCK_EXCL for reading a stable state of
>   mapping->i_mmap. (Dave)
>
> [1]: http://marc.info/?l=linux-fsdevel&m=150135785712967&w=2
>
> ---
>
> The daxfile proposal a few weeks back [2] sought to piggy back on the
> swapfile implementation to approximate a block map immutable file. This
> is an idea Dave originated last year to solve the dax "flush from
> userspace" problem [3].
>
> The discussion yielded several results. First, Christoph pointed out
> that swapfiles are subtly broken [4]. Second, Darrick [5] and Dave [6]
> proposed how to properly implement a block map immutable file. Finally,
> Dave identified some improvements to swapfiles that can be built on the
> block-map-immutable mechanism. These patches seek to implement the first
> part of the proposal and save the swapfile work to build on top once the
> base mechanism is complete.
>
> While the initial motivation for this feature is support for
> byte-addressable updates of persistent memory and managing cache
> maintenance from userspace, the applications of the feature are broader.
> In addition to being the start of a better swapfile mechanism it can
> also support a DMA-to-storage use case. This use case enables
> data-acquisition hardware to DMA directly to a storage device address
> while being safe in the knowledge that storage mappings will not change.
>
> [2]: https://lkml.org/lkml/2017/6/16/790
> [3]: https://lkml.org/lkml/2016/9/11/159
> [4]: https://lkml.org/lkml/2017/6/18/31
> [5]: https://lkml.org/lkml/2017/6/20/49
> [6]: https://www.spinics.net/lists/linux-xfs/msg07871.html
>
> ---
>
> Dan Williams (5):
>       fs, xfs: introduce S_IOMAP_IMMUTABLE
>       fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
>       fs, xfs: introduce FALLOC_FL_UNSEAL_BLOCK_MAP
>       xfs: introduce XFS_DIFLAG2_IOMAP_IMMUTABLE
>       xfs: toggle XFS_DIFLAG2_IOMAP_IMMUTABLE in response to fallocate
>
>  fs/attr.c                   |   10 ++
>  fs/open.c                   |   22 +
>  fs/read_write.c             |    3 +
>  fs/xfs/libxfs/xfs_format.h  |    5 +
>  fs/xfs/xfs_bmap_util.c      |  181 +++
>  fs/xfs/xfs_bmap_util.h      |    5 +
>  fs/xfs/xfs_file.c           |   16 +++-
>  fs/xfs/xfs_inode.c          |    2
>  fs/xfs/xfs_ioctl.c          |    7 ++
>  fs/xfs/xfs_iops.c           |    8 +-
>  include/linux/falloc.h      |    4 +
>  include/linux/fs.h          |    2
>  include/uapi/linux/falloc.h |   20 +
>  include/uapi/linux/fs.h     |    1
>  mm/filemap.c                |    5 +
>  15 files changed, 282 insertions(+), 9 deletions(-)
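For illustration, the seal/unseal interface introduced by patches 2 and 3 would be driven from userspace roughly as below. The flag names come from the series; the numeric values are placeholders (the series was never merged, so no ABI values exist), and on a mainline kernel the calls simply fail with EOPNOTSUPP:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Flag names from this series; the values below are placeholders only
 * -- no mainline kernel defines them. */
#ifndef FALLOC_FL_SEAL_BLOCK_MAP
#define FALLOC_FL_SEAL_BLOCK_MAP   0x10000000
#endif
#ifndef FALLOC_FL_UNSEAL_BLOCK_MAP
#define FALLOC_FL_UNSEAL_BLOCK_MAP 0x20000000
#endif

/*
 * Seal [0, len): holes are filled, shared extents unshared, and
 * S_IOMAP_IMMUTABLE set, after which block-map-changing operations on
 * the file fail until an unseal.  Per the series this requires
 * CAP_LINUX_IMMUTABLE.  Kernels without the patches reject the
 * unknown mode bit with EOPNOTSUPP.
 */
int seal_block_map(int fd, off_t len)
{
        return fallocate(fd, FALLOC_FL_SEAL_BLOCK_MAP, 0, len);
}

int unseal_block_map(int fd, off_t len)
{
        return fallocate(fd, FALLOC_FL_UNSEAL_BLOCK_MAP, 0, len);
}
```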