Re: [PATCH v7 09/12] xfs: wire up ->lease_direct()
On Fri, Oct 06, 2017 at 03:36:06PM -0700, Dan Williams wrote: > A 'lease_direct' lease requires that the vma have a valid MAP_DIRECT > mapping established. For xfs we establish a new lease and then check if > the MAP_DIRECT mapping has been broken. We want to be sure that the > process will receive notification that the MAP_DIRECT mapping is being > torn down so it knows why other code paths are throwing failures. > > For example in the RDMA/ibverbs case we want ibv_reg_mr() to fail if the > MAP_DIRECT mapping is invalid or in the process of being invalidated. > > Cc: Jan Kara> Cc: Jeff Moyer > Cc: Christoph Hellwig > Cc: Dave Chinner > Cc: "Darrick J. Wong" > Cc: Ross Zwisler > Cc: Jeff Layton > Cc: "J. Bruce Fields" > Signed-off-by: Dan Williams > --- > fs/xfs/xfs_file.c | 28 > 1 file changed, 28 insertions(+) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index e35518600e28..823b65f17429 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -1166,6 +1166,33 @@ xfs_filemap_direct_close( > put_map_direct_vma(vma->vm_private_data); > } > > +static struct lease_direct * > +xfs_filemap_direct_lease( > + struct vm_area_struct *vma, > + void(*break_fn)(void *), > + void*owner) > +{ > + struct lease_direct *ld; > + > + ld = map_direct_lease(vma, break_fn, owner); > + > + if (IS_ERR(ld)) > + return ld; > + > + /* > + * We now have an established lease while the base MAP_DIRECT > + * lease was not broken. So, we know that the "lease holder" will > + * receive a SIGIO notification when the lease is broken and > + * take any necessary cleanup actions. > + */ > + if (!is_map_direct_broken(vma->vm_private_data)) > + return ld; > + > + map_direct_lease_destroy(ld); > + > + return ERR_PTR(-ENXIO); What's any of this got to do with XFS? Shouldn't it be in generic code, and called generic_filemap_direct_lease()? Cheers, Dave. -- Dave Chinner da...@fromorbit.com ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v7 06/12] xfs: wire up MAP_DIRECT
On Fri, Oct 06, 2017 at 03:35:49PM -0700, Dan Williams wrote: > MAP_DIRECT is an mmap(2) flag with the following semantics: > > MAP_DIRECT > When specified with MAP_SHARED_VALIDATE, sets up a file lease with the > same lifetime as the mapping. Unlike a typical F_RDLCK lease this lease > is broken when a "lease breaker" attempts to write(2), change the block > map (fallocate), or change the size of the file. Otherwise the mechanism > of a lease break is identical to the typical lease break case where the > lease needs to be removed (munmap) within the number of seconds > specified by /proc/sys/fs/lease-break-time. If the lease holder fails to > remove the lease in time the kernel will invalidate the mapping and > force all future accesses to the mapping to trigger SIGBUS. > > In addition to lease break timeouts causing faults in the mapping to > result in SIGBUS, other states of the file will trigger SIGBUS at fault > time: > > * The file is not DAX capable > * The file has reflinked (copy-on-write) blocks > * The fault would trigger the filesystem to allocate blocks > * The fault would trigger the filesystem to perform extent conversion > > In other words, MAP_DIRECT expects and enforces a fully allocated file > where faults can be satisfied without modifying block map metadata. > > An unprivileged process may establish a MAP_DIRECT mapping on a file > whose UID (owner) matches the filesystem UID of the process. A process > with the CAP_LEASE capability may establish a MAP_DIRECT mapping on > arbitrary files > > ERRORS > EACCES Beyond the typical mmap(2) conditions that trigger EACCES > MAP_DIRECT also requires the permission to set a file lease. > > EOPNOTSUPP The filesystem explicitly does not support the flag > > SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that > might require block-map updates, or the lease timed out and the > kernel invalidated the mapping. > > Cc: Jan Kara> Cc: Arnd Bergmann > Cc: Jeff Moyer > Cc: Christoph Hellwig > Cc: Dave Chinner > Cc: Alexander Viro > Cc: "Darrick J. Wong" > Cc: Ross Zwisler > Cc: Jeff Layton > Cc: "J. Bruce Fields" > Signed-off-by: Dan Williams > --- > fs/xfs/Kconfig |2 - > fs/xfs/xfs_file.c | 102 > +++ > include/linux/mman.h|3 + > include/uapi/asm-generic/mman.h |1 > 4 files changed, 106 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig > index f62fc6629abb..f8765653a438 100644 > --- a/fs/xfs/Kconfig > +++ b/fs/xfs/Kconfig > @@ -112,4 +112,4 @@ config XFS_ASSERT_FATAL > > config XFS_LAYOUT > def_bool y > - depends on EXPORTFS_BLOCK_OPS > + depends on EXPORTFS_BLOCK_OPS || FS_DAX > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index ebdd0bd2b261..e35518600e28 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -40,12 +40,22 @@ > #include "xfs_iomap.h" > #include "xfs_reflink.h" > > +#include > #include > #include > #include > +#include > #include > > static const struct vm_operations_struct xfs_file_vm_ops; > +static const struct vm_operations_struct xfs_file_vm_direct_ops; > + > +static inline bool > +is_xfs_map_direct( > + struct vm_area_struct *vma) > +{ > + return vma->vm_ops == _file_vm_direct_ops; > +} Namespacing (xfs_vma_is_direct) and whitespace damage. > > /* > * Clear the specified ranges to zero through either the pagecache or DAX. > @@ -1008,6 +1018,26 @@ xfs_file_llseek( > return vfs_setpos(file, offset, inode->i_sb->s_maxbytes); > } > > +static int > +xfs_vma_checks( > + struct vm_area_struct *vma, > + struct inode*inode) Exactly what are we checking for - function name doesn't tell me, and there's no comments, either? > +{ > + if (!is_xfs_map_direct(vma)) > + return 0; > + > + if (!is_map_direct_valid(vma->vm_private_data)) > + return VM_FAULT_SIGBUS; > + > + if (xfs_is_reflink_inode(XFS_I(inode))) > + return VM_FAULT_SIGBUS; > + > + if (!IS_DAX(inode)) > + return VM_FAULT_SIGBUS; And how do we get is_xfs_map_direct() set to true if we don't have a DAX inode or the inode has shared extents? > + > + return 0; > +} > + > /* > * Locking for serialisation of IO during page faults. This results in a lock > * ordering of: > @@ -1024,6 +1054,7 @@ __xfs_filemap_fault( > enum page_entry_sizepe_size, > boolwrite_fault) > { > + struct vm_area_struct *vma = vmf->vma; > struct inode*inode = file_inode(vmf->vma->vm_file); You
Re: [PATCH v7 03/12] fs: introduce i_mapdcount
On Fri, Oct 06, 2017 at 03:35:32PM -0700, Dan Williams wrote: > When ->iomap_begin() sees this count being non-zero and determines that > the block map of the file needs to be modified to satisfy the I/O > request it will instead return an error. This is needed for MAP_DIRECT > where, due to locking constraints, we can't rely on xfs_break_layouts() > to protect against allocating write-faults either from the process that > setup the MAP_DIRECT mapping nor other processes that have the file > mapped. xfs_break_layouts() requires XFS_IOLOCK which is problematic to > mix with the XFS_MMAPLOCK in the fault path. > > Cc: Jan Kara> Cc: Jeff Moyer > Cc: Christoph Hellwig > Cc: Dave Chinner > Cc: "Darrick J. Wong" > Cc: Ross Zwisler > Cc: Jeff Layton > Cc: "J. Bruce Fields" > Signed-off-by: Dan Williams > --- > fs/xfs/xfs_iomap.c |9 + > include/linux/fs.h | 31 +++ > 2 files changed, 40 insertions(+) > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index a1909bc064e9..6816f8ebbdcf 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -1053,6 +1053,15 @@ xfs_file_iomap_begin( > goto out_unlock; > } > /* > + * If a file has MAP_DIRECT mappings disable block map > + * updates. This should only effect mmap write faults as > + * other paths are protected by an FL_LAYOUT lease. > + */ > + if (i_mapdcount_read(inode)) { > + error = -ETXTBSY; > + goto out_unlock; > + } That looks really fragile. For one, it's going to miss modifications to reflinked files altogether. Ignoring that, however, I don't want to have to care one bit about the internals of the MAP_DIRECT implementation in the filesystem code. Hide it behind something with an obvious name that returns the appropriate error and the filesystem code becomes self documenting: if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, , nimaps)) { . error = iomap_can_allocate(inode); if (error) goto out_unlock; Then you can put all the MAP_DIRECT stuff and the comments explaining what is does inside the generic function that determines if we are allowed to allocate on that inode or not. > + /* >* We cap the maximum length we map here to MAX_WRITEBACK_PAGES >* pages to keep the chunks of work done where somewhat > symmetric >* with the work writeback does. This is a completely arbitrary > diff --git a/include/linux/fs.h b/include/linux/fs.h > index c2b9bf3dc4e9..f83871b188ff 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -642,6 +642,9 @@ struct inode { > atomic_ti_count; > atomic_ti_dio_count; > atomic_ti_writecount; > +#ifdef CONFIG_FS_DAX > + atomic_ti_mapdcount;/* count of MAP_DIRECT vmas */ > +#endif Is there any way to avoid growing the struct inode for this? Cheers, Dave. -- Dave Chinner da...@fromorbit.com ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
linux-nvdimm现场质量问题分析与解决
您好!linux-nvdimm 工厂现场质量问题分析与解决训练营 广州 即将开班;-详情请阅读附件大纲 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v8 2/2] IB/core: use MAP_DIRECT to fix / enable RDMA to DAX mappings
On Sat, Oct 7, 2017 at 11:45 PM, kbuild test robot <l...@intel.com> wrote: > Hi Dan, > > [auto build test ERROR on rdma/master] > [also build test ERROR on v4.14-rc3 next-20170929] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] This was a fixed up resend of patch [v7 11/12]. It's not clear how to teach the kbuild robot to be aware of patch-replies to individual patches in the series. I.e. reworked patches without resending the complete series? > url: > https://github.com/0day-ci/linux/commits/Dan-Williams/iommu-up-level-sg_num_pages-from-amd-iommu/20171008-133505 > base: https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git > master > config: i386-randconfig-n0-201741 (attached as .config) > compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4 > reproduce: > # save the attached .config to linux build tree > make ARCH=i386 > > All errors (new ones prefixed by >>): > >>> drivers/infiniband/core/umem.c:39:29: fatal error: linux/mapdirect.h: No >>> such file or directory > #include mapdirect.h indeed does not exist when missing the earlier patches in the series. It would be slick if the 0day-robot read the the "in-reply-to" header and auto replaced a patch in a series, but that would be a feature approaching magic. >compilation terminated. > > vim +39 drivers/infiniband/core/umem.c > > > 39 #include > 40 #include > 41 #include > 42 #include > 43 #include > 44 > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v8 2/2] IB/core: use MAP_DIRECT to fix / enable RDMA to DAX mappings
Hi Dan, [auto build test ERROR on rdma/master] [also build test ERROR on v4.14-rc3 next-20170929] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Dan-Williams/iommu-up-level-sg_num_pages-from-amd-iommu/20171008-133505 base: https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master config: i386-randconfig-n0-201741 (attached as .config) compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): >> drivers/infiniband/core/umem.c:39:29: fatal error: linux/mapdirect.h: No >> such file or directory #include ^ compilation terminated. vim +39 drivers/infiniband/core/umem.c > 39 #include 40 #include 41 #include 42 #include 43 #include 44 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm