Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Mon, Jan 29, 2018 at 06:33:25PM -0500, Jerome Glisse wrote:
> By the way, I would also like to participate. In my view the burden
> should be on GUP users: if the hardware is not ODP-capable, then you
> should at least be able to kill the mapping/GUP and force the hardware
> to redo a GUP if it gets any more transactions on the affected umem.
> Can non-ODP hardware do that? Or is it out of the question?

For RDMA we can have the HW forcibly tear down the MR, but it is
incredibly disruptive and nobody running applications would be happy
with this outcome.

Jason
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
> The get_user_pages_longterm() API was recently added as a stop-gap
> measure to prevent applications from growing dependencies on the
> ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
> with no ongoing coordination with the filesystem. This 'longterm'
> pinning is also problematic for the non-DAX VMA case, where the core-mm
> needs a time-bounded way to revoke a pin and manipulate the physical
> pages. While existing RDMA applications have already grown the
> assumption that they can pin page-cache pages indefinitely, the fact
> that we are breaking this assumption for filesystem-dax presents an
> opportunity to deprecate the 'indefinite pin' mechanisms and move to a
> general interface that supports pin revocation.
>
> While RDMA may grow an explicit InfiniBand verb for this 'memory
> registration with lease' semantic, it seems that this problem is
> bigger than just RDMA. At LSF/MM it would be useful to have a
> discussion between fs, mm, dax, and RDMA folks about addressing this
> problem at the core level.
>
> Particular people that would be useful to have in attendance are
> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).

By the way, I would also like to participate. In my view the burden
should be on GUP users: if the hardware is not ODP-capable, then you
should at least be able to kill the mapping/GUP and force the hardware
to redo a GUP if it gets any more transactions on the affected umem.
Can non-ODP hardware do that? Or is it out of the question?

Cheers,
Jérôme
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Thu, Jan 25, 2018 at 8:47 AM, Christoph Hellwig wrote:
> On Thu, Jan 25, 2018 at 09:08:02AM -0700, Jason Gunthorpe wrote:
>> On Wed, Jan 24, 2018 at 11:23:51PM -0800, Christoph Hellwig wrote:
>> > On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
>> > > Particular people that would be useful to have in attendance are
>> > > Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).
>> >
>> > I won't be able to make it - I'll have to do election work and
>> > count the ballots for our city council and mayor election.
>>
>> I also have a travel conflict for that week in April and cannot make
>> it.
>
> Are any of you going to be in the Bay Area in February for Usenix
> FAST / LinuxFAST?

I'll be around, but that said I still think it's worthwhile to have
this conversation at LSF/MM. While we have a plan for filesystem-dax
vs RDMA, there are still open implications for the mm in other
scenarios. I see Michal has also proposed this topic.
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Thu, Jan 25, 2018 at 09:08:02AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2018 at 11:23:51PM -0800, Christoph Hellwig wrote:
> > On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
> > > Particular people that would be useful to have in attendance are
> > > Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).
> >
> > I won't be able to make it - I'll have to do election work and
> > count the ballots for our city council and mayor election.
>
> I also have a travel conflict for that week in April and cannot make
> it.

Are any of you going to be in the Bay Area in February for Usenix
FAST / LinuxFAST?
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Thu, Jan 25, 2018 at 09:08:48AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2018 at 11:02:16PM -0800, Dan Williams wrote:
>
> > No, in 3 dimensions since there is a need to support non-ODP RDMA
> > hardware, hypervisors want to coordinate DMA for guests, and non-RDMA
> > hardware also pins memory indefinitely like V4L2. So it's bigger than
> > RDMA, but that will likely be the first consumer of this 'longterm
> > pin' mechanism.
>
> BTW, did you look at VFIO? I think it should also have this problem,
> right?

VFIO seems to have the same issue. In practice I don't think people
use filesystem-backed pages for VFIO, so it's not as urgent.
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
> Particular people that would be useful to have in attendance are
> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).

I won't be able to make it - I'll have to do election work and
count the ballots for our city council and mayor election.
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Wed, Jan 24, 2018 at 8:01 PM, Bart Van Assche wrote:
> On Wed, 2018-01-24 at 19:56 -0800, Dan Williams wrote:
>> The get_user_pages_longterm() API was recently added as a stop-gap
>> measure to prevent applications from growing dependencies on the
>> ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
>> with no ongoing coordination with the filesystem. This 'longterm'
>> pinning is also problematic for the non-DAX VMA case, where the core-mm
>> needs a time-bounded way to revoke a pin and manipulate the physical
>> pages. While existing RDMA applications have already grown the
>> assumption that they can pin page-cache pages indefinitely, the fact
>> that we are breaking this assumption for filesystem-dax presents an
>> opportunity to deprecate the 'indefinite pin' mechanisms and move to a
>> general interface that supports pin revocation.
>>
>> While RDMA may grow an explicit InfiniBand verb for this 'memory
>> registration with lease' semantic, it seems that this problem is
>> bigger than just RDMA. At LSF/MM it would be useful to have a
>> discussion between fs, mm, dax, and RDMA folks about addressing this
>> problem at the core level.
>>
>> Particular people that would be useful to have in attendance are
>> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).
>
> Is on-demand paging sufficient as a solution for your use case...

No, in 3 dimensions since there is a need to support non-ODP RDMA
hardware, hypervisors want to coordinate DMA for guests, and non-RDMA
hardware also pins memory indefinitely like V4L2. So it's bigger than
RDMA, but that will likely be the first consumer of this 'longterm
pin' mechanism.
Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
On Wed, 2018-01-24 at 19:56 -0800, Dan Williams wrote:
> The get_user_pages_longterm() API was recently added as a stop-gap
> measure to prevent applications from growing dependencies on the
> ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
> with no ongoing coordination with the filesystem. This 'longterm'
> pinning is also problematic for the non-DAX VMA case, where the core-mm
> needs a time-bounded way to revoke a pin and manipulate the physical
> pages. While existing RDMA applications have already grown the
> assumption that they can pin page-cache pages indefinitely, the fact
> that we are breaking this assumption for filesystem-dax presents an
> opportunity to deprecate the 'indefinite pin' mechanisms and move to a
> general interface that supports pin revocation.
>
> While RDMA may grow an explicit InfiniBand verb for this 'memory
> registration with lease' semantic, it seems that this problem is
> bigger than just RDMA. At LSF/MM it would be useful to have a
> discussion between fs, mm, dax, and RDMA folks about addressing this
> problem at the core level.
>
> Particular people that would be useful to have in attendance are
> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).

Is on-demand paging sufficient as a solution for your use case, or do
you perhaps need something different? See also
https://www.openfabrics.org/images/eventpresos/workshops2013/2013_Workshop_Tues_0930_liss_odp.pdf

Thanks,

Bart.
[LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA
The get_user_pages_longterm() API was recently added as a stop-gap
measure to prevent applications from growing dependencies on the
ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
with no ongoing coordination with the filesystem. This 'longterm'
pinning is also problematic for the non-DAX VMA case, where the core-mm
needs a time-bounded way to revoke a pin and manipulate the physical
pages. While existing RDMA applications have already grown the
assumption that they can pin page-cache pages indefinitely, the fact
that we are breaking this assumption for filesystem-dax presents an
opportunity to deprecate the 'indefinite pin' mechanisms and move to a
general interface that supports pin revocation.

While RDMA may grow an explicit InfiniBand verb for this 'memory
registration with lease' semantic, it seems that this problem is
bigger than just RDMA. At LSF/MM it would be useful to have a
discussion between fs, mm, dax, and RDMA folks about addressing this
problem at the core level.

Particular people that would be useful to have in attendance are
Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).