Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-02-01 Thread Jason Gunthorpe
On Mon, Jan 29, 2018 at 06:33:25PM -0500, Jerome Glisse wrote:

> By the way, I would also like to participate. In my view the burden should
> be on GUP users: if the hardware is not ODP-capable, then you should at
> least be able to kill the mapping/GUP and force the hardware to redo a
> GUP if it gets any more transactions on the affected umem. Can non-ODP
> hardware do that? Or is it out of the question?

For RDMA we can have the HW forcibly tear down the MR, but it is
incredibly disruptive and nobody running applications would be happy
with this outcome.

Jason
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-29 Thread Jerome Glisse
On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
> The get_user_pages_longterm() api was recently added as a stop-gap
> measure to prevent applications from growing dependencies on the
> ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
> with no ongoing coordination with the filesystem. This 'longterm'
> pinning is also problematic for the non-DAX VMA case where the core-mm
> needs a time-bounded way to revoke a pin and manipulate the physical
> pages. While existing RDMA applications have already grown the
> assumption that they can pin page-cache pages indefinitely, the fact
> that we are breaking this assumption for filesystem-dax presents an
> opportunity to deprecate the 'indefinite pin' mechanisms and move to a
> general interface that supports pin revocation.
> 
> While RDMA may grow an explicit InfiniBand verb for this 'memory
> registration with lease' semantic, it seems that this problem is
> bigger than just RDMA. At LSF/MM it would be useful to have a
> discussion between fs, mm, dax, and RDMA folks about addressing this
> problem at the core level.
> 
> Particular people that would be useful to have in attendance are
> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).
> 

By the way, I would also like to participate. In my view the burden should
be on GUP users: if the hardware is not ODP-capable, then you should at
least be able to kill the mapping/GUP and force the hardware to redo a
GUP if it gets any more transactions on the affected umem. Can non-ODP
hardware do that? Or is it out of the question?

Cheers,
Jérôme


Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-26 Thread Dan Williams
On Thu, Jan 25, 2018 at 8:47 AM, Christoph Hellwig  wrote:
> On Thu, Jan 25, 2018 at 09:08:02AM -0700, Jason Gunthorpe wrote:
>> On Wed, Jan 24, 2018 at 11:23:51PM -0800, Christoph Hellwig wrote:
>> > On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
>> > > Particular people that would be useful to have in attendance are
>> > > Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).
>> >
>> > I won't be able to make it - I'll have to do election work and
>> > count the ballots for our city council and mayor election.
>>
>> I also have a travel conflict for that week in April and cannot make
>> it.
>
> Are any of you going to be in the Bay Area in February for Usenix
> FAST / LinuxFAST?

I'll be around, but I still think it's worthwhile to have
this conversation at LSF/MM. While we have a plan for filesystem-dax
vs RDMA, there are still open implications for the mm in other
scenarios. I see Michal has also proposed this topic.


Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-25 Thread Christoph Hellwig
On Thu, Jan 25, 2018 at 09:08:02AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2018 at 11:23:51PM -0800, Christoph Hellwig wrote:
> > On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
> > > Particular people that would be useful to have in attendance are
> > > Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).
> > 
> > I won't be able to make it - I'll have to do election work and
> > count the ballots for our city council and mayor election.
> 
> I also have a travel conflict for that week in April and cannot make
> it.

Are any of you going to be in the Bay Area in February for Usenix
FAST / LinuxFAST?


Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-25 Thread h...@infradead.org
On Thu, Jan 25, 2018 at 09:08:48AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2018 at 11:02:16PM -0800, Dan Williams wrote:
> 
> > No, on three counts: there is a need to support non-ODP RDMA
> > hardware, hypervisors want to coordinate DMA for guests, and non-RDMA
> > hardware such as V4L2 also pins memory indefinitely. So it's bigger than
> > RDMA, but RDMA will likely be the first consumer of this 'longterm
> > pin' mechanism.
> 
> BTW, did you look at VFIO? I think it should also have this problem
> right?

VFIO seems to have the same issue.  In practice I don't think people
use file system backed pages for vfio, so it's not as urgent.


Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-24 Thread Christoph Hellwig
On Wed, Jan 24, 2018 at 07:56:02PM -0800, Dan Williams wrote:
> Particular people that would be useful to have in attendance are
> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).

I won't be able to make it - I'll have to do election work and
count the ballots for our city council and mayor election.


Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-24 Thread Dan Williams
On Wed, Jan 24, 2018 at 8:01 PM, Bart Van Assche  wrote:
> On Wed, 2018-01-24 at 19:56 -0800, Dan Williams wrote:
>> The get_user_pages_longterm() api was recently added as a stop-gap
>> measure to prevent applications from growing dependencies on the
>> ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
>> with no ongoing coordination with the filesystem. This 'longterm'
>> pinning is also problematic for the non-DAX VMA case where the core-mm
>> needs a time-bounded way to revoke a pin and manipulate the physical
>> pages. While existing RDMA applications have already grown the
>> assumption that they can pin page-cache pages indefinitely, the fact
>> that we are breaking this assumption for filesystem-dax presents an
>> opportunity to deprecate the 'indefinite pin' mechanisms and move to a
>> general interface that supports pin revocation.
>>
>> While RDMA may grow an explicit InfiniBand verb for this 'memory
>> registration with lease' semantic, it seems that this problem is
>> bigger than just RDMA. At LSF/MM it would be useful to have a
>> discussion between fs, mm, dax, and RDMA folks about addressing this
>> problem at the core level.
>>
>> Particular people that would be useful to have in attendance are
>> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).
>
> Is on-demand paging sufficient as a solution for your use case...

No, on three counts: there is a need to support non-ODP RDMA
hardware, hypervisors want to coordinate DMA for guests, and non-RDMA
hardware such as V4L2 also pins memory indefinitely. So it's bigger than
RDMA, but RDMA will likely be the first consumer of this 'longterm
pin' mechanism.


Re: [LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-24 Thread Bart Van Assche
On Wed, 2018-01-24 at 19:56 -0800, Dan Williams wrote:
> The get_user_pages_longterm() api was recently added as a stop-gap
> measure to prevent applications from growing dependencies on the
> ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
> with no ongoing coordination with the filesystem. This 'longterm'
> pinning is also problematic for the non-DAX VMA case where the core-mm
> needs a time-bounded way to revoke a pin and manipulate the physical
> pages. While existing RDMA applications have already grown the
> assumption that they can pin page-cache pages indefinitely, the fact
> that we are breaking this assumption for filesystem-dax presents an
> opportunity to deprecate the 'indefinite pin' mechanisms and move to a
> general interface that supports pin revocation.
> 
> While RDMA may grow an explicit InfiniBand verb for this 'memory
> registration with lease' semantic, it seems that this problem is
> bigger than just RDMA. At LSF/MM it would be useful to have a
> discussion between fs, mm, dax, and RDMA folks about addressing this
> problem at the core level.
> 
> Particular people that would be useful to have in attendance are
> Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).

Is on-demand paging (ODP) sufficient as a solution for your use case, or do
you perhaps need something different? See also
https://www.openfabrics.org/images/eventpresos/workshops2013/2013_Workshop_Tues_0930_liss_odp.pdf

Thanks,

Bart.


[LSF/MM TOPIC] Filesystem-DAX, page-pinning, and RDMA

2018-01-24 Thread Dan Williams
The get_user_pages_longterm() api was recently added as a stop-gap
measure to prevent applications from growing dependencies on the
ability to pin DAX-mapped filesystem blocks for RDMA indefinitely
with no ongoing coordination with the filesystem. This 'longterm'
pinning is also problematic for the non-DAX VMA case where the core-mm
needs a time-bounded way to revoke a pin and manipulate the physical
pages. While existing RDMA applications have already grown the
assumption that they can pin page-cache pages indefinitely, the fact
that we are breaking this assumption for filesystem-dax presents an
opportunity to deprecate the 'indefinite pin' mechanisms and move to a
general interface that supports pin revocation.

While RDMA may grow an explicit InfiniBand verb for this 'memory
registration with lease' semantic, it seems that this problem is
bigger than just RDMA. At LSF/MM it would be useful to have a
discussion between fs, mm, dax, and RDMA folks about addressing this
problem at the core level.

Particular people that would be useful to have in attendance are
Michal Hocko, Christoph Hellwig, and Jason Gunthorpe (cc'd).