Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-10-03 Thread John Hubbard
On 10/3/18 9:27 AM, Jan Kara wrote:
> On Fri 28-09-18 20:12:33, John Hubbard wrote:
>>  static inline void release_user_pages(struct page **pages,
>> - unsigned long npages)
>> + unsigned long npages,
>> + bool set_dirty)
>>  {
>> -   while (npages)
>> -   put_user_page(pages[--npages]);
>> +   if (set_dirty)
>> +   release_user_pages_dirty(pages, npages);
>> +   else
>> +   release_user_pages_basic(pages, npages);
>> +}
> 
> Is there a good reason to have this with set_dirty argument? Generally bool
> arguments are not great for readability (or greppability for that matter).
> Also in this case callers can just as easily do:
>	if (set_dirty)
>		release_user_pages_dirty(...);
>	else
>		release_user_pages(...);
> 
> And furthermore it makes the code author think more about whether he needs
> set_page_dirty() or set_page_dirty_lock(), rather than just passing 'true'
> and hoping the function magically does the right thing for him.
> 

Ha, I went through *precisely* that argument in my head, too--and then
got seduced with the idea that it pretties up the existing calling code, 
because it's a drop-in one-liner at the call sites. But yes, I'll change it 
back to omit the bool set_dirty argument.

thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-10-03 Thread Jan Kara
On Fri 28-09-18 20:12:33, John Hubbard wrote:
>  static inline void release_user_pages(struct page **pages,
> - unsigned long npages)
> + unsigned long npages,
> + bool set_dirty)
>  {
> -   while (npages)
> -   put_user_page(pages[--npages]);
> +   if (set_dirty)
> +   release_user_pages_dirty(pages, npages);
> +   else
> +   release_user_pages_basic(pages, npages);
> +}

Is there a good reason to have this with set_dirty argument? Generally bool
arguments are not great for readability (or greppability for that matter).
Also in this case callers can just as easily do:
	if (set_dirty)
		release_user_pages_dirty(...);
	else
		release_user_pages(...);

And furthermore it makes the code author think more about whether he needs
set_page_dirty() or set_page_dirty_lock(), rather than just passing 'true'
and hoping the function magically does the right thing for him.
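
To make that concrete, a call site would then read something like this
(a sketch only; the variant names are taken from John's draft further
down this thread, and 'was_writable' is a hypothetical flag):

	if (was_writable)
		/* uses set_page_dirty_lock(); no page lock needed here */
		release_user_pages_dirty_lock(pages, npages);
	else
		release_user_pages_basic(pages, npages);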

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-10-02 Thread John Hubbard
On 10/1/18 7:35 AM, Dennis Dalessandro wrote:
> On 9/28/2018 11:12 PM, John Hubbard wrote:
>> On 9/28/18 8:39 AM, Jason Gunthorpe wrote:
>>> On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubb...@gmail.com wrote:
>>>> From: John Hubbard 
>> [...]
>>>>
>>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>>>> index a41792dbae1f..9430d697cb9f 100644
>>>> +++ b/drivers/infiniband/core/umem.c
>>>> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>>>  page = sg_page(sg);
>>>>  if (!PageDirty(page) && umem->writable && dirty)
>>>>  set_page_dirty_lock(page);
>>>> -    put_page(page);
>>>> +    put_user_page(page);
>>>
>>> Would it make sense to have a release/put_user_pages_dirtied to absorb
>>> the set_page_dirty pattern too? I notice in this patch there is some
>>> variety here, I wonder what is the right way?
>>>
>>> Also, I'm told this code here is a big performance bottleneck when the
>>> number of pages becomes very long (think >> GB of memory), so having a
>>> future path to use some kind of batching/threading sounds great.
>>>
>>
>> Yes. And you asked for this the first time, too. Consistent! :) Sorry for
>> being slow to pick it up. It looks like there are several patterns, and
>> we have to support both set_page_dirty() and set_page_dirty_lock(). So
>> the best combination looks to be adding a few variations of
>> release_user_pages*(), but leaving put_user_page() alone, because it's
>> the "do it yourself" basic one. Scatter-gather will be stuck with that.
>>
>> Here's a differential patch with that, that shows a nice little cleanup in
>> a couple of IB places, and as you point out, it also provides the hooks for
>> performance upgrades (via batching) in the future.
>>
>> Does this API look about right?
> 
> I'm on board with that and the changes to hfi1 and qib.
> 
> Reviewed-by: Dennis Dalessandro 

Hi Dennis, thanks for the review!

I'll add those new routines in and send out a v2 soon, now that it appears,
from the recent discussion, that this aspect of the approach is still viable.


thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-10-01 Thread Christoph Hellwig
On Mon, Oct 01, 2018 at 08:29:29AM -0700, Matthew Wilcox wrote:
> I don't understand the dislike of the sg list.  Other than for special
> cases which we shouldn't be optimising for (ramfs, brd, loopback
> filesystems), when we get pages to do I/O, we're going to want a dma
> mapping for them.  It makes sense to already allocate space to store
> the mapping at the outset.

We don't actually need the space - the scatterlist forces it on us,
otherwise we could translate directly into the on-disk format and
save that duplicate space.  I have prototypes for NVMe and RDMA that do
away with the scatterlist entirely.

And even if we are still using the scatterlist as we do right now we'd
need a second scatterlist at least for block / file system based I/O
as we can't plug the scatterlist into the I/O stack (never mind that,
due to splitting and merging, the lower one might not map 1:1 to the
upper one).

> [1] Can we ever admit that the bio_vec and the skb_frag_t are actually
> the same thing?

When I brought this up years ago the networking folks insisted that
their use of u16 offset/size fields was important for performance,
while for bio_vecs we needed the larger ones for some cases.  Since
then networking switched to 32-bit fields for what is now the fast
path, so it might be worth giving it another spin.
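
For reference, the two structures side by side as of this era (paraphrased
from include/linux/bvec.h and include/linux/skbuff.h; the ifdef is the
32-bit-fields switch mentioned above):

	struct bio_vec {
		struct page	*bv_page;
		unsigned int	bv_len;
		unsigned int	bv_offset;
	};

	typedef struct skb_frag_struct {
		struct {
			struct page *p;
		} page;
	#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
		__u32 page_offset;
		__u32 size;
	#else
		__u16 page_offset;
		__u16 size;
	#endif
	} skb_frag_t;

Both carry a (page, offset, length) triple; only the field widths and
naming differ.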

That should also help with using my new bio_vec based dma-mapping
helpers to batch iommu mappings in networking, which Jesper had on
his todo list as all the indirect calls are causing performance
issues.


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-10-01 Thread Matthew Wilcox
On Mon, Oct 01, 2018 at 05:50:13AM -0700, Christoph Hellwig wrote:
> On Sat, Sep 29, 2018 at 09:21:17AM -0700, Matthew Wilcox wrote:
> > > being slow to pick it up. It looks like there are several patterns, and
> > > we have to support both set_page_dirty() and set_page_dirty_lock(). So
> > > the best combination looks to be adding a few variations of
> > > release_user_pages*(), but leaving put_user_page() alone, because it's
> > > the "do it yourself" basic one. Scatter-gather will be stuck with that.
> > 
> > I think our current interfaces are wrong.  We should really have a
> > get_user_sg() / put_user_sg() function that will set up / destroy an
> > SG list appropriate for that range of user memory.  This is almost
> > orthogonal to the original intent here, so please don't see this as a
> > "must do first" kind of argument that might derail the whole thing.
> 
> The SG list really is the wrong interface, as it mixes up information
> about the pages/phys addr range and a potential dma mapping.  I think
> the right interface is an array of bio_vecs.  In fact I've recently
> been looking into a get_user_pages variant that does fill bio_vecs,
> as it fundamentally is the right thing for doing I/O on large pages,
> and will really help with direct I/O performance in that case.

I don't think the bio_vec is really a big improvement; it's just a (page,
offset, length) tuple.  Not to mention that due to the annoying divergence
between block and networking [1] this is actually a less useful interface.

I don't understand the dislike of the sg list.  Other than for special
cases which we shouldn't be optimising for (ramfs, brd, loopback
filesystems), when we get pages to do I/O, we're going to want a dma
mapping for them.  It makes sense to already allocate space to store
the mapping at the outset.

[1] Can we ever admit that the bio_vec and the skb_frag_t are actually
the same thing?
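
For illustration, such a pair could be built from existing primitives
roughly like this (a sketch only, not a real interface; error handling is
trimmed, a short gup count still needs unpinning, and the caller is
assumed to hold mmap_sem for read, as get_user_pages() requires):

static int get_user_sg(struct sg_table *sgt, unsigned long start,
		       unsigned int n_pages, unsigned int gup_flags)
{
	struct page **pages;
	long pinned;
	int ret;

	pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	pinned = get_user_pages(start, n_pages, gup_flags, pages, NULL);
	if (pinned != n_pages) {
		ret = pinned < 0 ? pinned : -EFAULT;
		goto out;
	}

	/* sg_alloc_table_from_pages() merges contiguous pages into single
	 * entries, which is also where huge pages could collapse. */
	ret = sg_alloc_table_from_pages(sgt, pages, n_pages, 0,
					(unsigned long)n_pages << PAGE_SHIFT,
					GFP_KERNEL);
out:
	kvfree(pages);
	return ret;
}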


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-10-01 Thread Dennis Dalessandro

On 9/28/2018 11:12 PM, John Hubbard wrote:
> On 9/28/18 8:39 AM, Jason Gunthorpe wrote:
>> On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubb...@gmail.com wrote:
>>> From: John Hubbard 
> [...]
>>>
>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>>> index a41792dbae1f..9430d697cb9f 100644
>>> +++ b/drivers/infiniband/core/umem.c
>>> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>>  page = sg_page(sg);
>>>  if (!PageDirty(page) && umem->writable && dirty)
>>>  set_page_dirty_lock(page);
>>> -   put_page(page);
>>> +   put_user_page(page);
>>
>> Would it make sense to have a release/put_user_pages_dirtied to absorb
>> the set_page_dirty pattern too? I notice in this patch there is some
>> variety here, I wonder what is the right way?
>>
>> Also, I'm told this code here is a big performance bottleneck when the
>> number of pages becomes very long (think >> GB of memory), so having a
>> future path to use some kind of batching/threading sounds great.
>
> Yes. And you asked for this the first time, too. Consistent! :) Sorry for
> being slow to pick it up. It looks like there are several patterns, and
> we have to support both set_page_dirty() and set_page_dirty_lock(). So
> the best combination looks to be adding a few variations of
> release_user_pages*(), but leaving put_user_page() alone, because it's
> the "do it yourself" basic one. Scatter-gather will be stuck with that.
>
> Here's a differential patch with that, that shows a nice little cleanup in
> a couple of IB places, and as you point out, it also provides the hooks for
> performance upgrades (via batching) in the future.
>
> Does this API look about right?

I'm on board with that and the changes to hfi1 and qib.

Reviewed-by: Dennis Dalessandro 



Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-10-01 Thread Christoph Hellwig
On Sat, Sep 29, 2018 at 09:21:17AM -0700, Matthew Wilcox wrote:
> > being slow to pick it up. It looks like there are several patterns, and
> > we have to support both set_page_dirty() and set_page_dirty_lock(). So
> > the best combination looks to be adding a few variations of
> > release_user_pages*(), but leaving put_user_page() alone, because it's
> > the "do it yourself" basic one. Scatter-gather will be stuck with that.
> 
> I think our current interfaces are wrong.  We should really have a
> get_user_sg() / put_user_sg() function that will set up / destroy an
> SG list appropriate for that range of user memory.  This is almost
> orthogonal to the original intent here, so please don't see this as a
> "must do first" kind of argument that might derail the whole thing.

The SG list really is the wrong interface, as it mixes up information
about the pages/phys addr range and a potential dma mapping.  I think
the right interface is an array of bio_vecs.  In fact I've recently
been looking into a get_user_pages variant that does fill bio_vecs,
as it fundamentally is the right thing for doing I/O on large pages,
and will really help with direct I/O performance in that case.
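
A hypothetical shape for such a variant (the name and signature here are
invented purely for illustration; nothing like this exists at this point):

	/*
	 * Pin the user range [start, start + size) and describe it as
	 * bio_vecs, so a 2MB huge page costs one entry rather than 512
	 * page pointers. Returns the number of bio_vecs filled, or a
	 * negative errno.
	 */
	long get_user_bvecs(unsigned long start, unsigned long size,
			    unsigned int gup_flags, struct bio_vec *bvecs,
			    unsigned int max_bvecs);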


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-09-29 Thread Jason Gunthorpe
On Sat, Sep 29, 2018 at 09:21:17AM -0700, Matthew Wilcox wrote:
> On Fri, Sep 28, 2018 at 08:12:33PM -0700, John Hubbard wrote:
> > >> +++ b/drivers/infiniband/core/umem.c
> > >> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> > >>  page = sg_page(sg);
> > >>  if (!PageDirty(page) && umem->writable && dirty)
> > >>  set_page_dirty_lock(page);
> > >> -put_page(page);
> > >> +put_user_page(page);
> > > 
> > > Would it make sense to have a release/put_user_pages_dirtied to absorb
> > > the set_page_dirty pattern too? I notice in this patch there is some
> > > variety here, I wonder what is the right way?
> > > 
> > > Also, I'm told this code here is a big performance bottleneck when the
> > > number of pages becomes very long (think >> GB of memory), so having a
> > > future path to use some kind of batching/threading sounds great.
> > 
> > Yes. And you asked for this the first time, too. Consistent! :) Sorry for
> > being slow to pick it up. It looks like there are several patterns, and
> > we have to support both set_page_dirty() and set_page_dirty_lock(). So
> > the best combination looks to be adding a few variations of
> > release_user_pages*(), but leaving put_user_page() alone, because it's
> > the "do it yourself" basic one. Scatter-gather will be stuck with that.
> 
> I think our current interfaces are wrong.  We should really have a
> get_user_sg() / put_user_sg() function that will set up / destroy an
> SG list appropriate for that range of user memory.  This is almost
> orthogonal to the original intent here, so please don't see this as a
> "must do first" kind of argument that might derail the whole thing.

This would be an excellent API, and it is exactly what this 'umem'
code in RDMA is trying to do for RDMA drivers..

I suspect it could do a much better job if it wasn't hindered by the
get_user_pages API - ie it could directly stuff huge pages into the SG
list instead of breaking them up, maybe also avoid the temporary memory
allocations for page * pointers, etc?

Jason


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-09-29 Thread Matthew Wilcox
On Fri, Sep 28, 2018 at 08:12:33PM -0700, John Hubbard wrote:
> >> +++ b/drivers/infiniband/core/umem.c
> >> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> >>page = sg_page(sg);
> >>if (!PageDirty(page) && umem->writable && dirty)
> >>set_page_dirty_lock(page);
> >> -  put_page(page);
> >> +  put_user_page(page);
> > 
> > Would it make sense to have a release/put_user_pages_dirtied to absorb
> > the set_page_dirty pattern too? I notice in this patch there is some
> > variety here, I wonder what is the right way?
> > 
> > Also, I'm told this code here is a big performance bottleneck when the
> > number of pages becomes very long (think >> GB of memory), so having a
> > future path to use some kind of batching/threading sounds great.
> 
> Yes. And you asked for this the first time, too. Consistent! :) Sorry for
> being slow to pick it up. It looks like there are several patterns, and
> we have to support both set_page_dirty() and set_page_dirty_lock(). So
> the best combination looks to be adding a few variations of
> release_user_pages*(), but leaving put_user_page() alone, because it's
> the "do it yourself" basic one. Scatter-gather will be stuck with that.

I think our current interfaces are wrong.  We should really have a
get_user_sg() / put_user_sg() function that will set up / destroy an
SG list appropriate for that range of user memory.  This is almost
orthogonal to the original intent here, so please don't see this as a
"must do first" kind of argument that might derail the whole thing.


Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-09-28 Thread John Hubbard
On 9/28/18 8:39 AM, Jason Gunthorpe wrote:
> On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubb...@gmail.com wrote:
>> From: John Hubbard 
[...]
>>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index a41792dbae1f..9430d697cb9f 100644
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>  page = sg_page(sg);
>>  if (!PageDirty(page) && umem->writable && dirty)
>>  set_page_dirty_lock(page);
>> -put_page(page);
>> +put_user_page(page);
> 
> Would it make sense to have a release/put_user_pages_dirtied to absorb
> the set_page_dirty pattern too? I notice in this patch there is some
> variety here, I wonder what is the right way?
> 
> Also, I'm told this code here is a big performance bottleneck when the
> number of pages becomes very long (think >> GB of memory), so having a
> future path to use some kind of batching/threading sounds great.
> 

Yes. And you asked for this the first time, too. Consistent! :) Sorry for
being slow to pick it up. It looks like there are several patterns, and
we have to support both set_page_dirty() and set_page_dirty_lock(). So
the best combination looks to be adding a few variations of
release_user_pages*(), but leaving put_user_page() alone, because it's
the "do it yourself" basic one. Scatter-gather will be stuck with that.

Here's a differential patch with that, that shows a nice little cleanup in 
a couple of IB places, and as you point out, it also provides the hooks for 
performance upgrades (via batching) in the future.

Does this API look about right?

diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index c7516029af33..48afec362c31 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -123,11 +123,7 @@ void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 {
size_t i;
 
-   for (i = 0; i < npages; i++) {
-   if (dirty)
-   set_page_dirty_lock(p[i]);
-   put_user_page(p[i]);
-   }
+   release_user_pages_lock(p, npages, dirty);
 
if (mm) { /* during close after signal, mm can be NULL */
down_write(&mm->mmap_sem);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 3f8fd42dd7fc..c57a3a6730b6 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,13 +40,7 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 int dirty)
 {
-   size_t i;
-
-   for (i = 0; i < num_pages; i++) {
-   if (dirty)
-   set_page_dirty_lock(p[i]);
-   put_user_page(p[i]);
-   }
+   release_user_pages_lock(p, num_pages, dirty);
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 72caf803115f..b280d0181e06 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -138,6 +138,9 @@ extern int overcommit_ratio_handler(struct ctl_table *, int, void __user *,
 extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
size_t *, loff_t *);
 
+int set_page_dirty(struct page *page);
+int set_page_dirty_lock(struct page *page);
+
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
 /* to align the pointer to the (next) page boundary */
@@ -949,12 +952,56 @@ static inline void put_user_page(struct page *page)
put_page(page);
 }
 
-/* A drop-in replacement for release_pages(): */
+/* For get_user_pages*()-pinned pages, use these variants instead of
+ * release_pages():
+ */
+static inline void release_user_pages_dirty(struct page **pages,
+   unsigned long npages)
+{
+   while (npages) {
+   --npages;
+   set_page_dirty(pages[npages]);
+   put_user_page(pages[npages]);
+   }
+}
+
+static inline void release_user_pages_dirty_lock(struct page **pages,
+unsigned long npages)
+{
+   while (npages) {
+   --npages;
+   set_page_dirty_lock(pages[npages]);
+   put_user_page(pages[npages]);
+   }
+}
+
+static inline void release_user_pages_basic(struct page **pages,
+   unsigned long npages)
+{
+   while (npages) {
+   --npages;
+   put_user_page(pages[npages]);
+   }
+}
+
 static inline void release_user_pages(struct page **pages,
- unsigned long npages)
+ unsigned long npages,
+ bool set_dirty)
 {
-   while (npages)
-   put_user_page(pages[--npages]);
+   if (set_dirty)
+   release_user_pages_dirty(pages, npages);
+   else
+   release_user_pages_basic(pages, npages);
+}

Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-09-28 Thread Jason Gunthorpe
On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> For code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(),
> instead of put_page().
> 
> This prepares for eventually fixing the problem described
> in [1], and is following a plan listed in [2].
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> [2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubb...@nvidia.com
> Proposed steps for fixing get_user_pages() + DMA problems.
> 
> CC: Doug Ledford 
> CC: Jason Gunthorpe 
> CC: Mike Marciniszyn 
> CC: Dennis Dalessandro 
> CC: Christian Benvenuti 
> 
> CC: linux-r...@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: linux...@kvack.org
> Signed-off-by: John Hubbard 
>  drivers/infiniband/core/umem.c  | 2 +-
>  drivers/infiniband/core/umem_odp.c  | 2 +-
>  drivers/infiniband/hw/hfi1/user_pages.c | 2 +-
>  drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 2 +-
> drivers/infiniband/hw/qib/qib_user_sdma.c   | 8 ++++----
>  drivers/infiniband/hw/usnic/usnic_uiom.c| 2 +-
>  7 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index a41792dbae1f..9430d697cb9f 100644
> +++ b/drivers/infiniband/core/umem.c
> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>   page = sg_page(sg);
>   if (!PageDirty(page) && umem->writable && dirty)
>   set_page_dirty_lock(page);
> - put_page(page);
> + put_user_page(page);

Would it make sense to have a release/put_user_pages_dirtied to absorb
the set_page_dirty pattern too? I notice in this patch there is some
variety here, I wonder what is the right way?

Also, I'm told this code here is a big performance bottleneck when the
number of pages becomes very long (think >> GB of memory), so having a
future path to use some kind of batching/threading sounds great.

Otherwise this RDMA part seems fine to me; there might be some minor
conflicts, however. I assume you want to run this through the -mm tree?

Acked-by: Jason Gunthorpe 

Jason


[PATCH 3/4] infiniband/mm: convert to the new put_user_page() call

2018-09-27 Thread john . hubbard
From: John Hubbard 

For code that retains pages via get_user_pages*(),
release those pages via the new put_user_page(),
instead of put_page().

This prepares for eventually fixing the problem described
in [1], and is following a plan listed in [2].

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

[2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubb...@nvidia.com
Proposed steps for fixing get_user_pages() + DMA problems.

CC: Doug Ledford 
CC: Jason Gunthorpe 
CC: Mike Marciniszyn 
CC: Dennis Dalessandro 
CC: Christian Benvenuti 

CC: linux-r...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux...@kvack.org
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c  | 2 +-
 drivers/infiniband/core/umem_odp.c  | 2 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 2 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 2 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   | 8 ++++----
 drivers/infiniband/hw/usnic/usnic_uiom.c| 2 +-
 7 files changed, 12 insertions(+), 12 deletions(-)
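
To make the conversion rule concrete (an illustrative fragment, not taken
from the diff below; 'user_addr' is a hypothetical pinned-range address):
pages obtained via get_user_pages*() must now be released with
put_user_page(), never put_page():

	struct page *pages[16];
	int i, nr;

	nr = get_user_pages_fast(user_addr, 16, 1, pages);	/* 1 == write */
	/* ... DMA to or from the pinned pages ... */
	for (i = 0; i < nr; i++)
		put_user_page(pages[i]);	/* was: put_page(pages[i]) */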

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a41792dbae1f..9430d697cb9f 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
page = sg_page(sg);
if (!PageDirty(page) && umem->writable && dirty)
set_page_dirty_lock(page);
-   put_page(page);
+   put_user_page(page);
}
 
sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 6ec748eccff7..6227b89cf05c 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -717,7 +717,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
ret = -EFAULT;
break;
}
-   put_page(local_page_list[j]);
+   put_user_page(local_page_list[j]);
continue;
}
 
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index e341e6dcc388..c7516029af33 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -126,7 +126,7 @@ void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
for (i = 0; i < npages; i++) {
if (dirty)
set_page_dirty_lock(p[i]);
-   put_page(p[i]);
+   put_user_page(p[i]);
}
 
if (mm) { /* during close after signal, mm can be NULL */
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index cc9c0c8ccba3..b8b12effd009 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -481,7 +481,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 
ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
if (ret < 0) {
-   put_page(pages[0]);
+   put_user_page(pages[0]);
goto out;
}
 
@@ -489,7 +489,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 mthca_uarc_virt(dev, uar, i));
if (ret) {
pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-   put_page(sg_page(&db_tab->page[i].mem));
+   put_user_page(sg_page(&db_tab->page[i].mem));
goto out;
}
 
@@ -555,7 +555,7 @@ void mthca_cleanup_user_db_tab(struct mthca_dev *dev, struct mthca_uar *uar,
if (db_tab->page[i].uvirt) {
mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1);
pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-   put_page(sg_page(&db_tab->page[i].mem));
+   put_user_page(sg_page(&db_tab->page[i].mem));
}
}
 
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 16543d5e80c3..3f8fd42dd7fc 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -45,7 +45,7 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
for (i = 0; i < num_pages; i++) {
if (dirty)
set_page_dirty_lock(p[i]);
-   put_page(p[i]);
+   put_user_page(p[i]);
}
 }
 
diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index 926f3c8eba69..14f94d823907
