Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-10 Thread Jan Kara
On Mon 09-07-18 13:00:49, Matthew Wilcox wrote:
> On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> > On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > > > 2) What to do when some page is pinned but we need to do e.g.
> > > > clear_page_dirty_for_io(). After some more thinking I agree with you 
> > > > that
> > > > just blocking waiting for page to unpin will create deadlocks like:
> > > 
> > > Why are we trying to writeback a page that is pinned?  It's presumed to
> > > be continuously redirtied by its pinner.  We can't evict it.
> > 
> > So what should be a result of fsync(file), where some 'file' pages are
> > pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> > userspace that data was committed while it was not (and it's not only about
> > data that has landed in those pages via DMA, you can have first 1k of a page
> > modified by normal IO in parallel to DMA modifying second 1k chunk). If
> > fsync(2) returns error, it would be really unexpected by userspace and most
> > apps will just not handle that correctly. So what else can you do than
> > block?
> 
> I was thinking about writeback, and neglected the fsync case.

For memory cleaning writeback skipping is certainly the right thing to do
and that's what we plan to do.

> For fsync, we could copy the "current" contents of the page to a
> freshly-allocated page and write _that_ to disc?  As long as we redirty
> the real page after the pin is dropped, I think we're fine.

So for record, this technique is called "bouncing" in block layer
terminology and we do have a support for it there (see block/bounce.c). It
would need some tweaking (e.g. a bio flag to indicate that some page in a
bio needs bouncing if underlying storage requires stable pages) but that is
easy to do - we even had support for something similar some years back as
ext3 needed it to provide guarantee metadata buffer cannot be modified
while IO is running on it.

I was actually already considering using this some time ago but then
disregarded it as it seemed it won't buy us much compared to blocking /
skipping. But now seeing the troubles with blocking, using page bouncing
for situations where we cannot just skip page writeout looks indeed
appealing. Thanks for suggesting that!

As a side note I'm not 100% decided whether it is better to keep the
original page dirty all the time while it is pinned or not. I'm more
inclined to keeping it dirty all the time as it gives mm more accurate
information about the amount of really dirty pages, prevents reclaim of
filesystem's dirtiness / allocation tracking information (buffers or
whatever it has attached to the page), and generally avoids "surprising"
set_page_dirty() once page is unpinned (one less dirtying path for
filesystems to care about). OTOH it would make flusher threads always try
to writeback these pages only to skip them, fsync(2) would always write
them, etc...

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-10 Thread Jan Kara
On Mon 09-07-18 13:00:49, Matthew Wilcox wrote:
> On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> > On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > > > 2) What to do when some page is pinned but we need to do e.g.
> > > > clear_page_dirty_for_io(). After some more thinking I agree with you 
> > > > that
> > > > just blocking waiting for page to unpin will create deadlocks like:
> > > 
> > > Why are we trying to writeback a page that is pinned?  It's presumed to
> > > be continuously redirtied by its pinner.  We can't evict it.
> > 
> > So what should be a result of fsync(file), where some 'file' pages are
> > pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> > userspace that data was committed while it was not (and it's not only about
> > data that has landed in those pages via DMA, you can have first 1k of a page
> > modified by normal IO in parallel to DMA modifying second 1k chunk). If
> > fsync(2) returns error, it would be really unexpected by userspace and most
> > apps will just not handle that correctly. So what else can you do than
> > block?
> 
> I was thinking about writeback, and neglected the fsync case.

For memory cleaning writeback skipping is certainly the right thing to do
and that's what we plan to do.

> For fsync, we could copy the "current" contents of the page to a
> freshly-allocated page and write _that_ to disc?  As long as we redirty
> the real page after the pin is dropped, I think we're fine.

So for record, this technique is called "bouncing" in block layer
terminology and we do have a support for it there (see block/bounce.c). It
would need some tweaking (e.g. a bio flag to indicate that some page in a
bio needs bouncing if underlying storage requires stable pages) but that is
easy to do - we even had support for something similar some years back as
ext3 needed it to provide guarantee metadata buffer cannot be modified
while IO is running on it.

I was actually already considering using this some time ago but then
disregarded it as it seemed it won't buy us much compared to blocking /
skipping. But now seeing the troubles with blocking, using page bouncing
for situations where we cannot just skip page writeout looks indeed
appealing. Thanks for suggesting that!

As a side note I'm not 100% decided whether it is better to keep the
original page dirty all the time while it is pinned or not. I'm more
inclined to keeping it dirty all the time as it gives mm more accurate
information about the amount of really dirty pages, prevents reclaim of
filesystem's dirtiness / allocation tracking information (buffers or
whatever it has attached to the page), and generally avoids "surprising"
set_page_dirty() once page is unpinned (one less dirtying path for
filesystems to care about). OTOH it would make flusher threads always try
to writeback these pages only to skip them, fsync(2) would always write
them, etc...

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-10 Thread Jan Kara
On Mon 09-07-18 13:56:57, Jason Gunthorpe wrote:
> On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> > On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > > On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> > > > On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > > > > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > > > > holding the page lock (or locks) and possibly others too. If you
> > > > > expect to have a bunch of long term references hanging around on the
> > > > > page, then there will be hangs and deadlocks everywhere. And if you do
> > > > > not have such log term references, then page lock (or some similar 
> > > > > lock
> > > > > bit) for the duration of the DMA should be about enough?
> > > > 
> > > > There are two separate questions:
> > > > 
> > > > 1) How to identify pages pinned for DMA? We have no bit in struct page 
> > > > to
> > > > use and we cannot reuse page lock as that immediately creates lock
> > > > inversions e.g. in direct IO code (which could be fixed but then good 
> > > > luck
> > > > with auditing all the other GUP users). Matthew had an idea and John
> > > > implemented it based on removing page from LRU and using that space in
> > > > struct page. So we at least have a way to identify pages that are pinned
> > > > and can track their pin count.
> > > > 
> > > > 2) What to do when some page is pinned but we need to do e.g.
> > > > clear_page_dirty_for_io(). After some more thinking I agree with you 
> > > > that
> > > > just blocking waiting for page to unpin will create deadlocks like:
> > > 
> > > Why are we trying to writeback a page that is pinned?  It's presumed to
> > > be continuously redirtied by its pinner.  We can't evict it.
> > 
> > So what should be a result of fsync(file), where some 'file' pages are
> > pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> > userspace that data was committed while it was not (and it's not only about
> > data that has landed in those pages via DMA, you can have first 1k of a page
> > modified by normal IO in parallel to DMA modifying second 1k chunk). If
> > fsync(2) returns error, it would be really unexpected by userspace and most
> > apps will just not handle that correctly. So what else can you do than
> > block?
> 
> I think as a userspace I would expect the 'current content' to be
> flushed without waiting..

Yes but the problem is we cannot generally write out a page whose contents
is possibly changing (e.g. RAID5 checksums would then be wrong). But maybe
using bounce pages (and keeping original page still dirty) in such case would
be worth it - originally I thought using bounce pages would not bring us
much but now seeing problems with blocking in more detail maybe they are
worth the trouble after all...

> If you block fsync() then anyone using a RDMA MR with it will just
> dead lock. What happens if two processes open the same file and
> one makes a MR and the other calls fsync()? Sounds bad.

Yes, that's one of the reasons why we were discussing revoke mechanisms for
long term pins. But with bounce pages we could possibly avoid that (except
for cases like DAX + truncate where it's really unavoidable but there it's
a new functionality so mandating revoke and returning error otherwise is
fine I guess).

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-10 Thread Jan Kara
On Mon 09-07-18 13:56:57, Jason Gunthorpe wrote:
> On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> > On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > > On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> > > > On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > > > > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > > > > holding the page lock (or locks) and possibly others too. If you
> > > > > expect to have a bunch of long term references hanging around on the
> > > > > page, then there will be hangs and deadlocks everywhere. And if you do
> > > > > not have such log term references, then page lock (or some similar 
> > > > > lock
> > > > > bit) for the duration of the DMA should be about enough?
> > > > 
> > > > There are two separate questions:
> > > > 
> > > > 1) How to identify pages pinned for DMA? We have no bit in struct page 
> > > > to
> > > > use and we cannot reuse page lock as that immediately creates lock
> > > > inversions e.g. in direct IO code (which could be fixed but then good 
> > > > luck
> > > > with auditing all the other GUP users). Matthew had an idea and John
> > > > implemented it based on removing page from LRU and using that space in
> > > > struct page. So we at least have a way to identify pages that are pinned
> > > > and can track their pin count.
> > > > 
> > > > 2) What to do when some page is pinned but we need to do e.g.
> > > > clear_page_dirty_for_io(). After some more thinking I agree with you 
> > > > that
> > > > just blocking waiting for page to unpin will create deadlocks like:
> > > 
> > > Why are we trying to writeback a page that is pinned?  It's presumed to
> > > be continuously redirtied by its pinner.  We can't evict it.
> > 
> > So what should be a result of fsync(file), where some 'file' pages are
> > pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> > userspace that data was committed while it was not (and it's not only about
> > data that has landed in those pages via DMA, you can have first 1k of a page
> > modified by normal IO in parallel to DMA modifying second 1k chunk). If
> > fsync(2) returns error, it would be really unexpected by userspace and most
> > apps will just not handle that correctly. So what else can you do than
> > block?
> 
> I think as a userspace I would expect the 'current content' to be
> flushed without waiting..

Yes but the problem is we cannot generally write out a page whose contents
is possibly changing (e.g. RAID5 checksums would then be wrong). But maybe
using bounce pages (and keeping original page still dirty) in such case would
be worth it - originally I thought using bounce pages would not bring us
much but now seeing problems with blocking in more detail maybe they are
worth the trouble after all...

> If you block fsync() then anyone using a RDMA MR with it will just
> dead lock. What happens if two processes open the same file and
> one makes a MR and the other calls fsync()? Sounds bad.

Yes, that's one of the reasons why we were discussing revoke mechanisms for
long term pins. But with bounce pages we could possibly avoid that (except
for cases like DAX + truncate where it's really unavoidable but there it's
a new functionality so mandating revoke and returning error otherwise is
fine I guess).

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Matthew Wilcox
On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > > 2) What to do when some page is pinned but we need to do e.g.
> > > clear_page_dirty_for_io(). After some more thinking I agree with you that
> > > just blocking waiting for page to unpin will create deadlocks like:
> > 
> > Why are we trying to writeback a page that is pinned?  It's presumed to
> > be continuously redirtied by its pinner.  We can't evict it.
> 
> So what should be a result of fsync(file), where some 'file' pages are
> pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> userspace that data was committed while it was not (and it's not only about
> data that has landed in those pages via DMA, you can have first 1k of a page
> modified by normal IO in parallel to DMA modifying second 1k chunk). If
> fsync(2) returns error, it would be really unexpected by userspace and most
> apps will just not handle that correctly. So what else can you do than
> block?

I was thinking about writeback, and neglected the fsync case.  For fsync,
we could copy the "current" contents of the page to a freshly-allocated
page and write _that_ to disc?  As long as we redirty the real page after
the pin is dropped, I think we're fine.


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Matthew Wilcox
On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > > 2) What to do when some page is pinned but we need to do e.g.
> > > clear_page_dirty_for_io(). After some more thinking I agree with you that
> > > just blocking waiting for page to unpin will create deadlocks like:
> > 
> > Why are we trying to writeback a page that is pinned?  It's presumed to
> > be continuously redirtied by its pinner.  We can't evict it.
> 
> So what should be a result of fsync(file), where some 'file' pages are
> pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> userspace that data was committed while it was not (and it's not only about
> data that has landed in those pages via DMA, you can have first 1k of a page
> modified by normal IO in parallel to DMA modifying second 1k chunk). If
> fsync(2) returns error, it would be really unexpected by userspace and most
> apps will just not handle that correctly. So what else can you do than
> block?

I was thinking about writeback, and neglected the fsync case.  For fsync,
we could copy the "current" contents of the page to a freshly-allocated
page and write _that_ to disc?  As long as we redirty the real page after
the pin is dropped, I think we're fine.


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jason Gunthorpe
On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> > > On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > > > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > > > holding the page lock (or locks) and possibly others too. If you
> > > > expect to have a bunch of long term references hanging around on the
> > > > page, then there will be hangs and deadlocks everywhere. And if you do
> > > > not have such log term references, then page lock (or some similar lock
> > > > bit) for the duration of the DMA should be about enough?
> > > 
> > > There are two separate questions:
> > > 
> > > 1) How to identify pages pinned for DMA? We have no bit in struct page to
> > > use and we cannot reuse page lock as that immediately creates lock
> > > inversions e.g. in direct IO code (which could be fixed but then good luck
> > > with auditing all the other GUP users). Matthew had an idea and John
> > > implemented it based on removing page from LRU and using that space in
> > > struct page. So we at least have a way to identify pages that are pinned
> > > and can track their pin count.
> > > 
> > > 2) What to do when some page is pinned but we need to do e.g.
> > > clear_page_dirty_for_io(). After some more thinking I agree with you that
> > > just blocking waiting for page to unpin will create deadlocks like:
> > 
> > Why are we trying to writeback a page that is pinned?  It's presumed to
> > be continuously redirtied by its pinner.  We can't evict it.
> 
> So what should be a result of fsync(file), where some 'file' pages are
> pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> userspace that data was committed while it was not (and it's not only about
> data that has landed in those pages via DMA, you can have first 1k of a page
> modified by normal IO in parallel to DMA modifying second 1k chunk). If
> fsync(2) returns error, it would be really unexpected by userspace and most
> apps will just not handle that correctly. So what else can you do than
> block?

I think as a userspace I would expect the 'current content' to be
flushed without waiting..

If you block fsync() then anyone using a RDMA MR with it will just
dead lock. What happens if two processes open the same file and
one makes a MR and the other calls fsync()? Sounds bad.

Jason


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jason Gunthorpe
On Mon, Jul 09, 2018 at 09:47:40PM +0200, Jan Kara wrote:
> On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> > On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> > > On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > > > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > > > holding the page lock (or locks) and possibly others too. If you
> > > > expect to have a bunch of long term references hanging around on the
> > > > page, then there will be hangs and deadlocks everywhere. And if you do
> > > > not have such log term references, then page lock (or some similar lock
> > > > bit) for the duration of the DMA should be about enough?
> > > 
> > > There are two separate questions:
> > > 
> > > 1) How to identify pages pinned for DMA? We have no bit in struct page to
> > > use and we cannot reuse page lock as that immediately creates lock
> > > inversions e.g. in direct IO code (which could be fixed but then good luck
> > > with auditing all the other GUP users). Matthew had an idea and John
> > > implemented it based on removing page from LRU and using that space in
> > > struct page. So we at least have a way to identify pages that are pinned
> > > and can track their pin count.
> > > 
> > > 2) What to do when some page is pinned but we need to do e.g.
> > > clear_page_dirty_for_io(). After some more thinking I agree with you that
> > > just blocking waiting for page to unpin will create deadlocks like:
> > 
> > Why are we trying to writeback a page that is pinned?  It's presumed to
> > be continuously redirtied by its pinner.  We can't evict it.
> 
> So what should be a result of fsync(file), where some 'file' pages are
> pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
> userspace that data was committed while it was not (and it's not only about
> data that has landed in those pages via DMA, you can have first 1k of a page
> modified by normal IO in parallel to DMA modifying second 1k chunk). If
> fsync(2) returns error, it would be really unexpected by userspace and most
> apps will just not handle that correctly. So what else can you do than
> block?

I think as a userspace I would expect the 'current content' to be
flushed without waiting..

If you block fsync() then anyone using a RDMA MR with it will just
dead lock. What happens if two processes open the same file and
one makes a MR and the other calls fsync()? Sounds bad.

Jason


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jan Kara
On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> > On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > > holding the page lock (or locks) and possibly others too. If you
> > > expect to have a bunch of long term references hanging around on the
> > > page, then there will be hangs and deadlocks everywhere. And if you do
> > > not have such log term references, then page lock (or some similar lock
> > > bit) for the duration of the DMA should be about enough?
> > 
> > There are two separate questions:
> > 
> > 1) How to identify pages pinned for DMA? We have no bit in struct page to
> > use and we cannot reuse page lock as that immediately creates lock
> > inversions e.g. in direct IO code (which could be fixed but then good luck
> > with auditing all the other GUP users). Matthew had an idea and John
> > implemented it based on removing page from LRU and using that space in
> > struct page. So we at least have a way to identify pages that are pinned
> > and can track their pin count.
> > 
> > 2) What to do when some page is pinned but we need to do e.g.
> > clear_page_dirty_for_io(). After some more thinking I agree with you that
> > just blocking waiting for page to unpin will create deadlocks like:
> 
> Why are we trying to writeback a page that is pinned?  It's presumed to
> be continuously redirtied by its pinner.  We can't evict it.

So what should be a result of fsync(file), where some 'file' pages are
pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
userspace that data was committed while it was not (and it's not only about
data that has landed in those pages via DMA, you can have first 1k of a page
modified by normal IO in parallel to DMA modifying second 1k chunk). If
fsync(2) returns error, it would be really unexpected by userspace and most
apps will just not handle that correctly. So what else can you do than
block?

> > ext4_writepages()   ext4_direct_IO_write()
> >   __blockdev_direct_IO()
> > iov_iter_get_pages()
> >   - pins page
> >   handle = ext4_journal_start_with_reserve(inode, ...)
> > - starts transaction
> >   ...
> > lock_page(page)
> > mpage_submit_page()
> >   clear_page_dirty_for_io(page) -> blocks on pin
> 
> I don't think it should block.  It should fail.

See above...

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jan Kara
On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> > On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > > holding the page lock (or locks) and possibly others too. If you
> > > expect to have a bunch of long term references hanging around on the
> > > page, then there will be hangs and deadlocks everywhere. And if you do
> > > not have such log term references, then page lock (or some similar lock
> > > bit) for the duration of the DMA should be about enough?
> > 
> > There are two separate questions:
> > 
> > 1) How to identify pages pinned for DMA? We have no bit in struct page to
> > use and we cannot reuse page lock as that immediately creates lock
> > inversions e.g. in direct IO code (which could be fixed but then good luck
> > with auditing all the other GUP users). Matthew had an idea and John
> > implemented it based on removing page from LRU and using that space in
> > struct page. So we at least have a way to identify pages that are pinned
> > and can track their pin count.
> > 
> > 2) What to do when some page is pinned but we need to do e.g.
> > clear_page_dirty_for_io(). After some more thinking I agree with you that
> > just blocking waiting for page to unpin will create deadlocks like:
> 
> Why are we trying to writeback a page that is pinned?  It's presumed to
> be continuously redirtied by its pinner.  We can't evict it.

So what should be a result of fsync(file), where some 'file' pages are
pinned e.g. by running direct IO? If we just skip those pages, we'll lie to
userspace that data was committed while it was not (and it's not only about
data that has landed in those pages via DMA, you can have first 1k of a page
modified by normal IO in parallel to DMA modifying second 1k chunk). If
fsync(2) returns error, it would be really unexpected by userspace and most
apps will just not handle that correctly. So what else can you do than
block?

> > ext4_writepages()   ext4_direct_IO_write()
> >   __blockdev_direct_IO()
> > iov_iter_get_pages()
> >   - pins page
> >   handle = ext4_journal_start_with_reserve(inode, ...)
> > - starts transaction
> >   ...
> > lock_page(page)
> > mpage_submit_page()
> >   clear_page_dirty_for_io(page) -> blocks on pin
> 
> I don't think it should block.  It should fail.

See above...

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Matthew Wilcox
On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > holding the page lock (or locks) and possibly others too. If you
> > expect to have a bunch of long term references hanging around on the
> > page, then there will be hangs and deadlocks everywhere. And if you do
> > not have such log term references, then page lock (or some similar lock
> > bit) for the duration of the DMA should be about enough?
> 
> There are two separate questions:
> 
> 1) How to identify pages pinned for DMA? We have no bit in struct page to
> use and we cannot reuse page lock as that immediately creates lock
> inversions e.g. in direct IO code (which could be fixed but then good luck
> with auditing all the other GUP users). Matthew had an idea and John
> implemented it based on removing page from LRU and using that space in
> struct page. So we at least have a way to identify pages that are pinned
> and can track their pin count.
> 
> 2) What to do when some page is pinned but we need to do e.g.
> clear_page_dirty_for_io(). After some more thinking I agree with you that
> just blocking waiting for page to unpin will create deadlocks like:

Why are we trying to writeback a page that is pinned?  It's presumed to
be continuously redirtied by its pinner.  We can't evict it.

> ext4_writepages() ext4_direct_IO_write()
> __blockdev_direct_IO()
>   iov_iter_get_pages()
> - pins page
>   handle = ext4_journal_start_with_reserve(inode, ...)
> - starts transaction
>   ...
> lock_page(page)
> mpage_submit_page()
>   clear_page_dirty_for_io(page) -> blocks on pin

I don't think it should block.  It should fail.



Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Matthew Wilcox
On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > holding the page lock (or locks) and possibly others too. If you
> > expect to have a bunch of long term references hanging around on the
> > page, then there will be hangs and deadlocks everywhere. And if you do
> > not have such log term references, then page lock (or some similar lock
> > bit) for the duration of the DMA should be about enough?
> 
> There are two separate questions:
> 
> 1) How to identify pages pinned for DMA? We have no bit in struct page to
> use and we cannot reuse page lock as that immediately creates lock
> inversions e.g. in direct IO code (which could be fixed but then good luck
> with auditing all the other GUP users). Matthew had an idea and John
> implemented it based on removing page from LRU and using that space in
> struct page. So we at least have a way to identify pages that are pinned
> and can track their pin count.
> 
> 2) What to do when some page is pinned but we need to do e.g.
> clear_page_dirty_for_io(). After some more thinking I agree with you that
> just blocking waiting for page to unpin will create deadlocks like:

Why are we trying to writeback a page that is pinned?  It's presumed to
be continuously redirtied by its pinner.  We can't evict it.

> ext4_writepages() ext4_direct_IO_write()
> __blockdev_direct_IO()
>   iov_iter_get_pages()
> - pins page
>   handle = ext4_journal_start_with_reserve(inode, ...)
> - starts transaction
>   ...
> lock_page(page)
> mpage_submit_page()
>   clear_page_dirty_for_io(page) -> blocks on pin

I don't think it should block.  It should fail.



Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jan Kara
Hi,

On Mon 09-07-18 01:05:52, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> With respect to tracking get_user_pages*() pages with page->dma_pinned*
> fields [1], I spent a few days retrofitting most of the get_user_pages*()
> call sites, by adding calls to a new put_user_page() function, in place
> of put_page(), where appropriate. This will work, but it's a large effort.
> 
> Design note: I didn't see anything that hinted at a way to fix this
> problem, without actually changing all of the get_user_pages*() call sites,
> so I think it's reasonable to start with that.

Agreed.

> Anyway, it's still incomplete, but because this is a large, tree-wide
> change (that will take some time and testing), I'd like to propose a plan,
> before spamming zillions of people with put_user_page() conversion patches.
> So I picked out the first two patches to show where this is going.
> 
> Proposed steps:
> 
> Step 1:
> 
> Start with the patches here, then continue with...dozens more.
> This will eventually convert all of the call sites to use put_user_page().
> This is easy in some places, but complex in others, such as:
> 
> -- drivers/gpu/drm/amd
> -- bio
> -- fuse
> -- cifs
> -- anything from:
>git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq
> 
> The easy ones can be grouped into a single patchset, perhaps, and the
> complex ones probably each need a patchset, in order to get the in-depth
> review they'll need.

Agreed.

> Furthermore, some of these areas I hope to attract some help on, once
> this starts going.
> 
> Step 2:
> 
> In parallel, tidy up the core patchset that was discussed in [1], (version
> 2 has already been reviewed, so I know what to do), and get it perfected
> and reviewed. Don't apply it until step 1 is all done, though.
>
> Step 3:
> 
> Activate refcounting of dma-pinned pages (essentially, patch #5, which is
> [1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
> mopping up any missed call sites.
> 
> Step 4:
> 
> After some soak time, actually connect it up (patch #6 of [1]) and start
> taking action based on the new page->dma_pinned* fields.
> 
> [1] https://www.spinics.net/lists/linux-mm/msg156409.html
> 
>   or, the same thread on LKML if it's working for you:
> 
> https://lkml.org/lkml/2018/7/4/368

Yeah, but as Nick pointed out we have some more work to do in step 4 to
avoid deadlocks. Still there's a lot of work to do on which the direction
to progress is clear :).

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jan Kara
Hi,

On Mon 09-07-18 01:05:52, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> With respect to tracking get_user_pages*() pages with page->dma_pinned*
> fields [1], I spent a few days retrofitting most of the get_user_pages*()
> call sites, by adding calls to a new put_user_page() function, in place
> of put_page(), where appropriate. This will work, but it's a large effort.
> 
> Design note: I didn't see anything that hinted at a way to fix this
> problem, without actually changing all of the get_user_pages*() call sites,
> so I think it's reasonable to start with that.

Agreed.

> Anyway, it's still incomplete, but because this is a large, tree-wide
> change (that will take some time and testing), I'd like to propose a plan,
> before spamming zillions of people with put_user_page() conversion patches.
> So I picked out the first two patches to show where this is going.
> 
> Proposed steps:
> 
> Step 1:
> 
> Start with the patches here, then continue with...dozens more.
> This will eventually convert all of the call sites to use put_user_page().
> This is easy in some places, but complex in others, such as:
> 
> -- drivers/gpu/drm/amd
> -- bio
> -- fuse
> -- cifs
> -- anything from:
>git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq
> 
> The easy ones can be grouped into a single patchset, perhaps, and the
> complex ones probably each need a patchset, in order to get the in-depth
> review they'll need.

Agreed.

> Furthermore, some of these areas I hope to attract some help on, once
> this starts going.
> 
> Step 2:
> 
> In parallel, tidy up the core patchset that was discussed in [1], (version
> 2 has already been reviewed, so I know what to do), and get it perfected
> and reviewed. Don't apply it until step 1 is all done, though.
>
> Step 3:
> 
> Activate refcounting of dma-pinned pages (essentially, patch #5, which is
> [1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
> mopping up any missed call sites.
> 
> Step 4:
> 
> After some soak time, actually connect it up (patch #6 of [1]) and start
> taking action based on the new page->dma_pinned* fields.
> 
> [1] https://www.spinics.net/lists/linux-mm/msg156409.html
> 
>   or, the same thread on LKML if it's working for you:
> 
> https://lkml.org/lkml/2018/7/4/368

Yeah, but as Nick pointed out we have some more work to do in step 4 to
avoid deadlocks. Still there's a lot of work to do on which the direction
to progress is clear :).

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jan Kara
On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> On Mon,  9 Jul 2018 01:05:52 -0700
> john.hubb...@gmail.com wrote:
> 
> > From: John Hubbard 
> > 
> > Hi,
> > 
> > With respect to tracking get_user_pages*() pages with page->dma_pinned*
> > fields [1], I spent a few days retrofitting most of the get_user_pages*()
> > call sites, by adding calls to a new put_user_page() function, in place
> > of put_page(), where appropriate. This will work, but it's a large effort.
> > 
> > Design note: I didn't see anything that hinted at a way to fix this
> > problem, without actually changing all of the get_user_pages*() call sites,
> > so I think it's reasonable to start with that.
> > 
> > Anyway, it's still incomplete, but because this is a large, tree-wide
> > change (that will take some time and testing), I'd like to propose a plan,
> > before spamming zillions of people with put_user_page() conversion patches.
> > So I picked out the first two patches to show where this is going.
> > 
> > Proposed steps:
> > 
> > Step 1:
> > 
> > Start with the patches here, then continue with...dozens more.
> > This will eventually convert all of the call sites to use put_user_page().
> > This is easy in some places, but complex in others, such as:
> > 
> > -- drivers/gpu/drm/amd
> > -- bio
> > -- fuse
> > -- cifs
> > -- anything from:
> >git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq
> > 
> > The easy ones can be grouped into a single patchset, perhaps, and the
> > complex ones probably each need a patchset, in order to get the in-depth
> > review they'll need.
> > 
> > Furthermore, some of these areas I hope to attract some help on, once
> > this starts going.
> > 
> > Step 2:
> > 
> > In parallel, tidy up the core patchset that was discussed in [1], (version
> > 2 has already been reviewed, so I know what to do), and get it perfected
> > and reviewed. Don't apply it until step 1 is all done, though.
> > 
> > Step 3:
> > 
> > Activate refcounting of dma-pinned pages (essentially, patch #5, which is
> > [1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
> > mopping up any missed call sites.
> > 
> > Step 4:
> > 
> > After some soak time, actually connect it up (patch #6 of [1]) and start
> > taking action based on the new page->dma_pinned* fields.
> 
> You can use my decade old patch!
> 
> https://lkml.org/lkml/2009/2/17/113

The problem has a longer history than I thought ;)

> The problem with blocking in clear_page_dirty_for_io is that the fs is
> holding the page lock (or locks) and possibly others too. If you
> expect to have a bunch of long term references hanging around on the
> page, then there will be hangs and deadlocks everywhere. And if you do
> not have such log term references, then page lock (or some similar lock
> bit) for the duration of the DMA should be about enough?

There are two separate questions:

1) How to identify pages pinned for DMA? We have no bit in struct page to
use and we cannot reuse page lock as that immediately creates lock
inversions e.g. in direct IO code (which could be fixed but then good luck
with auditing all the other GUP users). Matthew had an idea and John
implemented it based on removing page from LRU and using that space in
struct page. So we at least have a way to identify pages that are pinned
and can track their pin count.

2) What to do when some page is pinned but we need to do e.g.
clear_page_dirty_for_io(). After some more thinking I agree with you that
just blocking waiting for page to unpin will create deadlocks like:

ext4_writepages()   ext4_direct_IO_write()
  __blockdev_direct_IO()
iov_iter_get_pages()
  - pins page
  handle = ext4_journal_start_with_reserve(inode, ...)
- starts transaction
  ...
lock_page(page)
mpage_submit_page()
  clear_page_dirty_for_io(page) -> blocks on pin


ext4_dio_get_block_unwritten_sync()
  - called to allocate
blocks for DIO
ext4_journal_start()
  - may block and wait
for transaction
started by
ext4_writepages() to
finish

> I think it has to be more fundamental to the filesystem. Filesystem
> would get callbacks to register such long term dirtying on its files.
> Then it can do locking, resource allocation, -ENOTSUPP, etc.

Well, direct IO would not classify as long term dirtying I guess but still

Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Jan Kara
On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> On Mon,  9 Jul 2018 01:05:52 -0700
> john.hubb...@gmail.com wrote:
> 
> > From: John Hubbard 
> > 
> > Hi,
> > 
> > With respect to tracking get_user_pages*() pages with page->dma_pinned*
> > fields [1], I spent a few days retrofitting most of the get_user_pages*()
> > call sites, by adding calls to a new put_user_page() function, in place
> > of put_page(), where appropriate. This will work, but it's a large effort.
> > 
> > Design note: I didn't see anything that hinted at a way to fix this
> > problem, without actually changing all of the get_user_pages*() call sites,
> > so I think it's reasonable to start with that.
> > 
> > Anyway, it's still incomplete, but because this is a large, tree-wide
> > change (that will take some time and testing), I'd like to propose a plan,
> > before spamming zillions of people with put_user_page() conversion patches.
> > So I picked out the first two patches to show where this is going.
> > 
> > Proposed steps:
> > 
> > Step 1:
> > 
> > Start with the patches here, then continue with...dozens more.
> > This will eventually convert all of the call sites to use put_user_page().
> > This is easy in some places, but complex in others, such as:
> > 
> > -- drivers/gpu/drm/amd
> > -- bio
> > -- fuse
> > -- cifs
> > -- anything from:
> >git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq
> > 
> > The easy ones can be grouped into a single patchset, perhaps, and the
> > complex ones probably each need a patchset, in order to get the in-depth
> > review they'll need.
> > 
> > Furthermore, some of these areas I hope to attract some help on, once
> > this starts going.
> > 
> > Step 2:
> > 
> > In parallel, tidy up the core patchset that was discussed in [1], (version
> > 2 has already been reviewed, so I know what to do), and get it perfected
> > and reviewed. Don't apply it until step 1 is all done, though.
> > 
> > Step 3:
> > 
> > Activate refcounting of dma-pinned pages (essentially, patch #5, which is
> > [1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
> > mopping up any missed call sites.
> > 
> > Step 4:
> > 
> > After some soak time, actually connect it up (patch #6 of [1]) and start
> > taking action based on the new page->dma_pinned* fields.
> 
> You can use my decade old patch!
> 
> https://lkml.org/lkml/2009/2/17/113

The problem has a longer history than I thought ;)

> The problem with blocking in clear_page_dirty_for_io is that the fs is
> holding the page lock (or locks) and possibly others too. If you
> expect to have a bunch of long term references hanging around on the
> page, then there will be hangs and deadlocks everywhere. And if you do
> not have such log term references, then page lock (or some similar lock
> bit) for the duration of the DMA should be about enough?

There are two separate questions:

1) How to identify pages pinned for DMA? We have no bit in struct page to
use and we cannot reuse page lock as that immediately creates lock
inversions e.g. in direct IO code (which could be fixed but then good luck
with auditing all the other GUP users). Matthew had an idea and John
implemented it based on removing page from LRU and using that space in
struct page. So we at least have a way to identify pages that are pinned
and can track their pin count.

2) What to do when some page is pinned but we need to do e.g.
clear_page_dirty_for_io(). After some more thinking I agree with you that
just blocking waiting for page to unpin will create deadlocks like:

ext4_writepages()   ext4_direct_IO_write()
  __blockdev_direct_IO()
iov_iter_get_pages()
  - pins page
  handle = ext4_journal_start_with_reserve(inode, ...)
- starts transaction
  ...
lock_page(page)
mpage_submit_page()
  clear_page_dirty_for_io(page) -> blocks on pin


ext4_dio_get_block_unwritten_sync()
  - called to allocate
blocks for DIO
ext4_journal_start()
  - may block and wait
for transaction
started by
ext4_writepages() to
finish

> I think it has to be more fundamental to the filesystem. Filesystem
> would get callbacks to register such long term dirtying on its files.
> Then it can do locking, resource allocation, -ENOTSUPP, etc.

Well, direct IO would not classify as long term dirtying I guess but still

Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Nicholas Piggin
On Mon,  9 Jul 2018 01:05:52 -0700
john.hubb...@gmail.com wrote:

> From: John Hubbard 
> 
> Hi,
> 
> With respect to tracking get_user_pages*() pages with page->dma_pinned*
> fields [1], I spent a few days retrofitting most of the get_user_pages*()
> call sites, by adding calls to a new put_user_page() function, in place
> of put_page(), where appropriate. This will work, but it's a large effort.
> 
> Design note: I didn't see anything that hinted at a way to fix this
> problem, without actually changing all of the get_user_pages*() call sites,
> so I think it's reasonable to start with that.
> 
> Anyway, it's still incomplete, but because this is a large, tree-wide
> change (that will take some time and testing), I'd like to propose a plan,
> before spamming zillions of people with put_user_page() conversion patches.
> So I picked out the first two patches to show where this is going.
> 
> Proposed steps:
> 
> Step 1:
> 
> Start with the patches here, then continue with...dozens more.
> This will eventually convert all of the call sites to use put_user_page().
> This is easy in some places, but complex in others, such as:
> 
> -- drivers/gpu/drm/amd
> -- bio
> -- fuse
> -- cifs
> -- anything from:
>git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq
> 
> The easy ones can be grouped into a single patchset, perhaps, and the
> complex ones probably each need a patchset, in order to get the in-depth
> review they'll need.
> 
> Furthermore, some of these areas I hope to attract some help on, once
> this starts going.
> 
> Step 2:
> 
> In parallel, tidy up the core patchset that was discussed in [1], (version
> 2 has already been reviewed, so I know what to do), and get it perfected
> and reviewed. Don't apply it until step 1 is all done, though.
> 
> Step 3:
> 
> Activate refcounting of dma-pinned pages (essentially, patch #5, which is
> [1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
> mopping up any missed call sites.
> 
> Step 4:
> 
> After some soak time, actually connect it up (patch #6 of [1]) and start
> taking action based on the new page->dma_pinned* fields.

You can use my decade old patch!

https://lkml.org/lkml/2009/2/17/113

The problem with blocking in clear_page_dirty_for_io is that the fs is
holding the page lock (or locks) and possibly others too. If you
expect to have a bunch of long term references hanging around on the
page, then there will be hangs and deadlocks everywhere. And if you do
not have such log term references, then page lock (or some similar lock
bit) for the duration of the DMA should be about enough?

I think it has to be more fundamental to the filesystem. Filesystem
would get callbacks to register such long term dirtying on its files.
Then it can do locking, resource allocation, -ENOTSUPP, etc.

Thanks,
Nick


Re: [PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread Nicholas Piggin
On Mon,  9 Jul 2018 01:05:52 -0700
john.hubb...@gmail.com wrote:

> From: John Hubbard 
> 
> Hi,
> 
> With respect to tracking get_user_pages*() pages with page->dma_pinned*
> fields [1], I spent a few days retrofitting most of the get_user_pages*()
> call sites, by adding calls to a new put_user_page() function, in place
> of put_page(), where appropriate. This will work, but it's a large effort.
> 
> Design note: I didn't see anything that hinted at a way to fix this
> problem, without actually changing all of the get_user_pages*() call sites,
> so I think it's reasonable to start with that.
> 
> Anyway, it's still incomplete, but because this is a large, tree-wide
> change (that will take some time and testing), I'd like to propose a plan,
> before spamming zillions of people with put_user_page() conversion patches.
> So I picked out the first two patches to show where this is going.
> 
> Proposed steps:
> 
> Step 1:
> 
> Start with the patches here, then continue with...dozens more.
> This will eventually convert all of the call sites to use put_user_page().
> This is easy in some places, but complex in others, such as:
> 
> -- drivers/gpu/drm/amd
> -- bio
> -- fuse
> -- cifs
> -- anything from:
>git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq
> 
> The easy ones can be grouped into a single patchset, perhaps, and the
> complex ones probably each need a patchset, in order to get the in-depth
> review they'll need.
> 
> Furthermore, some of these areas I hope to attract some help on, once
> this starts going.
> 
> Step 2:
> 
> In parallel, tidy up the core patchset that was discussed in [1], (version
> 2 has already been reviewed, so I know what to do), and get it perfected
> and reviewed. Don't apply it until step 1 is all done, though.
> 
> Step 3:
> 
> Activate refcounting of dma-pinned pages (essentially, patch #5, which is
> [1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
> mopping up any missed call sites.
> 
> Step 4:
> 
> After some soak time, actually connect it up (patch #6 of [1]) and start
> taking action based on the new page->dma_pinned* fields.

You can use my decade old patch!

https://lkml.org/lkml/2009/2/17/113

The problem with blocking in clear_page_dirty_for_io is that the fs is
holding the page lock (or locks) and possibly others too. If you
expect to have a bunch of long term references hanging around on the
page, then there will be hangs and deadlocks everywhere. And if you do
not have such log term references, then page lock (or some similar lock
bit) for the duration of the DMA should be about enough?

I think it has to be more fundamental to the filesystem. Filesystem
would get callbacks to register such long term dirtying on its files.
Then it can do locking, resource allocation, -ENOTSUPP, etc.

Thanks,
Nick


[PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread john . hubbard
From: John Hubbard 

Hi,

With respect to tracking get_user_pages*() pages with page->dma_pinned*
fields [1], I spent a few days retrofitting most of the get_user_pages*()
call sites, by adding calls to a new put_user_page() function, in place
of put_page(), where appropriate. This will work, but it's a large effort.

Design note: I didn't see anything that hinted at a way to fix this
problem, without actually changing all of the get_user_pages*() call sites,
so I think it's reasonable to start with that.

Anyway, it's still incomplete, but because this is a large, tree-wide
change (that will take some time and testing), I'd like to propose a plan,
before spamming zillions of people with put_user_page() conversion patches.
So I picked out the first two patches to show where this is going.

Proposed steps:

Step 1:

Start with the patches here, then continue with...dozens more.
This will eventually convert all of the call sites to use put_user_page().
This is easy in some places, but complex in others, such as:

-- drivers/gpu/drm/amd
-- bio
-- fuse
-- cifs
-- anything from:
   git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq

The easy ones can be grouped into a single patchset, perhaps, and the
complex ones probably each need a patchset, in order to get the in-depth
review they'll need.

Furthermore, some of these areas I hope to attract some help on, once
this starts going.

Step 2:

In parallel, tidy up the core patchset that was discussed in [1], (version
2 has already been reviewed, so I know what to do), and get it perfected
and reviewed. Don't apply it until step 1 is all done, though.

Step 3:

Activate refcounting of dma-pinned pages (essentially, patch #5, which is
[1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
mopping up any missed call sites.

Step 4:

After some soak time, actually connect it up (patch #6 of [1]) and start
taking action based on the new page->dma_pinned* fields.

[1] https://www.spinics.net/lists/linux-mm/msg156409.html

  or, the same thread on LKML if it's working for you:

https://lkml.org/lkml/2018/7/4/368

John Hubbard (2):
  mm: introduce put_user_page(), placeholder version
  goldfish_pipe/mm: convert to the new put_user_page() call

 drivers/platform/goldfish/goldfish_pipe.c |  6 +++---
 include/linux/mm.h| 14 ++
 2 files changed, 17 insertions(+), 3 deletions(-)

-- 
2.18.0



[PATCH 0/2] mm/fs: put_user_page() proposal

2018-07-09 Thread john . hubbard
From: John Hubbard 

Hi,

With respect to tracking get_user_pages*() pages with page->dma_pinned*
fields [1], I spent a few days retrofitting most of the get_user_pages*()
call sites, by adding calls to a new put_user_page() function, in place
of put_page(), where appropriate. This will work, but it's a large effort.

Design note: I didn't see anything that hinted at a way to fix this
problem, without actually changing all of the get_user_pages*() call sites,
so I think it's reasonable to start with that.

Anyway, it's still incomplete, but because this is a large, tree-wide
change (that will take some time and testing), I'd like to propose a plan,
before spamming zillions of people with put_user_page() conversion patches.
So I picked out the first two patches to show where this is going.

Proposed steps:

Step 1:

Start with the patches here, then continue with...dozens more.
This will eventually convert all of the call sites to use put_user_page().
This is easy in some places, but complex in others, such as:

-- drivers/gpu/drm/amd
-- bio
-- fuse
-- cifs
-- anything from:
   git grep  iov_iter_get_pages | cut -f1 -d ':' | sort | uniq

The easy ones can be grouped into a single patchset, perhaps, and the
complex ones probably each need a patchset, in order to get the in-depth
review they'll need.

Furthermore, some of these areas I hope to attract some help on, once
this starts going.

Step 2:

In parallel, tidy up the core patchset that was discussed in [1], (version
2 has already been reviewed, so I know what to do), and get it perfected
and reviewed. Don't apply it until step 1 is all done, though.

Step 3:

Activate refcounting of dma-pinned pages (essentially, patch #5, which is
[1]), but don't use it yet. Place a few WARN_ON_ONCE calls to start
mopping up any missed call sites.

Step 4:

After some soak time, actually connect it up (patch #6 of [1]) and start
taking action based on the new page->dma_pinned* fields.

[1] https://www.spinics.net/lists/linux-mm/msg156409.html

  or, the same thread on LKML if it's working for you:

https://lkml.org/lkml/2018/7/4/368

John Hubbard (2):
  mm: introduce put_user_page(), placeholder version
  goldfish_pipe/mm: convert to the new put_user_page() call

 drivers/platform/goldfish/goldfish_pipe.c |  6 +++---
 include/linux/mm.h| 14 ++
 2 files changed, 17 insertions(+), 3 deletions(-)

-- 
2.18.0