Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-06-04 Thread Peter Xu
On Fri, Jun 04, 2021 at 11:07:42AM +1000, Alistair Popple wrote:
> On Friday, 4 June 2021 12:47:40 AM AEST Peter Xu wrote:
> > External email: Use caution opening links or attachments
> > 
> > On Thu, Jun 03, 2021 at 09:39:32PM +1000, Alistair Popple wrote:
> > > Reclaim won't run on the page due to the extra references from the special
> > > swap entries.
> > 
> > That sounds reasonable, but I didn't find the point that stops it, probably
> > due to my limited knowledge on the reclaim code.  Could you elaborate?
> 
> Sure, it isn't immediately obvious but it ends up being detected at the start 
> of is_page_cache_freeable() in the pageout code:
> 
> 
> static pageout_t pageout(struct page *page, struct address_space *mapping)
> {
> 
> [...]
> 
>   if (!is_page_cache_freeable(page))
>   return PAGE_KEEP;

I did look at pageout() but still missed this small helper indeed (while it's
so important to know..), thanks!

-- 
Peter Xu

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-06-04 Thread Peter Xu
On Thu, Jun 03, 2021 at 09:39:32PM +1000, Alistair Popple wrote:
> Reclaim won't run on the page due to the extra references from the special 
> swap entries.

That sounds reasonable, but I didn't find the point that stops it, probably due
to my limited knowledge on the reclaim code.  Could you elaborate?

-- 
Peter Xu

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-06-03 Thread Alistair Popple
On Friday, 4 June 2021 12:47:40 AM AEST Peter Xu wrote:
> External email: Use caution opening links or attachments
> 
> On Thu, Jun 03, 2021 at 09:39:32PM +1000, Alistair Popple wrote:
> > Reclaim won't run on the page due to the extra references from the special
> > swap entries.
> 
> That sounds reasonable, but I didn't find the point that stops it, probably
> due to my limited knowledge on the reclaim code.  Could you elaborate?

Sure, it isn't immediately obvious but it ends up being detected at the start 
of is_page_cache_freeable() in the pageout code:


static pageout_t pageout(struct page *page, struct address_space *mapping)
{

[...]

if (!is_page_cache_freeable(page))
return PAGE_KEEP;

 - Alistair

> --
> Peter Xu




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-06-03 Thread Alistair Popple
On Thursday, 3 June 2021 12:37:30 AM AEST Peter Xu wrote:
> External email: Use caution opening links or attachments
> 
> On Wed, Jun 02, 2021 at 06:50:37PM +1000, Balbir Singh wrote:
> > On Wed, May 26, 2021 at 12:17:18AM -0700, John Hubbard wrote:
> > > On 5/25/21 4:51 AM, Balbir Singh wrote:
> > > ...
> > > 
> > > > > How beneficial is this code to nouveau users?  I see that it permits
> > > > > a
> > > > > part of OpenCL to be implemented, but how useful/important is this
> > > > > in
> > > > > the real world?
> > > > 
> > > > That is a very good question! I've not reviewed the code, but a sample
> > > > program with the described use case would make things easy to parse.
> > > > I suspect that is not easy to build at the moment?
> > > 
> > > The cover letter says this:
> > > 
> > > This has been tested with upstream Mesa 21.1.0 and a simple OpenCL
> > > program
> > > which checks that GPU atomic accesses to system memory are atomic.
> > > Without
> > > this series the test fails as there is no way of write-protecting the
> > > page
> > > mapping which results in the device clobbering CPU writes. For reference
> > > the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/
> > > 
> > > Further testing has been performed by adding support for testing
> > > exclusive
> > > access to the hmm-tests kselftests.
> > > 
> > > ...so that seems to cover the "sample program" request, at least.
> > 
> > Thanks, I'll take a look
> > 
> > > > I wonder how we co-ordinate all the work the mm is doing, page
> > > > migration,
> > > > reclaim with device exclusive access? Do we have any numbers for the
> > > > worst
> > > > case page fault latency when something is marked away for exclusive
> > > > access?
> > > 
> > > CPU page fault latency is approximately "terrible", if a page is
> > > resident on the GPU. We have to spin up a DMA engine on the GPU and
> > > have it copy the page over the PCIe bus, after all.
> > > 
> > > > I presume for now this is anonymous memory only? SWP_DEVICE_EXCLUSIVE
> > > > would
> > > 
> > > Yes, for now.
> > > 
> > > > only impact the address space of programs using the GPU. Should the
> > > > exclusively marked range live in the unreclaimable list and recycled
> > > > back to active/in-active to account for the fact that
> > > > 
> > > > 1. It is not reclaimable and reclaim will only hurt via page faults?
> > > > 2. It ages the page correctly or at-least allows for that possibility
> > > > when the> > > 
> > > > page is used by the GPU.
> > > 
> > > I'm not sure that that is *necessarily* something we can conclude. It
> > > depends upon access patterns of each program. For example, a
> > > "reduction" parallel program sends over lots of data to the GPU, and
> > > only a tiny bit of (reduced!) data comes back to the CPU. In that case,
> > > freeing the physical page on the CPU is actually the best decision for
> > > the OS to make (if the OS is sufficiently prescient).> 
> > With a shared device or a device exclusive range, it would be good to get
> > the device usage pattern and update the mm with that knowledge, so that
> > the LRU can be better maintained. With your comment you seem to suggest
> > that a page used by the GPU might be a good candidate for reclaim based
> > on the CPU's understanding of the age of the page should not account for
> > use by the device
> > (are GPU workloads - access once and discard?)
> 
> Hmm, besides the aging info, this reminded me: do we need to isolate the
> page from lru too when marking device exclusive access?
> 
> Afaict the current patch didn't do that so I think it's reclaimable.  If we
> still have the rmap then we'll get a mmu notify CLEAR when unmapping that
> special pte, so device driver should be able to drop the ownership.  However
> we dropped the rmap when marking exclusive.  Now I don't know whether and
> how it'll work if page reclaim runs with the page being exclusively owned
> if without isolating the page..

Reclaim won't run on the page due to the extra references from the special 
swap entries.

> --
> Peter Xu




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-06-03 Thread John Hubbard

On 6/2/21 1:50 AM, Balbir Singh wrote:
...

only impact the address space of programs using the GPU. Should the exclusively
marked range live in the unreclaimable list and recycled back to 
active/in-active
to account for the fact that

1. It is not reclaimable and reclaim will only hurt via page faults?
2. It ages the page correctly or at-least allows for that possibility when the
 page is used by the GPU.


I'm not sure that that is *necessarily* something we can conclude. It depends 
upon
access patterns of each program. For example, a "reduction" parallel program 
sends
over lots of data to the GPU, and only a tiny bit of (reduced!) data comes back
to the CPU. In that case, freeing the physical page on the CPU is actually the
best decision for the OS to make (if the OS is sufficiently prescient).



With a shared device or a device exclusive range, it would be good to get the 
device
usage pattern and update the mm with that knowledge, so that the LRU can be 
better


Integrating a GPU (or "device") processor and it's mm behavior with the Linux 
kernel is
always an interesting concept. Certainly worth exploring, although it's probably
not a small project by any means.


maintained. With your comment you seem to suggest that a page used by the GPU 
might
be a good candidate for reclaim based on the CPU's understanding of the age of
the page should not account for use by the device
(are GPU workloads - access once and discard?)



Well, that's a little too narrow of an interpretation. The GPU is a fairly 
general
purpose processor, and so it has all kinds of workloads. I'm trying to 
discourage
any hopes that one can know, in advance, precisely how the GPU's pages need to 
be
managed. It's similar to the the CPU, in that regard. My example was just one, 
out
of a vast pool of possible behaviors.

thanks,
--
John Hubbard
NVIDIA
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-06-02 Thread Peter Xu
On Wed, Jun 02, 2021 at 06:50:37PM +1000, Balbir Singh wrote:
> On Wed, May 26, 2021 at 12:17:18AM -0700, John Hubbard wrote:
> > On 5/25/21 4:51 AM, Balbir Singh wrote:
> > ...
> > > > How beneficial is this code to nouveau users?  I see that it permits a
> > > > part of OpenCL to be implemented, but how useful/important is this in
> > > > the real world?
> > > 
> > > That is a very good question! I've not reviewed the code, but a sample
> > > program with the described use case would make things easy to parse.
> > > I suspect that is not easy to build at the moment?
> > > 
> > 
> > The cover letter says this:
> > 
> > This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
> > which checks that GPU atomic accesses to system memory are atomic. Without
> > this series the test fails as there is no way of write-protecting the page
> > mapping which results in the device clobbering CPU writes. For reference
> > the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/
> > 
> > Further testing has been performed by adding support for testing exclusive
> > access to the hmm-tests kselftests.
> > 
> > ...so that seems to cover the "sample program" request, at least.
> 
> Thanks, I'll take a look
> 
> > 
> > > I wonder how we co-ordinate all the work the mm is doing, page migration,
> > > reclaim with device exclusive access? Do we have any numbers for the worst
> > > case page fault latency when something is marked away for exclusive 
> > > access?
> > 
> > CPU page fault latency is approximately "terrible", if a page is resident on
> > the GPU. We have to spin up a DMA engine on the GPU and have it copy the 
> > page
> > over the PCIe bus, after all.
> > 
> > > I presume for now this is anonymous memory only? SWP_DEVICE_EXCLUSIVE 
> > > would
> > 
> > Yes, for now.
> > 
> > > only impact the address space of programs using the GPU. Should the 
> > > exclusively
> > > marked range live in the unreclaimable list and recycled back to 
> > > active/in-active
> > > to account for the fact that
> > > 
> > > 1. It is not reclaimable and reclaim will only hurt via page faults?
> > > 2. It ages the page correctly or at-least allows for that possibility 
> > > when the
> > > page is used by the GPU.
> > 
> > I'm not sure that that is *necessarily* something we can conclude. It 
> > depends upon
> > access patterns of each program. For example, a "reduction" parallel 
> > program sends
> > over lots of data to the GPU, and only a tiny bit of (reduced!) data comes 
> > back
> > to the CPU. In that case, freeing the physical page on the CPU is actually 
> > the
> > best decision for the OS to make (if the OS is sufficiently prescient).
> >
> 
> With a shared device or a device exclusive range, it would be good to get the 
> device
> usage pattern and update the mm with that knowledge, so that the LRU can be 
> better
> maintained. With your comment you seem to suggest that a page used by the GPU 
> might
> be a good candidate for reclaim based on the CPU's understanding of the age of
> the page should not account for use by the device
> (are GPU workloads - access once and discard?) 

Hmm, besides the aging info, this reminded me: do we need to isolate the page
from lru too when marking device exclusive access?

Afaict the current patch didn't do that so I think it's reclaimable.  If we
still have the rmap then we'll get a mmu notify CLEAR when unmapping that
special pte, so device driver should be able to drop the ownership.  However we
dropped the rmap when marking exclusive.  Now I don't know whether and how
it'll work if page reclaim runs with the page being exclusively owned if
without isolating the page..

-- 
Peter Xu

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-06-02 Thread Balbir Singh
On Wed, May 26, 2021 at 12:17:18AM -0700, John Hubbard wrote:
> On 5/25/21 4:51 AM, Balbir Singh wrote:
> ...
> > > How beneficial is this code to nouveau users?  I see that it permits a
> > > part of OpenCL to be implemented, but how useful/important is this in
> > > the real world?
> > 
> > That is a very good question! I've not reviewed the code, but a sample
> > program with the described use case would make things easy to parse.
> > I suspect that is not easy to build at the moment?
> > 
> 
> The cover letter says this:
> 
> This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
> which checks that GPU atomic accesses to system memory are atomic. Without
> this series the test fails as there is no way of write-protecting the page
> mapping which results in the device clobbering CPU writes. For reference
> the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/
> 
> Further testing has been performed by adding support for testing exclusive
> access to the hmm-tests kselftests.
> 
> ...so that seems to cover the "sample program" request, at least.

Thanks, I'll take a look

> 
> > I wonder how we co-ordinate all the work the mm is doing, page migration,
> > reclaim with device exclusive access? Do we have any numbers for the worst
> > case page fault latency when something is marked away for exclusive access?
> 
> CPU page fault latency is approximately "terrible", if a page is resident on
> the GPU. We have to spin up a DMA engine on the GPU and have it copy the page
> over the PCIe bus, after all.
> 
> > I presume for now this is anonymous memory only? SWP_DEVICE_EXCLUSIVE would
> 
> Yes, for now.
> 
> > only impact the address space of programs using the GPU. Should the 
> > exclusively
> > marked range live in the unreclaimable list and recycled back to 
> > active/in-active
> > to account for the fact that
> > 
> > 1. It is not reclaimable and reclaim will only hurt via page faults?
> > 2. It ages the page correctly or at-least allows for that possibility when 
> > the
> > page is used by the GPU.
> 
> I'm not sure that that is *necessarily* something we can conclude. It depends 
> upon
> access patterns of each program. For example, a "reduction" parallel program 
> sends
> over lots of data to the GPU, and only a tiny bit of (reduced!) data comes 
> back
> to the CPU. In that case, freeing the physical page on the CPU is actually the
> best decision for the OS to make (if the OS is sufficiently prescient).
>

With a shared device or a device exclusive range, it would be good to get the 
device
usage pattern and update the mm with that knowledge, so that the LRU can be 
better
maintained. With your comment you seem to suggest that a page used by the GPU 
might
be a good candidate for reclaim based on the CPU's understanding of the age of
the page should not account for use by the device
(are GPU workloads - access once and discard?) 

Balbir Singh.

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-28 Thread Peter Xu
On Fri, May 28, 2021 at 11:48:40AM +1000, Alistair Popple wrote:

[...]

> > > > > + while (page_vma_mapped_walk()) {
> > > > > + /* Unexpected PMD-mapped THP? */
> > > > > + VM_BUG_ON_PAGE(!pvmw.pte, page);
> > > > > +
> > > > > + if (!pte_present(*pvmw.pte)) {
> > > > > + ret = false;
> > > > > + page_vma_mapped_walk_done();
> > > > > + break;
> > > > > + }
> > > > > +
> > > > > + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
> > > > 
> > > > I see that all pages passed in should be done after FOLL_SPLIT_PMD, so
> > > > is
> > > > this needed?  Or say, should subpage==page always be true?
> > > 
> > > Not always, in the case of a thp there are small ptes which will get
> > > device
> > > exclusive entries.
> > 
> > FOLL_SPLIT_PMD will first split the huge thp into smaller pages, then do
> > follow_page_pte() on them (in follow_pmd_mask):
> > 
> > if (flags & FOLL_SPLIT_PMD) {
> > int ret;
> > page = pmd_page(*pmd);
> > if (is_huge_zero_page(page)) {
> > spin_unlock(ptl);
> > ret = 0;
> > split_huge_pmd(vma, pmd, address);
> > if (pmd_trans_unstable(pmd))
> > ret = -EBUSY;
> > } else {
> > spin_unlock(ptl);
> > split_huge_pmd(vma, pmd, address);
> > ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> > }
> > 
> > return ret ? ERR_PTR(ret) :
> > follow_page_pte(vma, address, pmd, flags,
> > >pgmap); }
> > 
> > So I thought all pages are small pages?
> 
> The page will remain as a transparent huge page though (at least as I 
> understand things). FOLL_SPLIT_PMD turns it into a pte mapped thp by 
> splitting 
> the pmd and creating pte's mapping the subpages but doesn't split the page 
> itself. For comparison FOLL_SPLIT (which has been removed in v5.13 due to 
> lack 
> of use) is what would be used to split the page in the above GUP code by 
> calling split_huge_page() rather than split_huge_pmd().

But shouldn't FOLL_SPLIT_PMD filled in small pfns for each pte?  See
__split_huge_pmd_locked():

for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
...
} else {
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
...
}
...
set_pte_at(mm, addr, pte, entry);
}

Then iiuc the coming follow_page_pte() will directly fetch the small pages?

-- 
Peter Xu

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-27 Thread Alistair Popple
On Thursday, 27 May 2021 11:04:57 PM AEST Peter Xu wrote:
> On Thu, May 27, 2021 at 01:35:39PM +1000, Alistair Popple wrote:
> > > > + *
> > > > + * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device
> > > > will
> > > > no + * longer have exclusive access to the page. May ignore the
> > > > invalidation that's + * part of make_device_exclusive_range() if the
> > > > owner field
> > > > + * matches the value passed to make_device_exclusive_range().
> > > 
> > > Perhaps s/matches/does not match/?
> > 
> > No, "matches" is correct. The MMU_NOTIFY_EXCLUSIVE notifier is to notify a
> > listener that a range is being invalidated for the purpose of making the
> > range available for some device to have exclusive access to. Which does
> > also mean a device getting the notification no longer has exclusive
> > access if it already did.
> > 
> > A unique type is needed because when creating the range a driver needs to
> > form a mmu critical section (with mmu_interval_read_begin()/
> > mmu_interval_read_end()) to ensure the entry remains valid long enough to
> > program the device pte and hasn't been invalidated.
> > 
> > However without a way of filtering any invalidations will result in a
> > retry, but make_device_exclusive_range() needs to do an invalidation
> > during installation of the entry. To avoid this causing infinite retries
> > the driver ignores specific invalidation events that it knows don't
> > apply, ie. the invalidations that are a result of that driver asking for
> > device exclusive entries.
> 
> OK I think I get it now.. so the driver checks both EXCLUSIVE and owner, if
> all match it skips the notify, otherwise it's treated like all the rest. 
> Thanks.
> 
> However then it's still confusing (as I raised it too in previous comment)
> that we use CLEAR when re-installing the valid pte.  It's merely against
> what CLEAR means.

Oh, thanks. I understand where you are coming from now - the pte is already 
invalid so ordinarily wouldn't need clearing.

> How about sending EXCLUSIVE for both mark/restore?  Just that when restore
> we notify with owner==NULL telling that no one is owning it anymore so
> driver needs to drop the ownership.  I assume your driver patch does not
> need change too.  Would that be much cleaner than CLEAR?  I bet it also
> makes commenting the new notify easier.
> 
> What do you think?

That seems like a good and avoids adding another type. And as you say they 
driver patch shouldn't need changing either (will need to confirm though).
 
> [...]
> 
> > > > +   vma->vm_mm, address,
> > > > min(vma->vm_end,
> > > > +   address + page_size(page)),
> > > > args->owner); + mmu_notifier_invalidate_range_start();
> > > > +
> > > > + while (page_vma_mapped_walk()) {
> > > > + /* Unexpected PMD-mapped THP? */
> > > > + VM_BUG_ON_PAGE(!pvmw.pte, page);
> > > > +
> > > > + if (!pte_present(*pvmw.pte)) {
> > > > + ret = false;
> > > > + page_vma_mapped_walk_done();
> > > > + break;
> > > > + }
> > > > +
> > > > + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
> > > 
> > > I see that all pages passed in should be done after FOLL_SPLIT_PMD, so
> > > is
> > > this needed?  Or say, should subpage==page always be true?
> > 
> > Not always, in the case of a thp there are small ptes which will get
> > device
> > exclusive entries.
> 
> FOLL_SPLIT_PMD will first split the huge thp into smaller pages, then do
> follow_page_pte() on them (in follow_pmd_mask):
> 
> if (flags & FOLL_SPLIT_PMD) {
> int ret;
> page = pmd_page(*pmd);
> if (is_huge_zero_page(page)) {
> spin_unlock(ptl);
> ret = 0;
> split_huge_pmd(vma, pmd, address);
> if (pmd_trans_unstable(pmd))
> ret = -EBUSY;
> } else {
> spin_unlock(ptl);
> split_huge_pmd(vma, pmd, address);
> ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> }
> 
> return ret ? ERR_PTR(ret) :
> follow_page_pte(vma, address, pmd, flags,
> >pgmap); }
> 
> So I thought all pages are small pages?

The page will remain as a transparent huge page though (at least as I 
understand things). FOLL_SPLIT_PMD turns it into a pte mapped thp by splitting 
the pmd and creating pte's mapping the subpages but doesn't split the page 
itself. For comparison FOLL_SPLIT (which has been removed in v5.13 due to lack 
of use) is what would be used to split the page in the above GUP code by 
calling split_huge_page() rather than split_huge_pmd().

This was done to avoid adding code for handling device exclusive entries at 
the pmd level as well which would 

Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-27 Thread Peter Xu
On Thu, May 27, 2021 at 01:35:39PM +1000, Alistair Popple wrote:
> > > + *
> > > + * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will
> > > no + * longer have exclusive access to the page. May ignore the
> > > invalidation that's + * part of make_device_exclusive_range() if the
> > > owner field
> > > + * matches the value passed to make_device_exclusive_range().
> > 
> > Perhaps s/matches/does not match/?
> 
> No, "matches" is correct. The MMU_NOTIFY_EXCLUSIVE notifier is to notify a 
> listener that a range is being invalidated for the purpose of making the 
> range 
> available for some device to have exclusive access to. Which does also mean a 
> device getting the notification no longer has exclusive access if it already 
> did.
> 
> A unique type is needed because when creating the range a driver needs to 
> form 
> a mmu critical section (with mmu_interval_read_begin()/
> mmu_interval_read_end()) to ensure the entry remains valid long enough to 
> program the device pte and hasn't been invalidated.
> 
> However without a way of filtering any invalidations will result in a retry, 
> but make_device_exclusive_range() needs to do an invalidation during 
> installation of the entry. To avoid this causing infinite retries the driver 
> ignores specific invalidation events that it knows don't apply, ie. the 
> invalidations that are a result of that driver asking for device exclusive 
> entries.

OK I think I get it now.. so the driver checks both EXCLUSIVE and owner, if all
match it skips the notify, otherwise it's treated like all the rest.  Thanks.

However then it's still confusing (as I raised it too in previous comment) that
we use CLEAR when re-installing the valid pte.  It's merely against what CLEAR
means.

How about sending EXCLUSIVE for both mark/restore?  Just that when restore we
notify with owner==NULL telling that no one is owning it anymore so driver
needs to drop the ownership.  I assume your driver patch does not need change
too.  Would that be much cleaner than CLEAR?  I bet it also makes commenting
the new notify easier.

What do you think?

[...]

> > > +   vma->vm_mm, address, min(vma->vm_end,
> > > +   address + page_size(page)),
> > > args->owner); + mmu_notifier_invalidate_range_start();
> > > +
> > > + while (page_vma_mapped_walk()) {
> > > + /* Unexpected PMD-mapped THP? */
> > > + VM_BUG_ON_PAGE(!pvmw.pte, page);
> > > +
> > > + if (!pte_present(*pvmw.pte)) {
> > > + ret = false;
> > > + page_vma_mapped_walk_done();
> > > + break;
> > > + }
> > > +
> > > + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
> > 
> > I see that all pages passed in should be done after FOLL_SPLIT_PMD, so is
> > this needed?  Or say, should subpage==page always be true?
> 
> Not always, in the case of a thp there are small ptes which will get device 
> exclusive entries.

FOLL_SPLIT_PMD will first split the huge thp into smaller pages, then do
follow_page_pte() on them (in follow_pmd_mask):

if (flags & FOLL_SPLIT_PMD) {
int ret;
page = pmd_page(*pmd);
if (is_huge_zero_page(page)) {
spin_unlock(ptl);
ret = 0;
split_huge_pmd(vma, pmd, address);
if (pmd_trans_unstable(pmd))
ret = -EBUSY;
} else {
spin_unlock(ptl);
split_huge_pmd(vma, pmd, address);
ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
}

return ret ? ERR_PTR(ret) :
follow_page_pte(vma, address, pmd, flags, >pgmap);
}

So I thought all pages are small pages?

-- 
Peter Xu

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-26 Thread Alistair Popple
On Thursday, 27 May 2021 5:28:32 AM AEST Peter Xu wrote:
> On Mon, May 24, 2021 at 11:27:22PM +1000, Alistair Popple wrote:
> > Some devices require exclusive write access to shared virtual
> > memory (SVM) ranges to perform atomic operations on that memory. This
> > requires CPU page tables to be updated to deny access whilst atomic
> > operations are occurring.
> > 
> > In order to do this introduce a new swap entry
> > type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
> > exclusive access by a device all page table mappings for the particular
> > range are replaced with device exclusive swap entries. This causes any
> > CPU access to the page to result in a fault.
> > 
> > Faults are resovled by replacing the faulting entry with the original
> > mapping. This results in MMU notifiers being called which a driver uses
> > to update access permissions such as revoking atomic access. After
> > notifiers have been called the device will no longer have exclusive
> > access to the region.
> > 
> > Walking of the page tables to find the target pages is handled by
> > get_user_pages() rather than a direct page table walk. A direct page
> > table walk similar to what migrate_vma_collect()/unmap() does could also
> > have been utilised. However this resulted in more code similar in
> > functionality to what get_user_pages() provides as page faulting is
> > required to make the PTEs present and to break COW.
> > 
> > Signed-off-by: Alistair Popple 
> > Reviewed-by: Christoph Hellwig 
> > 
> > ---
> > 
> > v9:
> > * Split rename of migrate_pgmap_owner into a separate patch.
> > * Added comments explaining SWP_DEVICE_EXCLUSIVE_* entries.
> > * Renamed try_to_protect{_one} to page_make_device_exclusive{_one} based
> > 
> >   somewhat on a suggestion from Peter Xu. I was never particularly happy
> >   with try_to_protect() as a name so think this is better.
> > 
> > * Removed unneccesary code and reworded some comments based on feedback
> > 
> >   from Peter Xu.
> > 
> > * Removed the VMA walk when restoring PTEs for device-exclusive entries.
> > * Simplified implementation of copy_pte_range() to fail if the page
> > 
> >   cannot be locked. This might lead to occasional fork() failures but at
> >   this stage we don't think that will be an issue.
> > 
> > v8:
> > * Remove device exclusive entries on fork rather than copy them.
> > 
> > v7:
> > * Added Christoph's Reviewed-by.
> > * Minor cosmetic cleanups suggested by Christoph.
> > * Replace mmu_notifier_range_init_migrate/exclusive with
> > 
> >   mmu_notifier_range_init_owner as suggested by Christoph.
> > 
> > * Replaced lock_page() with lock_page_retry() when handling faults.
> > * Restrict to anonymous pages for now.
> > 
> > v6:
> > * Fixed a bisectablity issue due to incorrectly applying the rename of
> > 
> >   migrate_pgmap_owner to the wrong patches for Nouveau and hmm_test.
> > 
> > v5:
> > * Renamed range->migrate_pgmap_owner to range->owner.
> > * Added MMU_NOTIFY_EXCLUSIVE to allow passing of a driver cookie which
> > 
> >   allows notifiers called as a result of make_device_exclusive_range() to
> >   be ignored.
> > 
> > * Added a check to try_to_protect_one() to detect if the pages originally
> > 
> >   returned from get_user_pages() have been unmapped or not.
> > 
> > * Removed check_device_exclusive_range() as it is no longer required with
> > 
> >   the other changes.
> > 
> > * Documentation update.
> > 
> > v4:
> > * Add function to check that mappings are still valid and exclusive.
> > * s/long/unsigned long/ in make_device_exclusive_entry().
> > ---
> > 
> >  Documentation/vm/hmm.rst |  17 
> >  include/linux/mmu_notifier.h |   6 ++
> >  include/linux/rmap.h |   4 +
> >  include/linux/swap.h |   7 +-
> >  include/linux/swapops.h  |  44 -
> >  mm/hmm.c |   5 +
> >  mm/memory.c  | 128 +++-
> >  mm/mprotect.c|   8 ++
> >  mm/page_vma_mapped.c |   9 +-
> >  mm/rmap.c| 186 +++
> >  10 files changed, 405 insertions(+), 9 deletions(-)
> > 
> > diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> > index 3df79307a797..a14c2938e7af 100644
> > --- a/Documentation/vm/hmm.rst
> > +++ b/Documentation/vm/hmm.rst
> > 
> > @@ -405,6 +405,23 @@ between device driver specific code and shared common 
code:
> > The lock can now be released.
> > 
> > +Exclusive access memory
> > +===
> > +
> > +Some devices have features such as atomic PTE bits that can be used to
> > implement +atomic access to system memory. To support atomic operations
> > to a shared virtual +memory page such a device needs access to that page
> > which is exclusive of any +userspace access from the CPU. The
> > ``make_device_exclusive_range()`` function +can be used to make a memory
> > range inaccessible from userspace.
> > +
> > +This replaces all mappings for pages in 

Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-26 Thread Peter Xu
On Mon, May 24, 2021 at 11:27:22PM +1000, Alistair Popple wrote:
> Some devices require exclusive write access to shared virtual
> memory (SVM) ranges to perform atomic operations on that memory. This
> requires CPU page tables to be updated to deny access whilst atomic
> operations are occurring.
> 
> In order to do this introduce a new swap entry
> type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
> exclusive access by a device all page table mappings for the particular
> range are replaced with device exclusive swap entries. This causes any
> CPU access to the page to result in a fault.
> 
> Faults are resovled by replacing the faulting entry with the original
> mapping. This results in MMU notifiers being called which a driver uses
> to update access permissions such as revoking atomic access. After
> notifiers have been called the device will no longer have exclusive
> access to the region.
> 
> Walking of the page tables to find the target pages is handled by
> get_user_pages() rather than a direct page table walk. A direct page
> table walk similar to what migrate_vma_collect()/unmap() does could also
> have been utilised. However this resulted in more code similar in
> functionality to what get_user_pages() provides as page faulting is
> required to make the PTEs present and to break COW.
> 
> Signed-off-by: Alistair Popple 
> Reviewed-by: Christoph Hellwig 
> 
> ---
> 
> v9:
> * Split rename of migrate_pgmap_owner into a separate patch.
> * Added comments explaining SWP_DEVICE_EXCLUSIVE_* entries.
> * Renamed try_to_protect{_one} to page_make_device_exclusive{_one} based
>   somewhat on a suggestion from Peter Xu. I was never particularly happy
>   with try_to_protect() as a name so think this is better.
> * Removed unneccesary code and reworded some comments based on feedback
>   from Peter Xu.
> * Removed the VMA walk when restoring PTEs for device-exclusive entries.
> * Simplified implementation of copy_pte_range() to fail if the page
>   cannot be locked. This might lead to occasional fork() failures but at
>   this stage we don't think that will be an issue.
> 
> v8:
> * Remove device exclusive entries on fork rather than copy them.
> 
> v7:
> * Added Christoph's Reviewed-by.
> * Minor cosmetic cleanups suggested by Christoph.
> * Replace mmu_notifier_range_init_migrate/exclusive with
>   mmu_notifier_range_init_owner as suggested by Christoph.
> * Replaced lock_page() with lock_page_retry() when handling faults.
> * Restrict to anonymous pages for now.
> 
> v6:
> * Fixed a bisectablity issue due to incorrectly applying the rename of
>   migrate_pgmap_owner to the wrong patches for Nouveau and hmm_test.
> 
> v5:
> * Renamed range->migrate_pgmap_owner to range->owner.
> * Added MMU_NOTIFY_EXCLUSIVE to allow passing of a driver cookie which
>   allows notifiers called as a result of make_device_exclusive_range() to
>   be ignored.
> * Added a check to try_to_protect_one() to detect if the pages originally
>   returned from get_user_pages() have been unmapped or not.
> * Removed check_device_exclusive_range() as it is no longer required with
>   the other changes.
> * Documentation update.
> 
> v4:
> * Add function to check that mappings are still valid and exclusive.
> * s/long/unsigned long/ in make_device_exclusive_entry().
> ---
>  Documentation/vm/hmm.rst |  17 
>  include/linux/mmu_notifier.h |   6 ++
>  include/linux/rmap.h |   4 +
>  include/linux/swap.h |   7 +-
>  include/linux/swapops.h  |  44 -
>  mm/hmm.c |   5 +
>  mm/memory.c  | 128 +++-
>  mm/mprotect.c|   8 ++
>  mm/page_vma_mapped.c |   9 +-
>  mm/rmap.c| 186 +++
>  10 files changed, 405 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> index 3df79307a797..a14c2938e7af 100644
> --- a/Documentation/vm/hmm.rst
> +++ b/Documentation/vm/hmm.rst
> @@ -405,6 +405,23 @@ between device driver specific code and shared common 
> code:
>  
> The lock can now be released.
>  
> +Exclusive access memory
> +===
> +
> +Some devices have features such as atomic PTE bits that can be used to 
> implement
> +atomic access to system memory. To support atomic operations to a shared 
> virtual
> +memory page such a device needs access to that page which is exclusive of any
> +userspace access from the CPU. The ``make_device_exclusive_range()`` function
> +can be used to make a memory range inaccessible from userspace.
> +
> +This replaces all mappings for pages in the given range with special swap
> +entries. Any attempt to access the swap entry results in a fault which is
> +resovled by replacing the entry with the original mapping. A driver gets
> +notified that the mapping has been changed by MMU notifiers, after which 
> point
> +it will no longer have exclusive access to the page. 

Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-26 Thread Alistair Popple
On Wednesday, 26 May 2021 5:17:18 PM AEST John Hubbard wrote:
> On 5/25/21 4:51 AM, Balbir Singh wrote:
> ...
> 
> >> How beneficial is this code to nouveau users?  I see that it permits a
> >> part of OpenCL to be implemented, but how useful/important is this in
> >> the real world?
> > 
> > That is a very good question! I've not reviewed the code, but a sample
> > program with the described use case would make things easy to parse.
> > I suspect that is not easy to build at the moment?
> 
> The cover letter says this:
> 
> This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
> which checks that GPU atomic accesses to system memory are atomic. Without
> this series the test fails as there is no way of write-protecting the page
> mapping which results in the device clobbering CPU writes. For reference
> the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/
> 
> Further testing has been performed by adding support for testing exclusive
> access to the hmm-tests kselftests.
> 
> ...so that seems to cover the "sample program" request, at least.

It is also sufficiently easy to build, assuming of course you have the 
appropriate Mesa/LLVM/OpenCL libraries installed :-)

If you are interested I have some scripts which may help with building Mesa, 
etc. Not that that is especially hard either, it's just there are a couple of 
different dependencies required.

> > I wonder how we co-ordinate all the work the mm is doing, page migration,
> > reclaim with device exclusive access? Do we have any numbers for the worst
> > case page fault latency when something is marked away for exclusive
> > access?
>
> CPU page fault latency is approximately "terrible", if a page is resident on
> the GPU. We have to spin up a DMA engine on the GPU and have it copy the
> page over the PCIe bus, after all.

Although for clarity that describes latency for CPU faults to device private 
pages which are always resident on the GPU. A CPU fault to a page being 
exclusively accessed will be slightly less terrible as it only requires the 
GPU MMU/TLB mappings to be taken down in much the same as for any other MMU 
notifier callback as the page is mapped by the GPU rather than resident there.

> > I presume for now this is anonymous memory only? SWP_DEVICE_EXCLUSIVE
> > would
> 
> Yes, for now.
> 
> > only impact the address space of programs using the GPU. Should the
> > exclusively marked range live in the unreclaimable list and recycled back
> > to active/in-active to account for the fact that
> > 
> > 1. It is not reclaimable and reclaim will only hurt via page faults?
> > 2. It ages the page correctly or at-least allows for that possibility when
> > the> 
> > page is used by the GPU.
> 
> I'm not sure that that is *necessarily* something we can conclude. It
> depends upon access patterns of each program. For example, a "reduction"
> parallel program sends over lots of data to the GPU, and only a tiny bit of
> (reduced!) data comes back to the CPU. In that case, freeing the physical
> page on the CPU is actually the best decision for the OS to make (if the OS
> is sufficiently prescient).
> 
> thanks,




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-26 Thread John Hubbard

On 5/25/21 4:51 AM, Balbir Singh wrote:
...

How beneficial is this code to nouveau users?  I see that it permits a
part of OpenCL to be implemented, but how useful/important is this in
the real world?


That is a very good question! I've not reviewed the code, but a sample
program with the described use case would make things easy to parse.
I suspect that is not easy to build at the moment?



The cover letter says this:

This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
which checks that GPU atomic accesses to system memory are atomic. Without
this series the test fails as there is no way of write-protecting the page
mapping which results in the device clobbering CPU writes. For reference
the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/

Further testing has been performed by adding support for testing exclusive
access to the hmm-tests kselftests.

...so that seems to cover the "sample program" request, at least.


I wonder how we co-ordinate all the work the mm is doing, page migration,
reclaim with device exclusive access? Do we have any numbers for the worst
case page fault latency when something is marked away for exclusive access?


CPU page fault latency is approximately "terrible", if a page is resident on
the GPU. We have to spin up a DMA engine on the GPU and have it copy the page
over the PCIe bus, after all.


I presume for now this is anonymous memory only? SWP_DEVICE_EXCLUSIVE would


Yes, for now.


only impact the address space of programs using the GPU. Should the exclusively
marked range live in the unreclaimable list and recycled back to 
active/in-active
to account for the fact that

1. It is not reclaimable and reclaim will only hurt via page faults?
2. It ages the page correctly or at-least allows for that possibility when the
page is used by the GPU.


I'm not sure that that is *necessarily* something we can conclude. It depends 
upon
access patterns of each program. For example, a "reduction" parallel program 
sends
over lots of data to the GPU, and only a tiny bit of (reduced!) data comes back
to the CPU. In that case, freeing the physical page on the CPU is actually the
best decision for the OS to make (if the OS is sufficiently prescient).

thanks,
--
John Hubbard
NVIDIA
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-25 Thread Balbir Singh
On Mon, May 24, 2021 at 03:11:57PM -0700, Andrew Morton wrote:
> On Mon, 24 May 2021 23:27:22 +1000 Alistair Popple  wrote:
> 
> > Some devices require exclusive write access to shared virtual
> > memory (SVM) ranges to perform atomic operations on that memory. This
> > requires CPU page tables to be updated to deny access whilst atomic
> > operations are occurring.
> > 
> > In order to do this introduce a new swap entry
> > type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
> > exclusive access by a device all page table mappings for the particular
> > range are replaced with device exclusive swap entries. This causes any
> > CPU access to the page to result in a fault.
> > 
> > Faults are resovled by replacing the faulting entry with the original
> > mapping. This results in MMU notifiers being called which a driver uses
> > to update access permissions such as revoking atomic access. After
> > notifiers have been called the device will no longer have exclusive
> > access to the region.
> > 
> > Walking of the page tables to find the target pages is handled by
> > get_user_pages() rather than a direct page table walk. A direct page
> > table walk similar to what migrate_vma_collect()/unmap() does could also
> > have been utilised. However this resulted in more code similar in
> > functionality to what get_user_pages() provides as page faulting is
> > required to make the PTEs present and to break COW.
> > 
> > ...
> >
> >  Documentation/vm/hmm.rst |  17 
> >  include/linux/mmu_notifier.h |   6 ++
> >  include/linux/rmap.h |   4 +
> >  include/linux/swap.h |   7 +-
> >  include/linux/swapops.h  |  44 -
> >  mm/hmm.c |   5 +
> >  mm/memory.c  | 128 +++-
> >  mm/mprotect.c|   8 ++
> >  mm/page_vma_mapped.c |   9 +-
> >  mm/rmap.c| 186 +++
> >  10 files changed, 405 insertions(+), 9 deletions(-)
> > 
> 
> This is quite a lot of code added to core MM for a single driver.
> 
> Is there any expectation that other drivers will use this code?
> 
> Is there a way of reducing the impact (code size, at least) for systems
> which don't need this code?
>
> How beneficial is this code to nouveau users?  I see that it permits a
> part of OpenCL to be implemented, but how useful/important is this in
> the real world?

That is a very good question! I've not reviewed the code, but a sample
program with the described use case would make things easy to parse.
I suspect that is not easy to build at the moment?

I wonder how we co-ordinate all the work the mm is doing, page migration,
reclaim with device exclusive access? Do we have any numbers for the worst
case page fault latency when something is marked away for exclusive access?
I presume for now this is anonymous memory only? SWP_DEVICE_EXCLUSIVE would
only impact the address space of programs using the GPU. Should the exclusively
marked range live in the unreclaimable list and recycled back to 
active/in-active
to account for the fact that

1. It is not reclaimable and reclaim will only hurt via page faults?
2. It ages the page correctly or at-least allows for that possibility when the
   page is used by the GPU.

Balbir Singh.
 
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-25 Thread Alistair Popple
On Tuesday, 25 May 2021 11:31:17 AM AEST John Hubbard wrote:
> On 5/24/21 3:11 PM, Andrew Morton wrote:
> >> ...
> >> 
> >>   Documentation/vm/hmm.rst |  17 
> >>   include/linux/mmu_notifier.h |   6 ++
> >>   include/linux/rmap.h |   4 +
> >>   include/linux/swap.h |   7 +-
> >>   include/linux/swapops.h  |  44 -
> >>   mm/hmm.c |   5 +
> >>   mm/memory.c  | 128 +++-
> >>   mm/mprotect.c|   8 ++
> >>   mm/page_vma_mapped.c |   9 +-
> >>   mm/rmap.c| 186 +++
> >>   10 files changed, 405 insertions(+), 9 deletions(-)
> > 
> > This is quite a lot of code added to core MM for a single driver.
> > 
> > Is there any expectation that other drivers will use this code?
> 
> Yes! This should work for GPUs (and potentially, other devices) that support
> OpenCL SVM atomic accesses on the device. I haven't looked into how amdgpu
> works in any detail, but that's certainly at the top of the list of likely
> additional callers.
> 
> > Is there a way of reducing the impact (code size, at least) for systems
> > which don't need this code?

All of the code added to mm/rmap.c is specific to implementing this feature 
and not depended on by other core MM code so could be put behind something 
like CONFIG_DEVICE_PRIVATE to reduce the code size impact (I realise now it 
currently isn't but should be).

The impact on compiled code size in mm/memory.c also ends up being minimised 
by the compiler because all of it is of the form:

if (is_device_exclusive_entry(...)) {
[...]
}

Meaning it should get thrown away when the feature is not configured given 
is_device_exclusive_entry() is a static inline always returning false in that 
case.

> I'll leave this question to others for the moment, in order to answer
> the "do we need it at all" points.
> 
> > How beneficial is this code to nouveau users?  I see that it permits a
> > part of OpenCL to be implemented, but how useful/important is this in
> > the real world?
> 
> So this is interesting. Right now, OpenCL support in Nouveau is rather new
> and so probably not a huge impact yet. However, we've built up enough
> experience with CUDA and OpenCL to learn that atomic operations, as part of
> the user space programming model, are a super big deal. Atomic operations
> are so useful and important that I'd expect many OpenCL SVM users to be
> uninterested in programming models that lack atomic operations for GPU
> compute programs.
> 
> Again, this doesn't rule out future, non-GPU accelerator devices that may
> come along.
> 
> Atomic ops are just a really important piece of high-end multi-threaded
> programming, it turns out. So this is the beginning of support for an
> important building block for general purpose programming on devices that
> have GPU-like memory models.
> 
> 
> thanks,




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-24 Thread John Hubbard

On 5/24/21 3:11 PM, Andrew Morton wrote:

...

  Documentation/vm/hmm.rst |  17 
  include/linux/mmu_notifier.h |   6 ++
  include/linux/rmap.h |   4 +
  include/linux/swap.h |   7 +-
  include/linux/swapops.h  |  44 -
  mm/hmm.c |   5 +
  mm/memory.c  | 128 +++-
  mm/mprotect.c|   8 ++
  mm/page_vma_mapped.c |   9 +-
  mm/rmap.c| 186 +++
  10 files changed, 405 insertions(+), 9 deletions(-)



This is quite a lot of code added to core MM for a single driver.

Is there any expectation that other drivers will use this code?


Yes! This should work for GPUs (and potentially, other devices) that support
OpenCL SVM atomic accesses on the device. I haven't looked into how amdgpu
works in any detail, but that's certainly at the top of the list of likely
additional callers.



Is there a way of reducing the impact (code size, at least) for systems
which don't need this code?


I'll leave this question to others for the moment, in order to answer
the "do we need it at all" points.



How beneficial is this code to nouveau users?  I see that it permits a
part of OpenCL to be implemented, but how useful/important is this in
the real world?



So this is interesting. Right now, OpenCL support in Nouveau is rather new
and so probably not a huge impact yet. However, we've built up enough experience
with CUDA and OpenCL to learn that atomic operations, as part of the user
space programming model, are a super big deal. Atomic operations are so
useful and important that I'd expect many OpenCL SVM users to be uninterested in
programming models that lack atomic operations for GPU compute programs.

Again, this doesn't rule out future, non-GPU accelerator devices that may
come along.

Atomic ops are just a really important piece of high-end multi-threaded
programming, it turns out. So this is the beginning of support for an
important building block for general purpose programming on devices that
have GPU-like memory models.


thanks,
--
John Hubbard
NVIDIA
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v9 07/10] mm: Device exclusive memory access

2021-05-24 Thread Andrew Morton
On Mon, 24 May 2021 23:27:22 +1000 Alistair Popple  wrote:

> Some devices require exclusive write access to shared virtual
> memory (SVM) ranges to perform atomic operations on that memory. This
> requires CPU page tables to be updated to deny access whilst atomic
> operations are occurring.
> 
> In order to do this introduce a new swap entry
> type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
> exclusive access by a device all page table mappings for the particular
> range are replaced with device exclusive swap entries. This causes any
> CPU access to the page to result in a fault.
> 
> Faults are resovled by replacing the faulting entry with the original
> mapping. This results in MMU notifiers being called which a driver uses
> to update access permissions such as revoking atomic access. After
> notifiers have been called the device will no longer have exclusive
> access to the region.
> 
> Walking of the page tables to find the target pages is handled by
> get_user_pages() rather than a direct page table walk. A direct page
> table walk similar to what migrate_vma_collect()/unmap() does could also
> have been utilised. However this resulted in more code similar in
> functionality to what get_user_pages() provides as page faulting is
> required to make the PTEs present and to break COW.
> 
> ...
>
>  Documentation/vm/hmm.rst |  17 
>  include/linux/mmu_notifier.h |   6 ++
>  include/linux/rmap.h |   4 +
>  include/linux/swap.h |   7 +-
>  include/linux/swapops.h  |  44 -
>  mm/hmm.c |   5 +
>  mm/memory.c  | 128 +++-
>  mm/mprotect.c|   8 ++
>  mm/page_vma_mapped.c |   9 +-
>  mm/rmap.c| 186 +++
>  10 files changed, 405 insertions(+), 9 deletions(-)
> 

This is quite a lot of code added to core MM for a single driver.

Is there any expectation that other drivers will use this code?

Is there a way of reducing the impact (code size, at least) for systems
which don't need this code?

How beneficial is this code to nouveau users?  I see that it permits a
part of OpenCL to be implemented, but how useful/important is this in
the real world?

Thanks.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau