Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

2020-06-29 Thread Peter Xu
On Sun, Jun 28, 2020 at 06:52:24PM -0700, John Hubbard wrote:
> The above file is renamed, as of a couple weeks ago, via
> commit ad8694bac410 ("iommu/amd: Move AMD IOMMU driver into
> subdirectory").
> 
> Also there are a number of changes to mm/gup.c (not a concern for this
> patch, but it is for the overall series). So I'm hoping you're going to
> post a version that is rebased against 5.8-rc*.

Thanks for the heads up.  It turns out that there are even more conflicts than
just the file moves.  I'll rebase onto linux-next/akpm and resend.  The
versioning of the series never seems to work right...  I'll try to fix that
too...

-- 
Peter Xu



Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

2020-06-28 Thread John Hubbard

On 2020-06-26 15:31, Peter Xu wrote:

This is a preparation patch to move page fault accounting into the general
code in handle_mm_fault().  This includes both the per task flt_maj/flt_min
counters, and the major/minor page fault perf events.  To do this, the pt_regs
pointer is passed into handle_mm_fault().

PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault handlers.

So far, every pt_regs pointer passed into handle_mm_fault() is NULL,
which means this patch should have no intended functional change.

Suggested-by: Linus Torvalds 
Signed-off-by: Peter Xu 
---
  arch/alpha/mm/fault.c         |  2 +-
  arch/arc/mm/fault.c           |  2 +-
  arch/arm/mm/fault.c           |  2 +-
  arch/arm64/mm/fault.c         |  2 +-
  arch/csky/mm/fault.c          |  3 +-
  arch/hexagon/mm/vm_fault.c    |  2 +-
  arch/ia64/mm/fault.c          |  2 +-
  arch/m68k/mm/fault.c          |  2 +-
  arch/microblaze/mm/fault.c    |  2 +-
  arch/mips/mm/fault.c          |  2 +-
  arch/nds32/mm/fault.c         |  2 +-
  arch/nios2/mm/fault.c         |  2 +-
  arch/openrisc/mm/fault.c      |  2 +-
  arch/parisc/mm/fault.c        |  2 +-
  arch/powerpc/mm/copro_fault.c |  2 +-
  arch/powerpc/mm/fault.c       |  2 +-
  arch/riscv/mm/fault.c         |  2 +-
  arch/s390/mm/fault.c          |  2 +-
  arch/sh/mm/fault.c            |  2 +-
  arch/sparc/mm/fault_32.c      |  4 +--
  arch/sparc/mm/fault_64.c      |  2 +-
  arch/um/kernel/trap.c         |  2 +-
  arch/unicore32/mm/fault.c     |  2 +-
  arch/x86/mm/fault.c           |  2 +-
  arch/xtensa/mm/fault.c        |  2 +-
  drivers/iommu/amd_iommu_v2.c  |  2 +-


The above file is renamed, as of a couple weeks ago, via
commit ad8694bac410 ("iommu/amd: Move AMD IOMMU driver into
subdirectory").

Also there are a number of changes to mm/gup.c (not a concern for this
patch, but it is for the overall series). So I'm hoping you're going to
post a version that is rebased against 5.8-rc*.

thanks,
--
John Hubbard
NVIDIA


Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

2020-06-26 Thread Peter Xu
On Fri, Jun 26, 2020 at 05:53:46PM -0400, Peter Xu wrote:
> Since this patch seems to be the only one that needs a new post so far, I'll
> repost this patch only by replying to itself with v2.1.  Hopefully that can
> avoid some unnecessary mail bombs.

Unfortunately, patch 25 will need a trivial touch-up on the comment...  I'll
just resend the whole series for simplicity.  Thanks,

-- 
Peter Xu



Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

2020-06-26 Thread Peter Xu
On Fri, Jun 26, 2020 at 09:54:24PM +0200, Gerald Schaefer wrote:
> On Wed, 24 Jun 2020 16:34:12 -0400
> Peter Xu  wrote:
> 
> > On Wed, Jun 24, 2020 at 08:49:03PM +0200, Gerald Schaefer wrote:
> > > On Fri, 19 Jun 2020 12:05:13 -0400
> > > Peter Xu  wrote:
> > > 
> > > [...]
> > > 
> > > > @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > > > mem_cgroup_oom_synchronize(false);
> > > > }
> > > > 
> > > > +   if (ret & VM_FAULT_RETRY)
> > > > +   return ret;
> > > 
> > > I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
> > > In arch code (s390 and all others I briefly checked), the accounting
> > > was skipped for VM_FAULT_ERROR case.
> > 
> > Yes. I didn't explicitly add the check because I thought it's still OK to
> > count the error cases, especially after we've discussed
> > PERF_COUNT_SW_PAGE_FAULTS in v1.  So far, the major reason (iiuc) to have
> > PERF_COUNT_SW_PAGE_FAULTS still in per-arch handlers is to also cover these
> > corner cases like VM_FAULT_ERROR.  So to me it makes sense too to also count
> > them in here.  But I agree it changes the old counting on most archs.
> 
> Having PERF_COUNT_SW_PAGE_FAULTS count everything including VM_FAULT_ERROR
> is OK. Just major/minor accounting should be only about successes, IIRC from
> v1 discussion.
> 
> The "new rules" also say
> 
> +  *  - faults that never even got here (because the address
> +  *wasn't valid). That includes arch_vma_access_permitted()
> +  *failing above.
> 
> VM_FAULT_ERROR, and also the arch-specific VM_FAULT_BADxxx, qualify
> as "address wasn't valid" I think, so they should not be counted as
> major/minor.
> 
> IIRC from v1, and we want to only count success as major/minor, maybe
> the rule could also be made more clear about that, e.g. like
> 
> +  *  - unsuccessful faults (because the address wasn't valid)
> +  *do not count. That includes arch_vma_access_permitted()
> +  *failing above.

Sure.

> 
> > 
> > Again, I don't have strong opinion either on this, just like the same to
> > PERF_COUNT_SW_PAGE_FAULTS...  But if no one disagree, I will change this to:
> > 
> >   if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
> >   return ret;
> > 
> > So we try our best to follow the past.
> 
> Sounds good to me, and VM_FAULT_BADxxx should never show up here.
> 
> > 
> > Btw, note that there will still be some even more special corner cases.
> > E.g., for ARM64 it's also not accounted for some ARM64 specific fault errors
> > (VM_FAULT_BADMAP, VM_FAULT_BADACCESS).  So even if we don't count
> > VM_FAULT_ERROR, we might still count these for ARM64.  We can try to
> > redefine VM_FAULT_ERROR in ARM64 to cover all the arch-specific errors,
> > however that seems overkill to me solely for fault accounting, so hopefully
> > I can ignore that difference.
> 
> Hmm, arm64 already does not count the VM_FAULT_BADxxx, but also does not
> call handle_mm_fault() for those, so no change with this patch. arm (and
> also unicore32) do count those, but also not call handle_mm_fault(), so
> there would be the change that they lose accounting, IIUC.

Oh you are right...  I just noticed that VM_FAULT_BADMAP and VM_FAULT_BADACCESS
can never be returned by handle_mm_fault() itself.

> 
> I agree that this probably can be ignored. The code in arm64 also looks
> more recent, so it's probably just a left-over in arm/unicore32 code.

Anyway, glad to know that we've reached consensus so that we can accept these
differences.

Since this patch seems to be the only one that needs a new post so far, I'll
repost only this patch as v2.1, in reply to itself.  Hopefully that can
avoid some unnecessary mail bombs.

Thanks for the very detailed review!

-- 
Peter Xu



Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

2020-06-26 Thread Gerald Schaefer
On Wed, 24 Jun 2020 16:34:12 -0400
Peter Xu  wrote:

> On Wed, Jun 24, 2020 at 08:49:03PM +0200, Gerald Schaefer wrote:
> > On Fri, 19 Jun 2020 12:05:13 -0400
> > Peter Xu  wrote:
> > 
> > [...]
> > 
> > > @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >   mem_cgroup_oom_synchronize(false);
> > >   }
> > > 
> > > + if (ret & VM_FAULT_RETRY)
> > > + return ret;
> > 
> > I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
> > In arch code (s390 and all others I briefly checked), the accounting
> > was skipped for VM_FAULT_ERROR case.
> 
> Yes. I didn't explicitly add the check because I thought it's still OK to
> count the error cases, especially after we've discussed
> PERF_COUNT_SW_PAGE_FAULTS in v1.  So far, the major reason (iiuc) to have
> PERF_COUNT_SW_PAGE_FAULTS still in per-arch handlers is to also cover these
> corner cases like VM_FAULT_ERROR.  So to me it makes sense too to also count
> them in here.  But I agree it changes the old counting on most archs.

Having PERF_COUNT_SW_PAGE_FAULTS count everything including VM_FAULT_ERROR
is OK. Just major/minor accounting should be only about successes, IIRC from
v1 discussion.

The "new rules" also say

+*  - faults that never even got here (because the address
+*wasn't valid). That includes arch_vma_access_permitted()
+*failing above.

VM_FAULT_ERROR, and also the arch-specific VM_FAULT_BADxxx, qualify
as "address wasn't valid" I think, so they should not be counted as
major/minor.

If, as I recall from v1, we only want to count successes as major/minor, maybe
the rule could also be made clearer about that, e.g.:

+*  - unsuccessful faults (because the address wasn't valid)
+*do not count. That includes arch_vma_access_permitted()
+*failing above.
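
Put together with the major/minor rule from the patch comment (a fault is
"major" when the final successful fault is VM_FAULT_MAJOR, or when it was a
retry), the counting could be sketched roughly as below.  This is a user-space
illustration only, not the kernel code: the bit values are stand-ins, and
struct task_counters, account_completed_fault() and the was_retried
parameter are hypothetical names made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in bit values for this sketch; the kernel's own definitions may differ. */
enum {
	VM_FAULT_SIGBUS = 0x0002,
	VM_FAULT_MAJOR  = 0x0004,
	VM_FAULT_RETRY  = 0x0400,
};
/* Reduced error mask, enough to show the idea. */
#define VM_FAULT_ERROR VM_FAULT_SIGBUS

/* Hypothetical per-task counters, named after flt_maj/flt_min in the patch text. */
struct task_counters {
	unsigned long flt_maj;
	unsigned long flt_min;
};

/* Count a fault only when it completed successfully. */
static void account_completed_fault(struct task_counters *t,
				    unsigned int ret, bool was_retried)
{
	assert(t != NULL);

	/* Incomplete (RETRY) and unsuccessful (ERROR) faults do not count. */
	if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
		return;

	/* Major if the final fault was major, or if we got here via a retry. */
	if ((ret & VM_FAULT_MAJOR) || was_retried)
		t->flt_maj++;
	else
		t->flt_min++;
}
```

With this shape, a fault that first returns RETRY and then succeeds is counted
exactly once, as major, when the second attempt completes.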

> 
> Again, I don't have strong opinion either on this, just like the same to
> PERF_COUNT_SW_PAGE_FAULTS...  But if no one disagree, I will change this to:
> 
>   if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
>   return ret;
> 
> So we try our best to follow the past.

Sounds good to me, and VM_FAULT_BADxxx should never show up here.

> 
> Btw, note that there will still be some even more special corner cases. E.g.,
> for ARM64 it's also not accounted for some ARM64 specific fault errors
> (VM_FAULT_BADMAP, VM_FAULT_BADACCESS).  So even if we don't count
> VM_FAULT_ERROR, we might still count these for ARM64.  We can try to redefine
> VM_FAULT_ERROR in ARM64 to cover all the arch-specific errors, however that
> seems overkill to me solely for fault accounting, so hopefully I can ignore
> that difference.

Hmm, arm64 already does not count the VM_FAULT_BADxxx, but also does not
call handle_mm_fault() for those, so no change with this patch. arm (and
also unicore32) do count those, but also do not call handle_mm_fault(), so
with this change they would lose that accounting, IIUC.

I agree that this probably can be ignored. The code in arm64 also looks
more recent, so it's probably just a left-over in arm/unicore32 code.
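
To illustrate why the arch-specific codes never reach the common accounting,
here is a rough user-space sketch.  It assumes stand-in values for
VM_FAULT_BADMAP and VM_FAULT_BADACCESS, and arch_do_fault(),
core_handle_fault() and core_calls are hypothetical names, not real kernel
functions: the arch handler returns the BADxxx code before the core handler is
ever called.

```c
#include <stdbool.h>

/* Stand-in values for the arch-private codes; not taken from any arch header. */
enum {
	VM_FAULT_BADMAP    = 0x010000,
	VM_FAULT_BADACCESS = 0x020000,
};

/* Counts how often the (hypothetical) common handler was entered. */
static unsigned int core_calls;

/* Stand-in for the common handler, where the accounting would live. */
static unsigned int core_handle_fault(void)
{
	core_calls++;
	return 0;	/* success, no flags set */
}

/* Hypothetical arch-level fault path. */
static unsigned int arch_do_fault(bool vma_found, bool access_ok)
{
	/* Bad address: bail out before touching the common code. */
	if (!vma_found)
		return VM_FAULT_BADMAP;

	/* Permission mismatch: likewise handled purely in arch code. */
	if (!access_ok)
		return VM_FAULT_BADACCESS;

	return core_handle_fault();
}
```

Any accounting done inside the common handler therefore cannot see the BADxxx
cases, which is why arm and unicore32 would stop counting them.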

Regards,
Gerald


Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

2020-06-24 Thread Peter Xu
On Wed, Jun 24, 2020 at 08:49:03PM +0200, Gerald Schaefer wrote:
> On Fri, 19 Jun 2020 12:05:13 -0400
> Peter Xu  wrote:
> 
> [...]
> 
> > @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > mem_cgroup_oom_synchronize(false);
> > }
> > 
> > +   if (ret & VM_FAULT_RETRY)
> > +   return ret;
> 
> I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
> In arch code (s390 and all others I briefly checked), the accounting
> was skipped for VM_FAULT_ERROR case.

Yes. I didn't explicitly add the check because I thought it's still OK to count
the error cases, especially after we discussed PERF_COUNT_SW_PAGE_FAULTS in
v1.  So far, the major reason (iiuc) to keep PERF_COUNT_SW_PAGE_FAULTS in the
per-arch handlers is to also cover corner cases like VM_FAULT_ERROR.  So to me
it makes sense to count them here as well.  But I agree it changes the old
counting on most archs.

Again, I don't have a strong opinion on this, same as for
PERF_COUNT_SW_PAGE_FAULTS...  But if no one disagrees, I will change this to:

  if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
  return ret;

So we try our best to follow the past.
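
A minimal user-space sketch of what that early return does to the accounting.
The bit values are stand-ins chosen for the sketch, and struct fault_counters
and account_fault() are hypothetical names, not anything from the kernel
sources:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in bit values for this sketch; the kernel's own definitions may differ. */
enum {
	VM_FAULT_OOM     = 0x0001,
	VM_FAULT_SIGBUS  = 0x0002,
	VM_FAULT_MAJOR   = 0x0004,
	VM_FAULT_SIGSEGV = 0x0040,
	VM_FAULT_RETRY   = 0x0400,
};
/* Reduced error mask covering only the bits defined above. */
#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)

/* Hypothetical bookkeeping so the effect of the check is observable. */
struct fault_counters {
	unsigned long accounted;
	unsigned long skipped;
};

static void account_fault(struct fault_counters *c, unsigned int ret)
{
	assert(c != NULL);

	/* The proposed check: neither retries nor errors get accounted. */
	if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR)) {
		c->skipped++;
		return;
	}

	c->accounted++;
}
```

Only results with neither a RETRY bit nor an ERROR bit set reach the counting
step, which matches what most arch handlers did before.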

Btw, note that there will still be some even more special corner cases. E.g.,
ARM64 also does not account some ARM64-specific fault errors
(VM_FAULT_BADMAP, VM_FAULT_BADACCESS).  So even if we don't count
VM_FAULT_ERROR, we might still count these for ARM64.  We could redefine
VM_FAULT_ERROR on ARM64 to cover all the arch-specific errors, but that
seems like overkill to me solely for fault accounting, so hopefully I can ignore
that difference.

> 
> > +
> > +   /*
> > +* Do accounting in the common code, to avoid unnecessary
> > +* architecture differences or duplicated code.
> > +*
> > +* We arbitrarily make the rules be:
> > +*
> > +*  - faults that never even got here (because the address
> > +*wasn't valid). That includes arch_vma_access_permitted()
> 
> Missing "do not count" at the end of the first sentence?
> 
> > +*failing above.
> > +*
> > +*So this is expressly not a "this many hardware page
> > +*faults" counter. Use the hw profiling for that.
> > +*
> > +*  - incomplete faults (ie RETRY) do not count (see above).
> > +*They will only count once completed.
> > +*
> > +*  - the fault counts as a "major" fault when the final
> > +*successful fault is VM_FAULT_MAJOR, or if it was a
> > +*retry (which implies that we couldn't handle it
> > +*immediately previously).
> > +*
> > +*  - if the fault is done for GUP, regs wil be NULL and
> 
> wil -> will

Will fix both places.  Thanks,

-- 
Peter Xu



Re: [PATCH 01/26] mm: Do page fault accounting in handle_mm_fault

2020-06-24 Thread Gerald Schaefer
On Fri, 19 Jun 2020 12:05:13 -0400
Peter Xu  wrote:

[...]

> @@ -4393,6 +4425,38 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>   mem_cgroup_oom_synchronize(false);
>   }
> 
> + if (ret & VM_FAULT_RETRY)
> + return ret;

I'm wondering if this also needs a check and exit for VM_FAULT_ERROR.
In arch code (s390 and all others I briefly checked), the accounting
was skipped for VM_FAULT_ERROR case.

> +
> + /*
> +  * Do accounting in the common code, to avoid unnecessary
> +  * architecture differences or duplicated code.
> +  *
> +  * We arbitrarily make the rules be:
> +  *
> +  *  - faults that never even got here (because the address
> +  *wasn't valid). That includes arch_vma_access_permitted()

Missing "do not count" at the end of the first sentence?

> +  *failing above.
> +  *
> +  *So this is expressly not a "this many hardware page
> +  *faults" counter. Use the hw profiling for that.
> +  *
> +  *  - incomplete faults (ie RETRY) do not count (see above).
> +  *They will only count once completed.
> +  *
> +  *  - the fault counts as a "major" fault when the final
> +  *successful fault is VM_FAULT_MAJOR, or if it was a
> +  *retry (which implies that we couldn't handle it
> +  *immediately previously).
> +  *
> +  *  - if the fault is done for GUP, regs wil be NULL and

wil -> will

Regards,
Gerald