Re: [PATCH v2 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
On Fri, Apr 16, 2021 at 4:47 PM Hugh Dickins wrote: > > On Mon, 12 Apr 2021, Axel Rasmussen wrote: > > > With this change, userspace can resolve a minor fault within a > > shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this > > match those for hugetlbfs - we look up the existing page in the page > > cache, and install PTEs for it. > > s/PTEs/a PTE/ > > > > > This commit introduces a new helper: mcopy_atomic_install_ptes. > > The plural is misleading: it only installs a single pte, so I'm going > to ask you to change it throughout to mcopy_atomic_install_pte() > (I'm not thrilled with the "mcopy" nor the "atomic", but there you are > being consistent with userfaultfd's peculiar naming, so let them be). > > > > > Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in > > shmem.c? The existing userfault implementation only relies on shmem.c > > for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just > > fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for > > shmem in one place, regardless of shared/private (to reduce code > > duplication). > > > > Why add a new mcopy_atomic_install_ptes helper? A problem we have with > > continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are > > *close* to what we want, but not exactly. We do want to setup the PTEs > > in a CONTINUE operation, but we don't want to e.g. allocate a new page, > > charge it (e.g. to the shmem inode), manipulate various flags, etc. Also > > we have the problem stated above: shmem_mcopy_atomic_pte() and > > mcopy_atomic_pte() both handle one-half of the problem (shared / > > private) continue cares about. So, introduce mcontinue_atomic_pte(), to > > handle all of the shmem continue cases. Introduce the helper so it > > doesn't duplicate code with mcopy_atomic_pte(). > > > > In a future commit, shmem_mcopy_atomic_pte() will also be modified to > > use this new helper. However, since this is a bigger refactor, it seems > > most clear to do it as a separate change. > > (Actually that turns out to be a nice deletion of lines, > but you're absolutely right to do it as a separate patch.) > > > > > Signed-off-by: Axel Rasmussen > > --- > > mm/userfaultfd.c | 176 +++ > > 1 file changed, 131 insertions(+), 45 deletions(-) > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > index 23fa2583bbd1..8df0438f5d6a 100644 > > --- a/mm/userfaultfd.c > > +++ b/mm/userfaultfd.c > > @@ -48,6 +48,87 @@ struct vm_area_struct *find_dst_vma(struct mm_struct > > *dst_mm, > > return dst_vma; > > } > > > > +/* > > + * Install PTEs, to map dst_addr (within dst_vma) to page. > > + * > > + * This function handles MCOPY_ATOMIC_CONTINUE (which is always > > file-backed), > > + * whether or not dst_vma is VM_SHARED. It also handles the more general > > + * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be > > file > > + * backed, or not). > > + * > > + * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by > > + * shmem_mcopy_atomic_pte instead. > > Right, I'm thinking in terms of five cases below (I'm not for a moment > saying that you need to list these out in the comment, just saying that > I could not get my head around the issues in this function without > listing them out for myself): > > 1. anon private mcopy (using anon page newly allocated) > 2. shmem private mcopy (using anon page newly allocated) > 3. shmem private mcontinue (using page in cache from shmem_getpage) > 4. shmem shared mcontinue (using page in cache from shmem_getpage) > 5. shmem shared mcopy (using page in cache newly allocated) > > Of which each has a VM_WRITE and a !VM_WRITE case; and the third and > fourth cases are new in this patch (it really would have been better > to introduce mcopy_atomic_install_pte() in a separate earlier patch, > but don't change that now we've got this far); and the fifth case does > *not* use mcopy_atomic_install_pte() in this patch, but will in future. > > And while making these notes, let's hightlight again what is commented > elsewhere, the odd nature of the second case: where userfaultfd short > circuits to an anonymous CoW page without instantiating the shmem page. > (Please double-check me on that: quite a lot of my comments below are > about this case 2, so if I've got it wrong, then I've got a lot wrong.) My understanding of case (2) is the same. In mfill_atomic_pte(), we call into mcopy_atomic_pte if !VM_SHARED, regardless of whether it's anon or shmem we're dealing with. That function allocates an anon page, and then mcopy_atomic_install_pte() will *not* mark it writable, so we get CoW semantics. > > > + */ > > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t > > *dst_pmd, > > mcopy_atomic_install_pte() throughout please. > > > + struct vm_area_struct *dst_vma, > > + unsigned long dst_addr, struct pa
Re: [PATCH v2 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
On Mon, 12 Apr 2021, Axel Rasmussen wrote: > With this change, userspace can resolve a minor fault within a > shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this > match those for hugetlbfs - we look up the existing page in the page > cache, and install PTEs for it. s/PTEs/a PTE/ > > This commit introduces a new helper: mcopy_atomic_install_ptes. The plural is misleading: it only installs a single pte, so I'm going to ask you to change it throughout to mcopy_atomic_install_pte() (I'm not thrilled with the "mcopy" nor the "atomic", but there you are being consistent with userfaultfd's peculiar naming, so let them be). > > Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in > shmem.c? The existing userfault implementation only relies on shmem.c > for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just > fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for > shmem in one place, regardless of shared/private (to reduce code > duplication). > > Why add a new mcopy_atomic_install_ptes helper? A problem we have with > continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are > *close* to what we want, but not exactly. We do want to setup the PTEs > in a CONTINUE operation, but we don't want to e.g. allocate a new page, > charge it (e.g. to the shmem inode), manipulate various flags, etc. Also > we have the problem stated above: shmem_mcopy_atomic_pte() and > mcopy_atomic_pte() both handle one-half of the problem (shared / > private) continue cares about. So, introduce mcontinue_atomic_pte(), to > handle all of the shmem continue cases. Introduce the helper so it > doesn't duplicate code with mcopy_atomic_pte(). > > In a future commit, shmem_mcopy_atomic_pte() will also be modified to > use this new helper. However, since this is a bigger refactor, it seems > most clear to do it as a separate change. (Actually that turns out to be a nice deletion of lines, but you're absolutely right to do it as a separate patch.) > > Signed-off-by: Axel Rasmussen > --- > mm/userfaultfd.c | 176 +++ > 1 file changed, 131 insertions(+), 45 deletions(-) > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index 23fa2583bbd1..8df0438f5d6a 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -48,6 +48,87 @@ struct vm_area_struct *find_dst_vma(struct mm_struct > *dst_mm, > return dst_vma; > } > > +/* > + * Install PTEs, to map dst_addr (within dst_vma) to page. > + * > + * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed), > + * whether or not dst_vma is VM_SHARED. It also handles the more general > + * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file > + * backed, or not). > + * > + * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by > + * shmem_mcopy_atomic_pte instead. Right, I'm thinking in terms of five cases below (I'm not for a moment saying that you need to list these out in the comment, just saying that I could not get my head around the issues in this function without listing them out for myself): 1. anon private mcopy (using anon page newly allocated) 2. shmem private mcopy (using anon page newly allocated) 3. shmem private mcontinue (using page in cache from shmem_getpage) 4. shmem shared mcontinue (using page in cache from shmem_getpage) 5. shmem shared mcopy (using page in cache newly allocated) Of which each has a VM_WRITE and a !VM_WRITE case; and the third and fourth cases are new in this patch (it really would have been better to introduce mcopy_atomic_install_pte() in a separate earlier patch, but don't change that now we've got this far); and the fifth case does *not* use mcopy_atomic_install_pte() in this patch, but will in future. And while making these notes, let's hightlight again what is commented elsewhere, the odd nature of the second case: where userfaultfd short circuits to an anonymous CoW page without instantiating the shmem page. (Please double-check me on that: quite a lot of my comments below are about this case 2, so if I've got it wrong, then I've got a lot wrong.) > + */ > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t > *dst_pmd, mcopy_atomic_install_pte() throughout please. > + struct vm_area_struct *dst_vma, > + unsigned long dst_addr, struct page *page, > + bool newly_allocated, bool wp_copy) > +{ > + int ret; > + pte_t _dst_pte, *dst_pte; > + int writable; Sorry, it's silly of me, but I keep getting irritated by "int writable" in company with the various bools; and the way vm_shared is initialized below, but writable initialized later. Please humour me by making it bool writable = dst_vma->vm_flags & VM_WRITE; > + bool vm_shared = dst_vma->vm_flags & VM_SHARED; And I've found that we also need bool page_in_ca