Re: [PATCH RFC 05/19] mm: add early FAULT_FLAG_WRITE consistency checks
On Nov 7, 2022, at 11:27 AM, David Hildenbrand wrote:

> On 07.11.22 20:03, Nadav Amit wrote:
>> On Nov 7, 2022, at 8:17 AM, David Hildenbrand wrote:
>>
>>> Let's catch abuse of FAULT_FLAG_WRITE early, such that we don't have to
>>> care in all other handlers and might get "surprises" if we forget to do
>>> so.
>>>
>>> Write faults without VM_MAYWRITE don't make any sense, and our
>>> maybe_mkwrite() logic could have hidden such abuse for now.
>>>
>>> Write faults without VM_WRITE on something that is not a COW mapping is
>>> similarly broken, and e.g., do_wp_page() could end up placing an
>>> anonymous page into a shared mapping, which would be bad.
>>>
>>> This is a preparation for reliable R/O long-term pinning of pages in
>>> private mappings, whereby we want to make sure that we will never break
>>> COW in a read-only private mapping.
>>>
>>> Signed-off-by: David Hildenbrand
>>> ---
>>>  mm/memory.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index fe131273217a..826353da7b23 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -5159,6 +5159,14 @@ static vm_fault_t sanitize_fault_flags(struct vm_area_struct *vma,
>>>  		 */
>>>  		if (!is_cow_mapping(vma->vm_flags))
>>>  			*flags &= ~FAULT_FLAG_UNSHARE;
>>> +	} else if (*flags & FAULT_FLAG_WRITE) {
>>> +		/* Write faults on read-only mappings are impossible ... */
>>> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_MAYWRITE)))
>>> +			return VM_FAULT_SIGSEGV;
>>> +		/* ... and FOLL_FORCE only applies to COW mappings. */
>>> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE) &&
>>> +				 !is_cow_mapping(vma->vm_flags)))
>>> +			return VM_FAULT_SIGSEGV;
>>
>> Not sure about the WARN_*(). Seems as if it might trigger in benign, even
>> if rare, scenarios, e.g., mprotect() racing with a page fault.
>
> We most certainly would want to catch any such broken/racy cases. There
> are no benign cases I could possibly think of.
>
> Page faults need the mmap lock in read. mprotect() / VMA changes need
> the mmap lock in write. Whoever calls handle_mm_fault() is supposed to
> properly check VMA permissions.

My bad. I now see it. Thanks for explaining.
Re: [PATCH RFC 05/19] mm: add early FAULT_FLAG_WRITE consistency checks
On Nov 7, 2022, at 8:17 AM, David Hildenbrand wrote:

> Let's catch abuse of FAULT_FLAG_WRITE early, such that we don't have to
> care in all other handlers and might get "surprises" if we forget to do
> so.
>
> Write faults without VM_MAYWRITE don't make any sense, and our
> maybe_mkwrite() logic could have hidden such abuse for now.
>
> Write faults without VM_WRITE on something that is not a COW mapping is
> similarly broken, and e.g., do_wp_page() could end up placing an
> anonymous page into a shared mapping, which would be bad.
>
> This is a preparation for reliable R/O long-term pinning of pages in
> private mappings, whereby we want to make sure that we will never break
> COW in a read-only private mapping.
>
> Signed-off-by: David Hildenbrand
> ---
>  mm/memory.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index fe131273217a..826353da7b23 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5159,6 +5159,14 @@ static vm_fault_t sanitize_fault_flags(struct vm_area_struct *vma,
>  		 */
>  		if (!is_cow_mapping(vma->vm_flags))
>  			*flags &= ~FAULT_FLAG_UNSHARE;
> +	} else if (*flags & FAULT_FLAG_WRITE) {
> +		/* Write faults on read-only mappings are impossible ... */
> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_MAYWRITE)))
> +			return VM_FAULT_SIGSEGV;
> +		/* ... and FOLL_FORCE only applies to COW mappings. */
> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE) &&
> +				 !is_cow_mapping(vma->vm_flags)))
> +			return VM_FAULT_SIGSEGV;

Not sure about the WARN_*(). Seems as if it might trigger in benign, even if
rare, scenarios, e.g., mprotect() racing with a page fault.
Re: [PATCH 2/2] MAINTAINERS: Mark VMware mailing list entries as private
> On Nov 10, 2021, at 9:20 AM, Srivatsa S. Bhat wrote:
>
> On Tue, Nov 09, 2021 at 01:57:31PM -0800, Joe Perches wrote:
>> On Tue, 2021-11-09 at 00:58 +0000, Nadav Amit wrote:
>>>> On Nov 8, 2021, at 4:37 PM, Joe Perches wrote:
>>>> On Mon, 2021-11-08 at 16:22 -0800, Srivatsa S. Bhat wrote:
>>>>
>>>> So it's an exploder not an actual maintainer and it likely isn't
>>>> publically archived with any normal list mechanism.
>>>>
>>>> So IMO "private" isn't appropriate. Neither is "L:"
>>>> Perhaps just mark it as what it is as an "exploder".
>>>>
>>>> Or maybe these blocks should be similar to:
>>>>
>>>> M:	Name of Lead Developer
>>>> M:	VMware maintainers -maintain...@vmlinux.com>
>>
>> Maybe adding entries like
>>
>> M:	Named maintainer
>> R:	VMware reviewers -maintain...@vmware.com>
>>
>> would be best/simplest.
>>
>
> Sure, that sounds good to me. I also considered adding "(email alias)"
> like Juergen suggested, but I think the R: entry is clear enough.
> Please find the updated patch below.
>
> ---
>
> From f66faa238facf504cfc66325912ce7af8cbf79ec Mon Sep 17 00:00:00 2001
> From: "Srivatsa S. Bhat (VMware)"
> Date: Mon, 8 Nov 2021 11:46:57 -0800
> Subject: [PATCH v2 2/2] MAINTAINERS: Mark VMware mailing list entries as
>  email aliases
>
> VMware mailing lists in the MAINTAINERS file are private lists meant
> for VMware-internal review/notification for patches to the respective
> subsystems. Anyone can post to these addresses, but there is no public
> read access like open mailing lists, which makes them more like email
> aliases instead (to reach out to reviewers).
>
> So update all the VMware mailing list references in the MAINTAINERS
> file to mark them as such, using "R: email-al...@vmware.com".
>
> Signed-off-by: Srivatsa S. Bhat (VMware)
> Cc: Zack Rusin
> Cc: Nadav Amit
> Cc: Vivek Thampi
> Cc: Vishal Bhakta
> Cc: Ronak Doshi
> Cc: pv-driv...@vmware.com
> Cc: linux-graphics-maintai...@vmware.com
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-r...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: net...@vger.kernel.org
> Cc: linux-in...@vger.kernel.org
> ---
>  MAINTAINERS | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 118cf8170d02..4372d79027e9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6134,8 +6134,8 @@ T: git git://anongit.freedesktop.org/drm/drm-misc
>  F:	drivers/gpu/drm/vboxvideo/
>
>  DRM DRIVER FOR VMWARE VIRTUAL GPU
> -M:	"VMware Graphics"
>  M:	Zack Rusin
> +R:	VMware Graphics Reviewers
>  L:	dri-devel@lists.freedesktop.org
>  S:	Supported
>  T:	git git://anongit.freedesktop.org/drm/drm-misc
> @@ -14189,7 +14189,7 @@ F: include/uapi/linux/ppdev.h
>  PARAVIRT_OPS INTERFACE
>  M:	Juergen Gross
>  M:	Srivatsa S. Bhat (VMware)
> -L:	pv-driv...@vmware.com (private)
> +R:	VMware PV-Drivers Reviewers

This patch that you just sent seems to go on top of the previous patches
(as it removes "L: pv-driv...@vmware.com (private)"). Since the patches
were still not merged, I would presume you should squash the old 2/2 with
this new patch and send v3 of these patches.
Re: [PATCH 2/2] MAINTAINERS: Mark VMware mailing list entries as private
> On Nov 8, 2021, at 4:37 PM, Joe Perches wrote:
>
> On Mon, 2021-11-08 at 16:22 -0800, Srivatsa S. Bhat wrote:
>
> So it's an exploder not an actual maintainer and it likely isn't
> publically archived with any normal list mechanism.
>
> So IMO "private" isn't appropriate. Neither is "L:"
> Perhaps just mark it as what it is as an "exploder".
>
> Or maybe these blocks should be similar to:
>
> M:	Name of Lead Developer
> M:	VMware maintainers -maintain...@vmlinux.com>
>
> Maybe something like a comment mechanism should be added to the
> MAINTAINERS file.
>
> Maybe #
>
> so this entry could be something like:
>
> M:	VMware maintainers -maintain...@vmlinux.com> # VMware's ever changing internal maintainers list

Admittedly, I do not care much about how it turns out. But if it is
modified, it should be very clear who the maintainer is, and the mailing
list should not be entangled with the maintainer. I am personally not
subscribed to the internal pv-drivers mailing list, which is not just for
memory ballooning, and is also listed as a maintainer for vmmouse, pvscsi,
vmxnet3 and others. As I am the only maintainer of the VMware balloon
driver, someone who is mistaken and sends an email only to the mailing
list, and not to me, might be disappointed.
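For illustration, the shape being argued for here -- a named maintainer plus
the internal alias demoted to a reviewer entry -- might look something like
the following (the alias name, list, and status fields are placeholders for
discussion, not the final patch):

```
VMWARE BALLOON DRIVER
M:	Nadav Amit
R:	VMware PV-Drivers Reviewers
L:	linux-kernel@vger.kernel.org
S:	Maintained
F:	drivers/misc/vmw_balloon.c
```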
Re: [PATCH 2/2] MAINTAINERS: Mark VMware mailing list entries as private
> On Nov 8, 2021, at 12:30 PM, Srivatsa S. Bhat wrote:
>
> From: Srivatsa S. Bhat (VMware)
>
> VMware mailing lists in the MAINTAINERS file are private lists meant
> for VMware-internal review/notification for patches to the respective
> subsystems. So, in an earlier discussion [1][2], it was recommended to
> mark them as such. Update all the remaining VMware mailing list
> references to use that format -- "L: list@address (private)".

Acked-by: Nadav Amit
Re: [PATCH v4 3/9] mm: Add write-protect and clean utilities for address space ranges
> On Jun 11, 2019, at 5:24 AM, Thomas Hellström (VMware) wrote:
>
> From: Thomas Hellstrom
> [ snip ]
> +/**
> + * apply_pt_wrprotect - Leaf pte callback to write-protect a pte
> + * @pte: Pointer to the pte
> + * @token: Page table token, see apply_to_pfn_range()
> + * @addr: The virtual page address
> + * @closure: Pointer to a struct pfn_range_apply embedded in a
> + * struct apply_as
> + *
> + * The function write-protects a pte and records the range in
> + * virtual address space of touched ptes for efficient range TLB flushes.
> + *
> + * Return: Always zero.
> + */
> +static int apply_pt_wrprotect(pte_t *pte, pgtable_t token,
> +			      unsigned long addr,
> +			      struct pfn_range_apply *closure)
> +{
> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
> +	pte_t ptent = *pte;
> +
> +	if (pte_write(ptent)) {
> +		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
> +
> +		ptent = pte_wrprotect(old_pte);
> +		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
> +		aas->total++;
> +		aas->start = min(aas->start, addr);
> +		aas->end = max(aas->end, addr + PAGE_SIZE);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * struct apply_as_clean - Closure structure for apply_as_clean
> + * @base: struct apply_as we derive from
> + * @bitmap_pgoff: Address_space Page offset of the first bit in @bitmap
> + * @bitmap: Bitmap with one bit for each page offset in the address_space range
> + * covered.
> + * @start: Address_space page offset of first modified pte relative
> + * to @bitmap_pgoff
> + * @end: Address_space page offset of last modified pte relative
> + * to @bitmap_pgoff
> + */
> +struct apply_as_clean {
> +	struct apply_as base;
> +	pgoff_t bitmap_pgoff;
> +	unsigned long *bitmap;
> +	pgoff_t start;
> +	pgoff_t end;
> +};
> +
> +/**
> + * apply_pt_clean - Leaf pte callback to clean a pte
> + * @pte: Pointer to the pte
> + * @token: Page table token, see apply_to_pfn_range()
> + * @addr: The virtual page address
> + * @closure: Pointer to a struct pfn_range_apply embedded in a
> + * struct apply_as_clean
> + *
> + * The function cleans a pte and records the range in
> + * virtual address space of touched ptes for efficient TLB flushes.
> + * It also records dirty ptes in a bitmap representing page offsets
> + * in the address_space, as well as the first and last of the bits
> + * touched.
> + *
> + * Return: Always zero.
> + */
> +static int apply_pt_clean(pte_t *pte, pgtable_t token,
> +			  unsigned long addr,
> +			  struct pfn_range_apply *closure)
> +{
> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
> +	struct apply_as_clean *clean = container_of(aas, typeof(*clean), base);
> +	pte_t ptent = *pte;
> +
> +	if (pte_dirty(ptent)) {
> +		pgoff_t pgoff = ((addr - aas->vma->vm_start) >> PAGE_SHIFT) +
> +			aas->vma->vm_pgoff - clean->bitmap_pgoff;
> +		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
> +
> +		ptent = pte_mkclean(old_pte);
> +		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
> +
> +		aas->total++;
> +		aas->start = min(aas->start, addr);
> +		aas->end = max(aas->end, addr + PAGE_SIZE);
> +
> +		__set_bit(pgoff, clean->bitmap);
> +		clean->start = min(clean->start, pgoff);
> +		clean->end = max(clean->end, pgoff + 1);
> +	}
> +
> +	return 0;

Usually, when a PTE is write-protected, or when a dirty-bit is cleared, the
TLB flush must be done while the page-table lock for that specific table is
taken (i.e., within apply_pt_clean() and apply_pt_wrprotect() in this case).
Otherwise, in the case of apply_pt_clean() for example, another core might
shortly after (before the TLB flush) write to the same page whose PTE was
changed. The dirty-bit in such a case might not be set, and the change may
get lost.

Does this function regard a certain use-case in which deferring the TLB
flushes is fine? If so, assertions and documentation of the related
assumptions would be useful.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 3/9] mm: Add write-protect and clean utilities for address space ranges
> On Jun 11, 2019, at 2:20 PM, Thomas Hellström (VMware) wrote:
>
> On 6/11/19 9:10 PM, Nadav Amit wrote:
>>> On Jun 11, 2019, at 11:26 AM, Thomas Hellström (VMware) wrote:
>>>
>>> Hi, Nadav,
>>>
>>> On 6/11/19 7:21 PM, Nadav Amit wrote:
>>>>> On Jun 11, 2019, at 5:24 AM, Thomas Hellström (VMware) wrote:
>>>>>
>>>>> From: Thomas Hellstrom
>>>> [ snip ]
>>>>
>>>>> +/**
>>>>> + * apply_pt_wrprotect - Leaf pte callback to write-protect a pte
>>>>> + * @pte: Pointer to the pte
>>>>> + * @token: Page table token, see apply_to_pfn_range()
>>>>> + * @addr: The virtual page address
>>>>> + * @closure: Pointer to a struct pfn_range_apply embedded in a
>>>>> + * struct apply_as
>>>>> + *
>>>>> + * The function write-protects a pte and records the range in
>>>>> + * virtual address space of touched ptes for efficient range TLB flushes.
>>>>> + *
>>>>> + * Return: Always zero.
>>>>> + */
>>>>> +static int apply_pt_wrprotect(pte_t *pte, pgtable_t token,
>>>>> +			      unsigned long addr,
>>>>> +			      struct pfn_range_apply *closure)
>>>>> +{
>>>>> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
>>>>> +	pte_t ptent = *pte;
>>>>> +
>>>>> +	if (pte_write(ptent)) {
>>>>> +		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
>>>>> +
>>>>> +		ptent = pte_wrprotect(old_pte);
>>>>> +		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
>>>>> +		aas->total++;
>>>>> +		aas->start = min(aas->start, addr);
>>>>> +		aas->end = max(aas->end, addr + PAGE_SIZE);
>>>>> +	}
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * struct apply_as_clean - Closure structure for apply_as_clean
>>>>> + * @base: struct apply_as we derive from
>>>>> + * @bitmap_pgoff: Address_space Page offset of the first bit in @bitmap
>>>>> + * @bitmap: Bitmap with one bit for each page offset in the address_space range
>>>>> + * covered.
>>>>> + * @start: Address_space page offset of first modified pte relative
>>>>> + * to @bitmap_pgoff
>>>>> + * @end: Address_space page offset of last modified pte relative
>>>>> + * to @bitmap_pgoff
>>>>> + */
>>>>> +struct apply_as_clean {
>>>>> +	struct apply_as base;
>>>>> +	pgoff_t bitmap_pgoff;
>>>>> +	unsigned long *bitmap;
>>>>> +	pgoff_t start;
>>>>> +	pgoff_t end;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * apply_pt_clean - Leaf pte callback to clean a pte
>>>>> + * @pte: Pointer to the pte
>>>>> + * @token: Page table token, see apply_to_pfn_range()
>>>>> + * @addr: The virtual page address
>>>>> + * @closure: Pointer to a struct pfn_range_apply embedded in a
>>>>> + * struct apply_as_clean
>>>>> + *
>>>>> + * The function cleans a pte and records the range in
>>>>> + * virtual address space of touched ptes for efficient TLB flushes.
>>>>> + * It also records dirty ptes in a bitmap representing page offsets
>>>>> + * in the address_space, as well as the first and last of the bits
>>>>> + * touched.
>>>>> + *
>>>>> + * Return: Always zero.
>>>>> + */
>>>>> +static int apply_pt_clean(pte_t *pte, pgtable_t token,
>>>>> +			  unsigned long addr,
>>>>> +			  struct pfn_range_apply *closure)
>>>>> +{
>>>>> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
>>>>> +	struct apply_as_clean *clean = container_of(aas, typeof(*clean), base);
>>>>> +	pte_t ptent = *pte;
>>>>> +
>>>>> +	if (pte_dirty(ptent)) {
>>>>> +		pgoff_t pgoff = ((addr - aas->vma->vm_start) >> PAGE_SHIFT) +
>>>>> +			aas->vma->vm_pgoff - clean->bitmap_pgoff;
Re: [PATCH v4 3/9] mm: Add write-protect and clean utilities for address space ranges
> On Jun 11, 2019, at 11:26 AM, Thomas Hellström (VMware) wrote:
>
> Hi, Nadav,
>
> On 6/11/19 7:21 PM, Nadav Amit wrote:
>>> On Jun 11, 2019, at 5:24 AM, Thomas Hellström (VMware) wrote:
>>>
>>> From: Thomas Hellstrom
>> [ snip ]
>>
>>> +/**
>>> + * apply_pt_wrprotect - Leaf pte callback to write-protect a pte
>>> + * @pte: Pointer to the pte
>>> + * @token: Page table token, see apply_to_pfn_range()
>>> + * @addr: The virtual page address
>>> + * @closure: Pointer to a struct pfn_range_apply embedded in a
>>> + * struct apply_as
>>> + *
>>> + * The function write-protects a pte and records the range in
>>> + * virtual address space of touched ptes for efficient range TLB flushes.
>>> + *
>>> + * Return: Always zero.
>>> + */
>>> +static int apply_pt_wrprotect(pte_t *pte, pgtable_t token,
>>> +			      unsigned long addr,
>>> +			      struct pfn_range_apply *closure)
>>> +{
>>> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
>>> +	pte_t ptent = *pte;
>>> +
>>> +	if (pte_write(ptent)) {
>>> +		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
>>> +
>>> +		ptent = pte_wrprotect(old_pte);
>>> +		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
>>> +		aas->total++;
>>> +		aas->start = min(aas->start, addr);
>>> +		aas->end = max(aas->end, addr + PAGE_SIZE);
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * struct apply_as_clean - Closure structure for apply_as_clean
>>> + * @base: struct apply_as we derive from
>>> + * @bitmap_pgoff: Address_space Page offset of the first bit in @bitmap
>>> + * @bitmap: Bitmap with one bit for each page offset in the address_space range
>>> + * covered.
>>> + * @start: Address_space page offset of first modified pte relative
>>> + * to @bitmap_pgoff
>>> + * @end: Address_space page offset of last modified pte relative
>>> + * to @bitmap_pgoff
>>> + */
>>> +struct apply_as_clean {
>>> +	struct apply_as base;
>>> +	pgoff_t bitmap_pgoff;
>>> +	unsigned long *bitmap;
>>> +	pgoff_t start;
>>> +	pgoff_t end;
>>> +};
>>> +
>>> +/**
>>> + * apply_pt_clean - Leaf pte callback to clean a pte
>>> + * @pte: Pointer to the pte
>>> + * @token: Page table token, see apply_to_pfn_range()
>>> + * @addr: The virtual page address
>>> + * @closure: Pointer to a struct pfn_range_apply embedded in a
>>> + * struct apply_as_clean
>>> + *
>>> + * The function cleans a pte and records the range in
>>> + * virtual address space of touched ptes for efficient TLB flushes.
>>> + * It also records dirty ptes in a bitmap representing page offsets
>>> + * in the address_space, as well as the first and last of the bits
>>> + * touched.
>>> + *
>>> + * Return: Always zero.
>>> + */
>>> +static int apply_pt_clean(pte_t *pte, pgtable_t token,
>>> +			  unsigned long addr,
>>> +			  struct pfn_range_apply *closure)
>>> +{
>>> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
>>> +	struct apply_as_clean *clean = container_of(aas, typeof(*clean), base);
>>> +	pte_t ptent = *pte;
>>> +
>>> +	if (pte_dirty(ptent)) {
>>> +		pgoff_t pgoff = ((addr - aas->vma->vm_start) >> PAGE_SHIFT) +
>>> +			aas->vma->vm_pgoff - clean->bitmap_pgoff;
>>> +		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
>>> +
>>> +		ptent = pte_mkclean(old_pte);
>>> +		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
>>> +
>>> +		aas->total++;
>>> +		aas->start = min(aas->start, addr);
>>> +		aas->end = max(aas->end, addr + PAGE_SIZE);
>>> +
>>> +		__set_bit(pgoff, clean->bitmap);
>>> +		clean->start = min(clean->start, pgoff);
>>> +		clean->end = max(clean->end, pgoff + 1);
>>> +	}
>>> +
>>> +	return 0;
>> Usually, when a PTE is write-protected, or when a dirty-bit is cleared, the