Re: [PATCH RFC v3 28/35] arm64: mte: swap: Handle tag restoring when missing tag storage

2024-02-02 Thread Peter Collingbourne
On Fri, Feb 2, 2024 at 6:56 AM Alexandru Elisei wrote:
>
> Hi Peter,
>
> On Thu, Feb 01, 2024 at 08:02:40PM -0800, Peter Collingbourne wrote:
> > On Thu, Jan 25, 2024 at 8:45 AM Alexandru Elisei wrote:
> > >
> > > Linux restores tags when a page is swapped in and there are tags associated
> > > with the swap entry which the new page will replace. The saved tags are
> > > restored even if the page will not be mapped as tagged, to protect against
> > > cases where the page is shared between different VMAs, and is tagged in
> > > some, but untagged in others. By using this approach, the process can still
> > > access the correct tags following an mprotect(PROT_MTE) on the non-MTE
> > > enabled VMA.
> > >
> > > But this poses a challenge for managing tag storage: in the scenario above,
> > > when a new page is allocated to be swapped in for the process where it will
> > > be mapped as untagged, the corresponding tag storage block is not reserved.
> > > mte_restore_page_tags_by_swp_entry(), when it restores the saved tags, will
> > > overwrite data in the tag storage block associated with the new page,
> > > leading to data corruption if the block is in use by a process.
> > >
> > > Get around this issue by saving the tags in a new xarray, this time indexed
> > > by the page pfn, and then restoring them when tag storage is reserved for
> > > the page.
> > >
> > > Signed-off-by: Alexandru Elisei 
> > > ---
> > >
> > > Changes since rfc v2:
> > >
> > > * Restore saved tags **before** setting the PG_tag_storage_reserved bit to
> > > eliminate a brief window of opportunity where userspace can access
> > > uninitialized tags (Peter Collingbourne).
> > >
> > >  arch/arm64/include/asm/mte_tag_storage.h |   8 ++
> > >  arch/arm64/include/asm/pgtable.h |  11 +++
> > >  arch/arm64/kernel/mte_tag_storage.c  |  12 ++-
> > >  arch/arm64/mm/mteswap.c  | 110 +++
> > >  4 files changed, 140 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/include/asm/mte_tag_storage.h 
> > > b/arch/arm64/include/asm/mte_tag_storage.h
> > > index 50bdae94cf71..40590a8c3748 100644
> > > --- a/arch/arm64/include/asm/mte_tag_storage.h
> > > +++ b/arch/arm64/include/asm/mte_tag_storage.h
> > > @@ -36,6 +36,14 @@ bool page_is_tag_storage(struct page *page);
> > >
> > >  vm_fault_t handle_folio_missing_tag_storage(struct folio *folio, struct 
> > > vm_fault *vmf,
> > > bool *map_pte);
> > > +vm_fault_t mte_try_transfer_swap_tags(swp_entry_t entry, struct page 
> > > *page);
> > > +
> > > +void tags_by_pfn_lock(void);
> > > +void tags_by_pfn_unlock(void);
> > > +
> > > +void *mte_erase_tags_for_pfn(unsigned long pfn);
> > > +bool mte_save_tags_for_pfn(void *tags, unsigned long pfn);
> > > +void mte_restore_tags_for_pfn(unsigned long start_pfn, int order);
> > >  #else
> > >  static inline bool tag_storage_enabled(void)
> > >  {
> > > diff --git a/arch/arm64/include/asm/pgtable.h 
> > > b/arch/arm64/include/asm/pgtable.h
> > > index 0174e292f890..87ae59436162 100644
> > > --- a/arch/arm64/include/asm/pgtable.h
> > > +++ b/arch/arm64/include/asm/pgtable.h
> > > @@ -1085,6 +1085,17 @@ static inline void arch_swap_invalidate_area(int 
> > > type)
> > > mte_invalidate_tags_area_by_swp_entry(type);
> > >  }
> > >
> > > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> > > +#define __HAVE_ARCH_SWAP_PREPARE_TO_RESTORE
> > > +static inline vm_fault_t arch_swap_prepare_to_restore(swp_entry_t entry,
> > > + struct folio *folio)
> > > +{
> > > +   if (tag_storage_enabled())
> > > +   return mte_try_transfer_swap_tags(entry, &folio->page);
> > > +   return 0;
> > > +}
> > > +#endif
> > > +
> > >  #define __HAVE_ARCH_SWAP_RESTORE
> > >  static inline void arch_swap_restore(swp_entry_t entry, struct folio 
> > > *folio)
> > >  {
> > > diff --git a/arch/arm64/kernel/mte_tag_storage.c 
> > > b/arch/arm64/kernel/mte_tag_storage.c
> > > index afe2bb754879..ac7b9c9c585c

Re: [PATCH RFC v3 28/35] arm64: mte: swap: Handle tag restoring when missing tag storage

2024-02-01 Thread Peter Collingbourne
On Thu, Jan 25, 2024 at 8:45 AM Alexandru Elisei wrote:
>
> Linux restores tags when a page is swapped in and there are tags associated
> with the swap entry which the new page will replace. The saved tags are
> restored even if the page will not be mapped as tagged, to protect against
> cases where the page is shared between different VMAs, and is tagged in
> some, but untagged in others. By using this approach, the process can still
> access the correct tags following an mprotect(PROT_MTE) on the non-MTE
> enabled VMA.
>
> But this poses a challenge for managing tag storage: in the scenario above,
> when a new page is allocated to be swapped in for the process where it will
> be mapped as untagged, the corresponding tag storage block is not reserved.
> mte_restore_page_tags_by_swp_entry(), when it restores the saved tags, will
> overwrite data in the tag storage block associated with the new page,
> leading to data corruption if the block is in use by a process.
>
> Get around this issue by saving the tags in a new xarray, this time indexed
> by the page pfn, and then restoring them when tag storage is reserved for
> the page.
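A minimal sketch of the pfn-indexed xarray described above. The actual
arch/arm64/mm/mteswap.c hunk is not quoted in this message, so beyond the
helper names declared in the header below, the details here (the use of
mte_restore_page_tags(), the GFP flag, and skipping the tags_by_pfn_lock()/
tags_by_pfn_unlock() serialization) are illustrative only:

static DEFINE_XARRAY(tags_by_pfn);

bool mte_save_tags_for_pfn(void *tags, unsigned long pfn)
{
        /* Remember the tag buffer until tag storage is reserved for this pfn. */
        return !xa_is_err(xa_store(&tags_by_pfn, pfn, tags, GFP_KERNEL));
}

void mte_restore_tags_for_pfn(unsigned long start_pfn, int order)
{
        unsigned long pfn;
        void *tags;

        for (pfn = start_pfn; pfn < start_pfn + (1UL << order); pfn++) {
                tags = xa_erase(&tags_by_pfn, pfn);
                if (!tags)
                        continue;
                /* Copy the saved tags into the now-reserved tag storage. */
                mte_restore_page_tags(page_address(pfn_to_page(pfn)), tags);
                mte_free_tag_buf(tags);
        }
}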
>
> Signed-off-by: Alexandru Elisei 
> ---
>
> Changes since rfc v2:
>
> * Restore saved tags **before** setting the PG_tag_storage_reserved bit to
> eliminate a brief window of opportunity where userspace can access
> uninitialized tags (Peter Collingbourne).
>
>  arch/arm64/include/asm/mte_tag_storage.h |   8 ++
>  arch/arm64/include/asm/pgtable.h |  11 +++
>  arch/arm64/kernel/mte_tag_storage.c  |  12 ++-
>  arch/arm64/mm/mteswap.c  | 110 +++
>  4 files changed, 140 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/mte_tag_storage.h 
> b/arch/arm64/include/asm/mte_tag_storage.h
> index 50bdae94cf71..40590a8c3748 100644
> --- a/arch/arm64/include/asm/mte_tag_storage.h
> +++ b/arch/arm64/include/asm/mte_tag_storage.h
> @@ -36,6 +36,14 @@ bool page_is_tag_storage(struct page *page);
>
>  vm_fault_t handle_folio_missing_tag_storage(struct folio *folio, struct vm_fault *vmf,
> bool *map_pte);
> +vm_fault_t mte_try_transfer_swap_tags(swp_entry_t entry, struct page *page);
> +
> +void tags_by_pfn_lock(void);
> +void tags_by_pfn_unlock(void);
> +
> +void *mte_erase_tags_for_pfn(unsigned long pfn);
> +bool mte_save_tags_for_pfn(void *tags, unsigned long pfn);
> +void mte_restore_tags_for_pfn(unsigned long start_pfn, int order);
>  #else
>  static inline bool tag_storage_enabled(void)
>  {
> diff --git a/arch/arm64/include/asm/pgtable.h 
> b/arch/arm64/include/asm/pgtable.h
> index 0174e292f890..87ae59436162 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1085,6 +1085,17 @@ static inline void arch_swap_invalidate_area(int type)
> mte_invalidate_tags_area_by_swp_entry(type);
>  }
>
> +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> +#define __HAVE_ARCH_SWAP_PREPARE_TO_RESTORE
> +static inline vm_fault_t arch_swap_prepare_to_restore(swp_entry_t entry,
> + struct folio *folio)
> +{
> +   if (tag_storage_enabled())
> +   return mte_try_transfer_swap_tags(entry, &folio->page);
> +   return 0;
> +}
> +#endif
> +
>  #define __HAVE_ARCH_SWAP_RESTORE
>  static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>  {
> diff --git a/arch/arm64/kernel/mte_tag_storage.c 
> b/arch/arm64/kernel/mte_tag_storage.c
> index afe2bb754879..ac7b9c9c585c 100644
> --- a/arch/arm64/kernel/mte_tag_storage.c
> +++ b/arch/arm64/kernel/mte_tag_storage.c
> @@ -567,6 +567,7 @@ int reserve_tag_storage(struct page *page, int order, gfp_t gfp)
> }
> }
>
> +   mte_restore_tags_for_pfn(page_to_pfn(page), order);
> page_set_tag_storage_reserved(page, order);
>  out_unlock:
> mutex_unlock(&tag_blocks_lock);
> @@ -595,7 +596,8 @@ void free_tag_storage(struct page *page, int order)
> struct tag_region *region;
> unsigned long page_va;
> unsigned long flags;
> -   int ret;
> +   void *tags;
> +   int i, ret;
>
> ret = tag_storage_find_block(page, &start_block, &region);
> if (WARN_ONCE(ret, "Missing tag storage block for pfn 0x%lx", page_to_pfn(page)))
> @@ -605,6 +607,14 @@ void free_tag_storage(struct page *page, int order)
> /* Avoid writeback of dirty tag cache lines corrupting data. */
> dcache_inval_tags_poc(page_va, page_va + (PAGE_SIZE << order));
>
> +   tags_by_pfn_lock();
> +   for (i = 0; i < (1 << order)

Re: [PATCH RFC v3 23/35] arm64: mte: Try to reserve tag storage in arch_alloc_page()

2024-01-29 Thread Peter Collingbourne
On Thu, Jan 25, 2024 at 8:45 AM Alexandru Elisei wrote:
>
> Reserve tag storage for a page that is being allocated as tagged. This
> is a best effort approach, and failing to reserve tag storage is
> allowed.
>
> When all the associated tagged pages have been freed, return the tag
> storage pages back to the page allocator, where they can be used again for
> data allocations.
>
> Signed-off-by: Alexandru Elisei 
> ---
>
> Changes since rfc v2:
>
> * Based on rfc v2 patch #16 ("arm64: mte: Manage tag storage on page
> allocation").
> * Fixed calculation of the number of associated tag storage blocks (Hyesoo Yu).
> * Tag storage is reserved in arch_alloc_page() instead of
> arch_prep_new_page().
>
>  arch/arm64/include/asm/mte.h |  16 +-
>  arch/arm64/include/asm/mte_tag_storage.h |  31 +++
>  arch/arm64/include/asm/page.h|   5 +
>  arch/arm64/include/asm/pgtable.h |  19 ++
>  arch/arm64/kernel/mte_tag_storage.c  | 234 +++
>  arch/arm64/mm/fault.c|   7 +
>  fs/proc/page.c   |   1 +
>  include/linux/kernel-page-flags.h|   1 +
>  include/linux/page-flags.h   |   1 +
>  include/trace/events/mmflags.h   |   3 +-
>  mm/huge_memory.c |   1 +
>  11 files changed, 316 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 8034695b3dd7..6457b7899207 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -40,12 +40,24 @@ void mte_free_tag_buf(void *buf);
>  #ifdef CONFIG_ARM64_MTE
>
>  /* track which pages have valid allocation tags */
> -#define PG_mte_tagged  PG_arch_2
> +#define PG_mte_tagged  PG_arch_2
>  /* simple lock to avoid multiple threads tagging the same page */
> -#define PG_mte_lockPG_arch_3
> +#define PG_mte_lockPG_arch_3
> +/* Track if a tagged page has tag storage reserved */
> +#define PG_tag_storage_reservedPG_arch_4
> +
> +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> +DECLARE_STATIC_KEY_FALSE(tag_storage_enabled_key);
> +extern bool page_tag_storage_reserved(struct page *page);
> +#endif
>
>  static inline void set_page_mte_tagged(struct page *page)
>  {
> +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> +   /* Open code mte_tag_storage_enabled() */
> +   WARN_ON_ONCE(static_branch_likely(&tag_storage_enabled_key) &&
> +!page_tag_storage_reserved(page));
> +#endif
> /*
>  * Ensure that the tags written prior to this function are visible
>  * before the page flags update.
> diff --git a/arch/arm64/include/asm/mte_tag_storage.h 
> b/arch/arm64/include/asm/mte_tag_storage.h
> index 7b3f6bff8e6f..09f1318d924e 100644
> --- a/arch/arm64/include/asm/mte_tag_storage.h
> +++ b/arch/arm64/include/asm/mte_tag_storage.h
> @@ -5,6 +5,12 @@
>  #ifndef __ASM_MTE_TAG_STORAGE_H
>  #define __ASM_MTE_TAG_STORAGE_H
>
> +#ifndef __ASSEMBLY__
> +
> +#include 
> +
> +#include 
> +
>  #ifdef CONFIG_ARM64_MTE_TAG_STORAGE
>
>  DECLARE_STATIC_KEY_FALSE(tag_storage_enabled_key);
> @@ -15,6 +21,15 @@ static inline bool tag_storage_enabled(void)
>  }
>
>  void mte_init_tag_storage(void);
> +
> +static inline bool alloc_requires_tag_storage(gfp_t gfp)
> +{
> +   return gfp & __GFP_TAGGED;
> +}
> +int reserve_tag_storage(struct page *page, int order, gfp_t gfp);
> +void free_tag_storage(struct page *page, int order);
> +
> +bool page_tag_storage_reserved(struct page *page);
>  #else
>  static inline bool tag_storage_enabled(void)
>  {
> @@ -23,6 +38,22 @@ static inline bool tag_storage_enabled(void)
>  static inline void mte_init_tag_storage(void)
>  {
>  }
> +static inline bool alloc_requires_tag_storage(struct page *page)

This function should take a gfp_t to match the
CONFIG_ARM64_MTE_TAG_STORAGE case.

Peter

> +{
> +   return false;
> +}
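A corrected stub matching the gfp_t signature of the tag_storage_enabled()
case above would be, as a sketch:

static inline bool alloc_requires_tag_storage(gfp_t gfp)
{
        return false;
}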
> +static inline int reserve_tag_storage(struct page *page, int order, gfp_t gfp)
> +{
> +   return 0;
> +}
> +static inline void free_tag_storage(struct page *page, int order)
> +{
> +}
> +static inline bool page_tag_storage_reserved(struct page *page)
> +{
> +   return true;
> +}
>  #endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
>
> +#endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_MTE_TAG_STORAGE_H  */
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 88bab032a493..3a656492f34a 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -35,6 +35,11 @@ void copy_highpage(struct page *to, struct page *from);
>  void tag_clear_highpage(struct page *to);
>  #define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
>
> +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> +void arch_alloc_page(struct page *, int order, gfp_t gfp);
> +#define HAVE_ARCH_ALLOC_PAGE
> +#endif
> +
>  #define clear_user_page(page, vaddr, pg)   clear_page(page)
>  #define copy_user_page(to, from, vaddr, pg)copy_page(to, from)
>
> diff --git 

Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-26 Thread Peter Collingbourne
On Thu, Jan 25, 2024 at 8:43 AM Alexandru Elisei wrote:
>
> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> set in vma_alloc_zeroed_movable_folio().
>
> Expand this to be more generic by adding an arch hook that modifies the gfp
> flags for an allocation when the VMA is known.
>
> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> is also set; from that point of view, the current behaviour is unchanged,
> even though the arm64 flag is set in more places.  When arm64 will have
> support to reuse the tag storage for data allocation, the uses of the
> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> to reserve the corresponding tag storage for the pages being allocated.
>
> The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
> the caller; this has been done to keep an architecture from modifying the
> flags already set by the core memory management code; this is similar to
> how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
> implemented. This can be revisited in the future if there's a need to do
> so.
>
> Signed-off-by: Alexandru Elisei 

This patch also needs to update the non-CONFIG_NUMA definition of
vma_alloc_folio in include/linux/gfp.h to call arch_calc_vma_gfp. See:
https://r.android.com/2849146
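A hypothetical sketch of what that hunk could look like, assuming the
arch_calc_vma_gfp(vma, gfp) signature used elsewhere in this series; the
actual !CONFIG_NUMA definition of vma_alloc_folio() in include/linux/gfp.h
may differ (for instance it may be a macro rather than an inline function):

/* Sketch only -- not the actual include/linux/gfp.h change. */
static inline struct folio *vma_alloc_folio(gfp_t gfp, int order,
                struct vm_area_struct *vma, unsigned long addr, bool hugepage)
{
        /* OR in the arch-specific flags, mirroring the CONFIG_NUMA path. */
        return folio_alloc(gfp | arch_calc_vma_gfp(vma, gfp), order);
}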

Peter



Re: [PATCH RFC v2 20/27] mm: hugepage: Handle huge page fault on access

2023-11-21 Thread Peter Collingbourne
On Sun, Nov 19, 2023 at 8:59 AM Alexandru Elisei wrote:
>
> Handle PAGE_FAULT_ON_ACCESS faults for huge pages in a similar way to
> regular pages.
>
> Signed-off-by: Alexandru Elisei 
> ---
>  arch/arm64/include/asm/mte_tag_storage.h |  1 +
>  arch/arm64/include/asm/pgtable.h |  7 ++
>  arch/arm64/mm/fault.c| 81 
>  include/linux/huge_mm.h  |  2 +
>  include/linux/pgtable.h  |  5 ++
>  mm/huge_memory.c |  4 +-
>  mm/memory.c  |  3 +
>  7 files changed, 101 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mte_tag_storage.h 
> b/arch/arm64/include/asm/mte_tag_storage.h
> index c70ced60a0cd..b97406d369ce 100644
> --- a/arch/arm64/include/asm/mte_tag_storage.h
> +++ b/arch/arm64/include/asm/mte_tag_storage.h
> @@ -35,6 +35,7 @@ void free_tag_storage(struct page *page, int order);
>  bool page_tag_storage_reserved(struct page *page);
>
>  vm_fault_t handle_page_missing_tag_storage(struct vm_fault *vmf);
> +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf);
>  #else
>  static inline bool tag_storage_enabled(void)
>  {
> diff --git a/arch/arm64/include/asm/pgtable.h 
> b/arch/arm64/include/asm/pgtable.h
> index 8cc135f1c112..1704411c096d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -477,6 +477,13 @@ static inline vm_fault_t arch_do_page_fault_on_access(struct vm_fault *vmf)
> return handle_page_missing_tag_storage(vmf);
> return VM_FAULT_SIGBUS;
>  }
> +
> +static inline vm_fault_t arch_do_huge_page_fault_on_access(struct vm_fault *vmf)
> +{
> +   if (tag_storage_enabled())
> +   return handle_huge_page_missing_tag_storage(vmf);
> +   return VM_FAULT_SIGBUS;
> +}
>  #endif /* CONFIG_ARCH_HAS_FAULT_ON_ACCESS */
>
>  #define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID))
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index f5fa583acf18..6730a0812a24 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -1041,6 +1041,87 @@ vm_fault_t handle_page_missing_tag_storage(struct vm_fault *vmf)
>
> return 0;
>
> +out_retry:
> +   put_page(page);
> +   if (vmf->flags & FAULT_FLAG_VMA_LOCK)
> +   vma_end_read(vma);
> +   if (fault_flag_allow_retry_first(vmf->flags)) {
> +   err = VM_FAULT_RETRY;
> +   } else {
> +   /* Replay the fault. */
> +   err = 0;
> +   }
> +   return err;
> +}
> +
> +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf)
> +{
> +   unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> +   struct vm_area_struct *vma = vmf->vma;
> +   pmd_t old_pmd, new_pmd;
> +   bool writable = false;
> +   struct page *page;
> +   vm_fault_t err;
> +   int ret;
> +
> +   vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> +   if (unlikely(!pmd_same(vmf->orig_pmd, *vmf->pmd))) {
> +   spin_unlock(vmf->ptl);
> +   return 0;
> +   }
> +
> +   old_pmd = vmf->orig_pmd;
> +   new_pmd = pmd_modify(old_pmd, vma->vm_page_prot);
> +
> +   /*
> +* Detect now whether the PMD could be writable; this information
> +* is only valid while holding the PT lock.
> +*/
> +   writable = pmd_write(new_pmd);
> +   if (!writable && vma_wants_manual_pte_write_upgrade(vma) &&
> +   can_change_pmd_writable(vma, vmf->address, new_pmd))
> +   writable = true;
> +
> +   page = vm_normal_page_pmd(vma, haddr, new_pmd);
> +   if (!page)
> +   goto out_map;
> +
> +   if (!(vma->vm_flags & VM_MTE))
> +   goto out_map;
> +
> +   get_page(page);
> +   vma_set_access_pid_bit(vma);
> +
> +   spin_unlock(vmf->ptl);
> +   writable = false;
> +
> +   if (unlikely(is_migrate_isolate_page(page)))
> +   goto out_retry;
> +
> +   ret = reserve_tag_storage(page, HPAGE_PMD_ORDER, GFP_HIGHUSER_MOVABLE);
> +   if (ret)
> +   goto out_retry;
> +
> +   put_page(page);
> +
> +   vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> +   if (unlikely(!pmd_same(old_pmd, *vmf->pmd))) {
> +   spin_unlock(vmf->ptl);
> +   return 0;
> +   }
> +
> +out_map:
> +   /* Restore the PMD */
> +   new_pmd = pmd_modify(old_pmd, vma->vm_page_prot);
> +   new_pmd = pmd_mkyoung(new_pmd);
> +   if (writable)
> +   new_pmd = pmd_mkwrite(new_pmd, vma);
> +   set_pmd_at(vma->vm_mm, haddr, vmf->pmd, new_pmd);
> +   update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
> +   spin_unlock(vmf->ptl);
> +
> +   return 0;
> +
>  out_retry:
> put_page(page);
> if (vmf->flags & FAULT_FLAG_VMA_LOCK)
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 

Re: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc()

2023-11-20 Thread Peter Collingbourne
Hi Alexandru,

On Wed, Aug 23, 2023 at 6:16 AM Alexandru Elisei wrote:
>
> If the source page being migrated has metadata associated with it, make
> sure to reserve the metadata storage when choosing a suitable destination
> page from the free list.
>
> Signed-off-by: Alexandru Elisei 
> ---
>  mm/compaction.c | 9 +
>  mm/internal.h   | 1 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index cc0139fa0cb0..af2ee3085623 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -570,6 +570,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> bool locked = false;
> unsigned long blockpfn = *start_pfn;
> unsigned int order;
> +   int ret;
>
> /* Strict mode is for isolation, speed is secondary */
> if (strict)
> @@ -626,6 +627,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>
> /* Found a free page, will break it into order-0 pages */
> order = buddy_order(page);
> +   if (metadata_storage_enabled() && cc->reserve_metadata) {
> +   ret = reserve_metadata_storage(page, order, cc->gfp_mask);

At this point the zone lock is held and preemption is disabled, which
makes it invalid to call reserve_metadata_storage.

Peter

> +   if (ret)
> +   goto isolate_fail;
> +   }
> isolated = __isolate_free_page(page, order);
> if (!isolated)
> break;
> @@ -1757,6 +1763,9 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
> struct compact_control *cc = (struct compact_control *)data;
> struct folio *dst;
>
> +   if (metadata_storage_enabled())
> +   cc->reserve_metadata = folio_has_metadata(src);
> +
> if (list_empty(&cc->freepages)) {
> isolate_freepages(cc);
>
> diff --git a/mm/internal.h b/mm/internal.h
> index d28ac0085f61..046cc264bfbe 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -492,6 +492,7 @@ struct compact_control {
>  */
> bool alloc_contig;  /* alloc_contig_range allocation */
> bool source_has_metadata;   /* source pages have associated metadata */
> +   bool reserve_metadata;
>  };
>
>  /*
> --
> 2.41.0
>



Re: [PATCH] kasan: fix kasan_byte_accessible() to be consistent with actual checks

2021-04-05 Thread Peter Collingbourne
On Mon, Apr 5, 2021 at 2:53 PM Andrey Konovalov  wrote:
>
> On Mon, Apr 5, 2021 at 11:43 PM Peter Collingbourne  wrote:
> >
> > We can sometimes end up with kasan_byte_accessible() being called
> > on non-slab memory. For example ksize() and krealloc() may end up
> > calling it on KFENCE allocated memory. In this case the memory will
> > be tagged with KASAN_SHADOW_INIT, which a subsequent patch ("kasan:
> > initialize shadow to TAG_INVALID for SW_TAGS") will set to the same
> > value as KASAN_TAG_INVALID, causing kasan_byte_accessible() to fail
> > when called on non-slab memory.
> >
> > This highlighted the fact that the check in kasan_byte_accessible()
> > was inconsistent with checks as implemented for loads and stores
> > (kasan_check_range() in SW tags mode and hardware-implemented
> > checks in HW tags mode). kasan_check_range() does not have a
> > check for KASAN_TAG_INVALID, and instead has a comparison against
> > KASAN_SHADOW_START. In HW tags mode, we do not have either, but we
> > do set TCR_EL1.TCMA which corresponds with the comparison against
> > KASAN_TAG_KERNEL.
> >
> > Therefore, update kasan_byte_accessible() for both SW and HW tags
> > modes to correspond with the respective checks on loads and stores.
> >
> > Link: 
> > https://linux-review.googlesource.com/id/Ic6d40803c57dcc6331bd97fbb9a60b0d38a65a36
> > Signed-off-by: Peter Collingbourne 
> > ---
> >  mm/kasan/kasan.h   | 3 +--
> >  mm/kasan/sw_tags.c | 8 +---
> >  2 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
> > index 8c55634d6edd..e18e8da35255 100644
> > --- a/mm/kasan/kasan.h
> > +++ b/mm/kasan/kasan.h
> > @@ -368,8 +368,7 @@ static inline bool kasan_byte_accessible(const void 
> > *addr)
> > u8 ptr_tag = get_tag(addr);
> > u8 mem_tag = hw_get_mem_tag((void *)addr);
> >
> > -   return (mem_tag != KASAN_TAG_INVALID) &&
> > -   (ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag);
> > +   return ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag;
> >  }
> >
> >  #else /* CONFIG_KASAN_HW_TAGS */
> > diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c
> > index 94c2d33be333..914225eeda99 100644
> > --- a/mm/kasan/sw_tags.c
> > +++ b/mm/kasan/sw_tags.c
> > @@ -121,10 +121,12 @@ bool kasan_check_range(unsigned long addr, size_t 
> > size, bool write,
> >  bool kasan_byte_accessible(const void *addr)
> >  {
> > u8 tag = get_tag(addr);
> > > -   u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(kasan_reset_tag(addr)));
> > > +   void *untagged_addr = kasan_reset_tag(addr);
> > > +   u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(untagged_addr));
>
> Hi Peter,
>
> Let's move dereferencing shadow memory past the KASAN_SHADOW_START
> check. Otherwise, in case the check is to fail, accessing shadow will
> likely crash the kernel.
>
> Thanks!

Makes sense, fixed in v2.

Peter


[PATCH v2] kasan: fix kasan_byte_accessible() to be consistent with actual checks

2021-04-05 Thread Peter Collingbourne
We can sometimes end up with kasan_byte_accessible() being called
on non-slab memory. For example ksize() and krealloc() may end up
calling it on KFENCE allocated memory. In this case the memory will
be tagged with KASAN_SHADOW_INIT, which a subsequent patch ("kasan:
initialize shadow to TAG_INVALID for SW_TAGS") will set to the same
value as KASAN_TAG_INVALID, causing kasan_byte_accessible() to fail
when called on non-slab memory.

This highlighted the fact that the check in kasan_byte_accessible()
was inconsistent with checks as implemented for loads and stores
(kasan_check_range() in SW tags mode and hardware-implemented
checks in HW tags mode). kasan_check_range() does not have a
check for KASAN_TAG_INVALID, and instead has a comparison against
KASAN_SHADOW_START. In HW tags mode, we do not have either, but we
do set TCR_EL1.TCMA which corresponds with the comparison against
KASAN_TAG_KERNEL.

Therefore, update kasan_byte_accessible() for both SW and HW tags
modes to correspond with the respective checks on loads and stores.

Link: 
https://linux-review.googlesource.com/id/Ic6d40803c57dcc6331bd97fbb9a60b0d38a65a36
Signed-off-by: Peter Collingbourne 
---
 mm/kasan/kasan.h   |  3 +--
 mm/kasan/sw_tags.c | 10 +++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8c55634d6edd..e18e8da35255 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -368,8 +368,7 @@ static inline bool kasan_byte_accessible(const void *addr)
u8 ptr_tag = get_tag(addr);
u8 mem_tag = hw_get_mem_tag((void *)addr);
 
-   return (mem_tag != KASAN_TAG_INVALID) &&
-   (ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag);
+   return ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag;
 }
 
 #else /* CONFIG_KASAN_HW_TAGS */
diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c
index 94c2d33be333..00ae8913fc74 100644
--- a/mm/kasan/sw_tags.c
+++ b/mm/kasan/sw_tags.c
@@ -121,10 +121,14 @@ bool kasan_check_range(unsigned long addr, size_t size, bool write,
 bool kasan_byte_accessible(const void *addr)
 {
u8 tag = get_tag(addr);
-   u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(kasan_reset_tag(addr)));
+   void *untagged_addr = kasan_reset_tag(addr);
+   u8 shadow_byte;
 
-   return (shadow_byte != KASAN_TAG_INVALID) &&
-   (tag == KASAN_TAG_KERNEL || tag == shadow_byte);
+   if (untagged_addr < kasan_shadow_to_mem((void *)KASAN_SHADOW_START))
+   return false;
+
+   shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(untagged_addr));
+   return tag == KASAN_TAG_KERNEL || tag == shadow_byte;
 }
 
 #define DEFINE_HWASAN_LOAD_STORE(size) \
-- 
2.31.0.208.g409f899ff0-goog



[PATCH] kasan: fix kasan_byte_accessible() to be consistent with actual checks

2021-04-05 Thread Peter Collingbourne
We can sometimes end up with kasan_byte_accessible() being called
on non-slab memory. For example ksize() and krealloc() may end up
calling it on KFENCE allocated memory. In this case the memory will
be tagged with KASAN_SHADOW_INIT, which a subsequent patch ("kasan:
initialize shadow to TAG_INVALID for SW_TAGS") will set to the same
value as KASAN_TAG_INVALID, causing kasan_byte_accessible() to fail
when called on non-slab memory.

This highlighted the fact that the check in kasan_byte_accessible()
was inconsistent with checks as implemented for loads and stores
(kasan_check_range() in SW tags mode and hardware-implemented
checks in HW tags mode). kasan_check_range() does not have a
check for KASAN_TAG_INVALID, and instead has a comparison against
KASAN_SHADOW_START. In HW tags mode, we do not have either, but we
do set TCR_EL1.TCMA which corresponds with the comparison against
KASAN_TAG_KERNEL.

Therefore, update kasan_byte_accessible() for both SW and HW tags
modes to correspond with the respective checks on loads and stores.

Link: 
https://linux-review.googlesource.com/id/Ic6d40803c57dcc6331bd97fbb9a60b0d38a65a36
Signed-off-by: Peter Collingbourne 
---
 mm/kasan/kasan.h   | 3 +--
 mm/kasan/sw_tags.c | 8 +---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8c55634d6edd..e18e8da35255 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -368,8 +368,7 @@ static inline bool kasan_byte_accessible(const void *addr)
u8 ptr_tag = get_tag(addr);
u8 mem_tag = hw_get_mem_tag((void *)addr);
 
-   return (mem_tag != KASAN_TAG_INVALID) &&
-   (ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag);
+   return ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag;
 }
 
 #else /* CONFIG_KASAN_HW_TAGS */
diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c
index 94c2d33be333..914225eeda99 100644
--- a/mm/kasan/sw_tags.c
+++ b/mm/kasan/sw_tags.c
@@ -121,10 +121,12 @@ bool kasan_check_range(unsigned long addr, size_t size, bool write,
 bool kasan_byte_accessible(const void *addr)
 {
u8 tag = get_tag(addr);
-   u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(kasan_reset_tag(addr)));
+   void *untagged_addr = kasan_reset_tag(addr);
+   u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(untagged_addr));
 
-   return (shadow_byte != KASAN_TAG_INVALID) &&
-   (tag == KASAN_TAG_KERNEL || tag == shadow_byte);
+   return untagged_addr >=
+  kasan_shadow_to_mem((void *)KASAN_SHADOW_START) &&
+  (tag == KASAN_TAG_KERNEL || tag == shadow_byte);
 }
 
 #define DEFINE_HWASAN_LOAD_STORE(size) \
-- 
2.31.0.208.g409f899ff0-goog



Re: [PATCH] kfence: unpoison pool region before use

2021-04-05 Thread Peter Collingbourne
On Sat, Apr 3, 2021 at 4:52 PM Andrey Konovalov  wrote:
>
> On Sun, Apr 4, 2021 at 12:31 AM Marco Elver  wrote:
> >
> > However, given the above, I think we need to explain this in the
> > commit message (which also makes the dependency between these 2
> > patches clear) and add a comment above the new kasan_unpoison_range().
> > That is, if we still think this is the right fix -- I'm not entirely
> > sure it is.
> >
> > Because what I gather from "kasan: initialize shadow to TAG_INVALID
> > for SW_TAGS", is the requirement that "0xFF pointer tag is a match-all
> > tag, it doesn't matter what tag the accessed memory has".
> >
> > While KFENCE memory is accessible through the slab API, and in this
> > case ksize() calling kasan_check_byte() leading to a failure, the
> > kasan_check_byte() call is part of the public KASAN API. Which means
> > that if some subsystem decides to memblock_alloc() some memory, and
> > wishes to use kasan_check_byte() on that memory but with an untagged
> > pointer, will get the same problem as KFENCE: with generic and HW_TAGS
> > mode everything is fine, but with SW_TAGS mode things break.
>
> It makes sense to allow this function to operate on any kind of
> memory, including memory that hasn't been previously marked by KASAN.
>
> > To me this indicates the fix is not with KFENCE, but should be in
> > mm/kasan/sw_tags.c:kasan_byte_accessible(), which should not load the
> > shadow when the pointer is untagged.
>
> The problem isn't in accessing shadow per se. Looking at
> kasan_byte_accessible() (in both sw_tags.c and kasan.h), the return
> statement there seems just wrong and redundant. The KASAN_TAG_KERNEL
> check should come first:
>
> return tag == KASAN_TAG_KERNEL || (shadow_byte != KASAN_TAG_INVALID && tag == shadow_byte);
>
> This way, if the pointer tag is KASAN_TAG_KERNEL, the memory is
> accessible no matter what the memory tag is.
>
> But then the KASAN_TAG_INVALID check isn't needed, as this value is
> never assigned to a pointer tag. Which brings us to:
>
> return tag == KASAN_TAG_KERNEL || tag == shadow_byte;
>
> Which is essentially the same check that kasan_check_range() performs.
>
> Although, kasan_check_range() also checks that the shadow is <
> KASAN_SHADOW_START. It makes sense to add this check into
> kasan_byte_accessible() as well, before accessing shadow.
>
> Thanks!

Okay, if the intent is that kasan_byte_accessible() should work on any
memory, not just slab memory, then I agree that it should do the same
thing as kasan_check_range().

Peter


Re: [PATCH] kfence: unpoison pool region before use

2021-04-03 Thread Peter Collingbourne
On Sat, Apr 3, 2021 at 3:03 AM Marco Elver  wrote:
>
> On Sat, 3 Apr 2021 at 07:13, Peter Collingbourne  wrote:
> > If the memory region allocated by KFENCE had previously been poisoned,
> > any validity checks done using kasan_byte_accessible() will fail. Fix
> > it by unpoisoning the memory before using it as the pool region.
> >
> > Link: 
> > https://linux-review.googlesource.com/id/I0af99e9f1c25eaf7e1ec295836b5d148d76940c5
> > Signed-off-by: Peter Collingbourne 
>
> Thanks, at a high level this seems reasonable, because we always want
> to ensure that KFENCE memory remains unpoisoned with KASAN on. FWIW I
> subjected a config with KFENCE+KASAN (generic, SW_TAGS, and HW_TAGS)
> to syzkaller testing and ran kfence_test:
>
>   Tested-by: Marco Elver 
>
>
> However, it is unclear to me under which circumstances we actually
> need this, i.e. something would grab some memblock memory, somehow
> poison it, and then release the memory back during early boot (note,
> kfence_alloc_pool() is called before slab setup). If we can somehow
> understand what actually did this, perhaps it'd help tell us if this
> actually needs fixing in KFENCE or it's the other thing that needs a
> fix.
>
> Given all this is happening during really early boot, I'd expect no or
> very few calls to kasan_poison() until kfence_alloc_pool() is called.
> We can probably debug it more by having kasan_poison() do a "if
> (!__kfence_pool) dump_stack();" somewhere. Can you try this on the
> system where you can repro the problem? I tried this just now on the
> latest mainline kernel, and saw 0 calls until kfence_alloc_pool().

I looked into the issue some more, and it turned out that the memory
wasn't getting poisoned by kasan_poison() but rather by the calls to
kasan_map_populate() in kasan_init_shadow(). Starting with the patch
"kasan: initialize shadow to TAG_INVALID for SW_TAGS",
KASAN_SHADOW_INIT is set to 0xFE rather than 0xFF, which caused the
failure. The Android kernel branch for 5.10 (and the downstream kernel
I was working with) already have this patch, but it isn't in the
mainline kernel yet.
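For reference, the tag values involved, sketched from the description above:

/* SW_TAGS tag values relevant to the failure described here (sketch). */
#define KASAN_TAG_KERNEL        0xff    /* tag of untagged ("match-all") kernel pointers */
#define KASAN_TAG_INVALID       0xfe    /* tag marking memory as inaccessible */
/* After "kasan: initialize shadow to TAG_INVALID for SW_TAGS": */
#define KASAN_SHADOW_INIT       KASAN_TAG_INVALID       /* previously 0xFF */

With the shadow initialized to 0xFE, the old kasan_byte_accessible() check
(shadow_byte != KASAN_TAG_INVALID) fails even for a match-all 0xFF pointer
tag, which is exactly the pointer tag/memory tag pair in the report below.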

Now that I understand the cause of the issue, I can reproduce it using
the KFENCE unit tests on a db845c board, using both the Android 5.10
and mainline branches if I cherry-pick that change. Here's an example
crash from the unit tests (the failure was originally also observed
from ksize in the downstream kernel):

[   46.692195][  T175] BUG: KASAN: invalid-access in test_krealloc+0x1c4/0xf98
[   46.699282][  T175] Read of size 1 at addr ff80e9e7b000 by task
kunit_try_catch/175
[   46.707400][  T175] Pointer tag: [ff], memory tag: [fe]
[   46.712710][  T175]
[   46.714955][  T175] CPU: 4 PID: 175 Comm: kunit_try_catch Tainted:
GB 5.12.0-rc5-mainline-09505-ga2ab5b26d445-dirty #1
[   46.727193][  T175] Hardware name: Thundercomm Dragonboard 845c (DT)
[   46.733636][  T175] Call trace:
[   46.736841][  T175]  dump_backtrace+0x0/0x2f8
[   46.741295][  T175]  show_stack+0x2c/0x3c
[   46.745388][  T175]  dump_stack+0x124/0x1bc
[   46.749668][  T175]  print_address_description+0x7c/0x308
[   46.755178][  T175]  __kasan_report+0x1a8/0x398
[   46.759816][  T175]  kasan_report+0x50/0x7c
[   46.764103][  T175]  __kasan_check_byte+0x3c/0x54
[   46.768916][  T175]  ksize+0x4c/0x94
[   46.772573][  T175]  test_krealloc+0x1c4/0xf98
[   46.777108][  T175]  kunit_try_run_case+0x94/0x1c4
[   46.781990][  T175]  kunit_generic_run_threadfn_adapter+0x30/0x44
[   46.788196][  T175]  kthread+0x20c/0x234
[   46.792213][  T175]  ret_from_fork+0x10/0x30

Since "kasan: initialize shadow to TAG_INVALID for SW_TAGS" hasn't
landed in mainline yet, it seems like we should insert this patch
before that one rather than adding a Fixes: tag.

Peter


[PATCH] kfence: unpoison pool region before use

2021-04-02 Thread Peter Collingbourne
If the memory region allocated by KFENCE had previously been poisoned,
any validity checks done using kasan_byte_accessible() will fail. Fix
it by unpoisoning the memory before using it as the pool region.

Link: 
https://linux-review.googlesource.com/id/I0af99e9f1c25eaf7e1ec295836b5d148d76940c5
Signed-off-by: Peter Collingbourne 
---
 mm/kfence/core.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index d53c91f881a4..bb22b0cf77aa 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -633,13 +633,19 @@ static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);
 
 void __init kfence_alloc_pool(void)
 {
+   void *pool;
+
if (!kfence_sample_interval)
return;
 
-   __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
-
-   if (!__kfence_pool)
+   pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
+   if (!pool) {
pr_err("failed to allocate pool\n");
+   return;
+   }
+
+   kasan_unpoison_range(pool, KFENCE_POOL_SIZE);
+   __kfence_pool = pool;
 }
 
 void __init kfence_init(void)
-- 
2.31.0.208.g409f899ff0-goog



Re: [PATCH v3 12/17] arm64: implement __va_function

2021-03-25 Thread Peter Collingbourne
On Thu, Mar 25, 2021 at 4:28 PM Sami Tolvanen  wrote:
>
> On Thu, Mar 25, 2021 at 3:38 AM Mark Rutland  wrote:
> >
> > On Tue, Mar 23, 2021 at 01:39:41PM -0700, Sami Tolvanen wrote:
> > > With CONFIG_CFI_CLANG, the compiler replaces function addresses in
> > > instrumented C code with jump table addresses. This change implements
> > > the __va_function() macro, which returns the actual function address
> > > instead.
> > >
> > > Signed-off-by: Sami Tolvanen 
> > > Reviewed-by: Kees Cook 
> >
> > Is there really no attribute or builtin that can be used to do this
> > without assembly?
>
> I don't think the compiler currently offers anything that could help
> us here. Peter, can you think of another way to avoid the function
> address to jump table address conversion with
> -fno-sanitize-cfi-canonical-jump-tables?

No, the assembly seems like the best way at the moment.
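For reference, the assembly-based approach from the patch under discussion
looks roughly like this (reproduced from memory, so treat the exact form as
a sketch):

/*
 * Taking the symbol's address in inline assembly bypasses the compiler's
 * CFI jump table and yields the real function entry point.
 * __stringify() comes from <linux/stringify.h>.
 */
#define __va_function(x) ({                                             \
        void *addr;                                                     \
        asm("adrp %0, " __stringify(x) "\n\t"                           \
            "add  %0, %0, :lo12:" __stringify(x)                        \
            : "=r" (addr));                                             \
        addr;                                                           \
})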

> > IIUC from other patches the symbol tables will contain the "real"
> > non-cfi entry points (unless we explciitly asked to make the jump table
> > address canonical), so AFAICT here the compiler should have all the
> > necessary information to generate either the CFI or non-CFI entry point
> > addresses, even if it doesn't expose an interface for that today.
> >
> > It'd be a lot nicer if we could get the compiler to do this for us.
>
> I agree, that would be quite useful in the kernel.

Maybe. The kernel's requirements seem quite specialized here though. A
normal C or C++ program has little need for the actual entry point, so
if you need it you are probably doing something low level enough to
require assembly anyway.

Peter


Re: [PATCH v1] drm/bridge: lt9611: Fix handling of 4k panels

2020-12-10 Thread Peter Collingbourne
Thanks. Confirmed that this fixes display output for me on a 4K monitor.

On Mon, Nov 23, 2020 at 2:46 AM Robert Foss  wrote:
>
> 4k requires two dsi pipes, so don't report MODE_OK when only a
> single pipe is configured. But rather report MODE_PANEL to
> signal that requirements of the panel are not being met.
>
> Reported-by: Peter Collingbourne 
> Suggested-by: Peter Collingbourne 
> Signed-off-by: Robert Foss 
> Tested-by: John Stultz 

Tested-by: Peter Collingbourne 

> ---
>  drivers/gpu/drm/bridge/lontium-lt9611.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/bridge/lontium-lt9611.c 
> b/drivers/gpu/drm/bridge/lontium-lt9611.c
> index d734d9402c35..e8eb8deb444b 100644
> --- a/drivers/gpu/drm/bridge/lontium-lt9611.c
> +++ b/drivers/gpu/drm/bridge/lontium-lt9611.c
> @@ -867,8 +867,14 @@ static enum drm_mode_status lt9611_bridge_mode_valid(struct drm_bridge *bridge,
>  const struct drm_display_mode *mode)
>  {
> struct lt9611_mode *lt9611_mode = lt9611_find_mode(mode);
> +   struct lt9611 *lt9611 = bridge_to_lt9611(bridge);
>
> -   return lt9611_mode ? MODE_OK : MODE_BAD;
> +   if (!lt9611_mode)
> +   return MODE_BAD;
> +   else if (lt9611_mode->intfs > 1 && !lt9611->dsi1)
> +   return MODE_PANEL;
> +   else
> +   return MODE_OK;
>  }
>
>  static void lt9611_bridge_pre_enable(struct drm_bridge *bridge)
> --
> 2.27.0
>


Re: linux-next: manual merge of the akpm tree with the arm64 tree

2020-11-25 Thread Peter Collingbourne
On Wed, Nov 25, 2020 at 11:06 PM Stephen Rothwell  wrote:
>
> Hi all,
>
> Today's linux-next merge of the akpm tree got a conflict in:
>
>   arch/arm64/mm/proc.S
>
> between commit:
>
>   49b3cf035edc ("kasan: arm64: set TCR_EL1.TBID1 when enabled")
>
> from the arm64 tree and commit:
>
>   68cd215d6529 ("arm64: kasan: allow enabling in-kernel MTE")
>
> from the akpm tree.
>
> I fixed it up (I think, see below) and can carry the fix as necessary.
> This is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc arch/arm64/mm/proc.S
> index a0831bf8a018,0d85e6df42bc..
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@@ -40,9 -40,15 +40,15 @@@
>   #define TCR_CACHE_FLAGS   TCR_IRGN_WBWA | TCR_ORGN_WBWA
>
>   #ifdef CONFIG_KASAN_SW_TAGS
> - #define TCR_KASAN_FLAGS TCR_TBI1 | TCR_TBID1
>  -#define TCR_KASAN_SW_FLAGS TCR_TBI1
> ++#define TCR_KASAN_SW_FLAGS TCR_TBI1 | TCR_TBID1
>   #else
> - #define TCR_KASAN_FLAGS 0
> + #define TCR_KASAN_SW_FLAGS 0
> + #endif
> +
> + #ifdef CONFIG_KASAN_HW_TAGS
>  -#define TCR_KASAN_HW_FLAGS SYS_TCR_EL1_TCMA1 | TCR_TBI1
> ++#define TCR_KASAN_HW_FLAGS SYS_TCR_EL1_TCMA1 | TCR_TBI1 | TCR_TBID1
> + #else
> + #define TCR_KASAN_HW_FLAGS 0
>   #endif
>
>   /*

Thanks Stephen, that resolution looks correct to me.

Peter


Re: [PATCH v13 19/26] mm: Re-introduce do_mmap_pgoff()

2020-10-02 Thread Peter Collingbourne
On Fri, Oct 2, 2020 at 8:58 AM Yu, Yu-cheng  wrote:
>
> On 10/1/2020 7:06 PM, Peter Collingbourne wrote:
> > On Fri, Sep 25, 2020 at 7:57 AM Yu-cheng Yu  wrote:
> >>
> >> There was no more caller passing vm_flags to do_mmap(), and vm_flags was
> >> removed from the function's input by:
> >>
> >>  commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()").
> >>
> >> There is a new user now.  Shadow stack allocation passes VM_SHSTK to
> >> do_mmap().  Re-introduce the vm_flags and do_mmap_pgoff().
> >
> > I would prefer to change the callers to pass the additional 0 argument
> > instead of bringing the wrapper function back, but if we're going to
> > bring it back then we should fix the naming (both functions take a
> > pgoff argument, so the previous name do_mmap_pgoff() was just plain
> > confusing).
> >
> > Peter
> >
>
> Thanks for your feedback.  Here is the updated patch.  I will re-send
> the whole series later.
>
> Yu-cheng
>
> ==
>
>  From 6a9f1e6bcdb6e599a44d5f58cf4cebd28c4634a2 Mon Sep 17 00:00:00 2001
> From: Yu-cheng Yu 
> Date: Wed, 12 Aug 2020 14:01:58 -0700
> Subject: [PATCH 19/26] mm: Re-introduce do_mmap_pgoff()

The subject line of the commit message needs to be updated, but aside from that:

Reviewed-by: Peter Collingbourne 

Peter

>
> There was no more caller passing vm_flags to do_mmap(), and vm_flags was
> removed from the function's input by:
>
>  commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()").
>
> There is a new user now.  Shadow stack allocation passes VM_SHSTK to
> do_mmap().  Re-introduce vm_flags to do_mmap(), but without the old wrapper
> do_mmap_pgoff().  Instead, fix all callers of the wrapper by passing a zero
> vm_flags to do_mmap().
>
> Signed-off-by: Yu-cheng Yu 
> Cc: Peter Collingbourne 
> Cc: Andrew Morton 
> Cc: Oleg Nesterov 
> Cc: linux...@kvack.org
> ---
>   fs/aio.c   |  2 +-
>   include/linux/mm.h |  3 ++-
>   ipc/shm.c  |  2 +-
>   mm/mmap.c  | 10 +-
>   mm/nommu.c |  4 ++--
>   mm/util.c  |  2 +-
>   6 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index d5ec30385566..ca8c11665eea 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -527,7 +527,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
>
> ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
>  PROT_READ | PROT_WRITE,
> -MAP_SHARED, 0, &unused, NULL);
> +MAP_SHARED, 0, 0, &unused, NULL);
> mmap_write_unlock(mm);
> if (IS_ERR((void *)ctx->mmap_base)) {
> ctx->mmap_size = 0;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e09d13699bbe..e020eea33138 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2560,7 +2560,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
> struct list_head *uf);
>   extern unsigned long do_mmap(struct file *file, unsigned long addr,
> unsigned long len, unsigned long prot, unsigned long flags,
> -   unsigned long pgoff, unsigned long *populate, struct list_head *uf);
> +   vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
> +   struct list_head *uf);
>   extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
>struct list_head *uf, bool downgrade);
>   extern int do_munmap(struct mm_struct *, unsigned long, size_t,
> diff --git a/ipc/shm.c b/ipc/shm.c
> index e25c7c6106bc..91474258933d 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -1556,7 +1556,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
> goto invalid;
> }
>
> -   addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL);
> +   addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL);
> *raddr = addr;
> err = 0;
> if (IS_ERR_VALUE(addr))
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 574b3f273462..fc04184d2eae 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1365,11 +1365,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode,
>*/
>   unsigned long do_mmap(struct file *file, unsigned long addr,
> unsigned long len, unsigned long prot,
> -   unsigned long flags, unsigned long pgoff,
> -   unsigned long *populate, struct list_head *uf)
> +

Re: [PATCH v13 19/26] mm: Re-introduce do_mmap_pgoff()

2020-10-01 Thread Peter Collingbourne
On Fri, Sep 25, 2020 at 7:57 AM Yu-cheng Yu  wrote:
>
> There was no more caller passing vm_flags to do_mmap(), and vm_flags was
> removed from the function's input by:
>
> commit 45e55300f114 ("mm: remove unnecessary wrapper function 
> do_mmap_pgoff()").
>
> There is a new user now.  Shadow stack allocation passes VM_SHSTK to
> do_mmap().  Re-introduce the vm_flags and do_mmap_pgoff().

I would prefer to change the callers to pass the additional 0 argument
instead of bringing the wrapper function back, but if we're going to
bring it back then we should fix the naming (both functions take a
pgoff argument, so the previous name do_mmap_pgoff() was just plain
confusing).

Peter


Re: [PATCH] ACPICA: fix UBSAN warning using __builtin_offsetof

2020-06-01 Thread Peter Collingbourne
On Mon, Jun 1, 2020 at 4:18 PM Nick Desaulniers  wrote:
>
> Will reported UBSAN warnings:
> UBSAN: null-ptr-deref in drivers/acpi/acpica/tbfadt.c:459:37
> UBSAN: null-ptr-deref in arch/arm64/kernel/smp.c:596:6
>
> Looks like the emulated offsetof macro ACPI_OFFSET is causing these. We
> can avoid this by using the compiler builtin, __builtin_offsetof.

Would it be better to s/ACPI_OFFSET/offsetof/g the existing users of
this macro and remove it? It looks like offsetof is already being used
pervasively in the kernel, and its definition comes from <linux/stddef.h>.

Peter


> The non-kernel runtime of UBSAN would print:
> runtime error: member access within null pointer of type
> for this macro.
>
> Link: https://lore.kernel.org/lkml/20200521100952.GA5360@willie-the-truck/
> Cc: sta...@vger.kernel.org
> Reported-by: Will Deacon 
> Suggested-by: Ard Biesheuvel 
> Signed-off-by: Nick Desaulniers 
> ---
>  include/acpi/actypes.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/acpi/actypes.h b/include/acpi/actypes.h
> index 4defed58ea33..04359c70b198 100644
> --- a/include/acpi/actypes.h
> +++ b/include/acpi/actypes.h
> @@ -508,7 +508,7 @@ typedef u64 acpi_integer;
>
>  #define ACPI_TO_POINTER(i)  ACPI_CAST_PTR (void, (acpi_size) (i))
>  #define ACPI_TO_INTEGER(p)  ACPI_PTR_DIFF (p, (void *) 0)
> -#define ACPI_OFFSET(d, f)   ACPI_PTR_DIFF (&(((d *) 0)->f), (void *) 0)
> +#define ACPI_OFFSET(d, f)   __builtin_offsetof(d, f)
>  #define ACPI_PHYSADDR_TO_PTR(i) ACPI_TO_POINTER(i)
>  #define ACPI_PTR_TO_PHYSADDR(i) ACPI_TO_INTEGER(i)
>
> --
> 2.27.0.rc2.251.g90737beb825-goog
>


Re: linux-next: build failure after merge of the arm64 tree

2019-08-07 Thread Peter Collingbourne
On Wed, Aug 7, 2019 at 8:25 AM Will Deacon  wrote:
>
> Hello Masahiro,
>
> On Wed, Aug 07, 2019 at 11:43:02PM +0900, Masahiro Yamada wrote:
> > On Wed, Aug 7, 2019 at 8:46 PM Will Deacon  wrote:
> > > On Tue, Aug 06, 2019 at 07:34:36PM -0700, Peter Collingbourne wrote:
> > > > We're setting NM to something else based on a config option, which I
> > > > presume sets up some sort of circular dependency that confuses
> > > > Kconfig. Removing this fragment of the makefile (or appending
> > > > --synthetic unconditionally) also makes the problem go away.
> >
> > Exactly. This makes a circular dependency.
> > Kconfig determines the environment variable 'NM' has been changed,
> > and re-runs.
> >
> > > Yes, I think you're right. The lack of something like KBUILD_NMFLAGS means
> > > that architectures are forced to override NM entirely if they want to pass
> > > any specific options. Making that conditional on a Kconfig option appears
> > > to send the entire thing recursive.
> >
> > Adding KBUILD_NMFLAGS is probably the right thing to do.
> > (Is there somebody who wants to volunteer for this?)
>
> I don't think so ;) We don't do this for many other tools, and there's only
> this one case that really seems to require it.
>
> > But, for this particular case, I vote for
> > the entire removal of --synthetic.
> >
> > This dates back to 2004, and the commit message
> > did not clearly explain why it was needed.
> >
> > The PowerPC maintainers should re-evaluate
> > whether or not this flag is necessary.
> >
> > ppc32 is working without --synthetic,
> > so probably ppc64 would be ...
>
> Again, this is up to the PPC maintainers.
>
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d96127ebc44e..a38027a06b79 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -31,7 +31,7 @@ config CC_HAS_ASM_GOTO
> > > def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC))
> > >
> > >  config TOOLS_SUPPORT_RELR
> > > -   def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" 
> > > "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh)
> > > +   def_bool $(success,env "CC=$(CC)" "LD=$(LD)" 
> > > "NM=$(CROSS_COMPILE)nm" "OBJCOPY=$(OBJCOPY)" 
> > > $(srctree)/scripts/tools-support-relr.sh)
> > >
> > >  config CC_HAS_WARN_MAYBE_UNINITIALIZED
> > > def_bool $(cc-option,-Wmaybe-uninitialized)
> >
> > Maybe,
> >
> > def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "OBJCOPY=$(OBJCOPY)"
> > $(srctree)/scripts/tools-support-relr.sh)
> >
> > will work.
>
> Oh yes, I prefer that. I've included the updated patch below, and I'll
> put it into -next shortly so that we resolve the build breakage. Hopefully
> we'll have a better fix once the ozlabs folks wake up.
>
> > Or more simply
> >
> > def_bool $(success,$(srctree)/scripts/tools-support-relr.sh)
> >
> > CC, LD, OBJCOPY, NM are environment variables,
> > so I think tools-support-relr.sh can directly use them.
>
> Maybe, but the other scripts invoked here tend to pass $(CC) through as
> an argument, and that helps with your observation below...:
>
> > However, this bypasses the detection of environment variable changes.
> > If a user passes NM= from the command line, Kconfig will no longer
> > notice the change of NM.
>
> ... but since my proposal also breaks this, then I think your idea of just
> dropping $NM for now is best.

Thanks Will, that sounds good to me as well. I verified that it fixes
the hang in ppc64 on my end, and that we can still produce RELR
relocations on arm64 by passing in llvm-nm as $NM.

> Will
>
> --->8
>
> From 71c67a31f09fa8fdd1495dffd96a5f0d4cef2ede Mon Sep 17 00:00:00 2001
> From: Will Deacon 
> Date: Wed, 7 Aug 2019 12:48:33 +0100
> Subject: [PATCH] init/Kconfig: Fix infinite Kconfig recursion on PPC
>
> Commit 5cf896fb6be3 ("arm64: Add support for relocating the kernel with
> RELR relocations") introduced CONFIG_TOOLS_SUPPORT_RELR, which checks
> for RELR support in the toolchain as part of the kernel configuration.
> During this procedure, "$(NM)" is invoked to see if it supports the new
> relocation format, however PowerPC conditionally overrides this variable
> in the architecture Makefile in order to pass '--synthetic' when
> targetting PPC64.
>
> This conditional override causes Kconfig to recurse forev

Re: linux-next: build failure after merge of the arm64 tree

2019-08-06 Thread Peter Collingbourne
On Tue, Aug 6, 2019 at 4:50 PM Stephen Rothwell  wrote:
>
> Hi all,
>
> After merging the arm64 tree, today's linux-next build (powerpc
> ppc64_defconfig) was just spinning in make - it executing some scripts,
> but it was hard to catch just what.
>
> Apparently caused by commit
>
>   5cf896fb6be3 ("arm64: Add support for relocating the kernel with RELR 
> relocations")
>
> I have no idea why, but reverting the above commit allows the build
> to finish.

Okay, I can reproduce with:

$ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- defconfig
*** Default configuration is based on 'ppc64_defconfig'
#
# No change to .config
#
$ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j72
scripts/kconfig/conf  --syncconfig Kconfig
scripts/kconfig/conf  --syncconfig Kconfig
scripts/kconfig/conf  --syncconfig Kconfig
[...]

And confirmed that backing out my patch fixes it.

The problem seems to come from the use of $(NM) in the Kconfig file.
If I apply this diff:

diff --git a/init/Kconfig b/init/Kconfig
index d96127ebc44e0..177a95b323230 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -31,7 +31,7 @@ config CC_HAS_ASM_GOTO
def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC))

 config TOOLS_SUPPORT_RELR
-   def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh)
+   def_bool $(success,$(NM))

 config CC_HAS_WARN_MAYBE_UNINITIALIZED
def_bool $(cc-option,-Wmaybe-uninitialized)

I still see the hang. Replacing $(NM) with something else makes it go away.

That leads me to ask what is special about $(NM) + powerpc? It turns
out to be this fragment of arch/powerpc/Makefile:

ifdef CONFIG_PPC64
new_nm := $(shell if $(NM) --help 2>&1 | grep -- '--synthetic' > /dev/null; then echo y; else echo n; fi)

ifeq ($(new_nm),y)
NM  := $(NM) --synthetic
endif
endif

We're setting NM to something else based on a config option, which I
presume sets up some sort of circular dependency that confuses
Kconfig. Removing this fragment of the makefile (or appending
--synthetic unconditionally) also makes the problem go away.

We should at least able to remove the test for a new-enough binutils.
According to changes.rst we require binutils 2.21 which was released
in 2011, and support for --synthetic was added to binutils in 2004:
https://github.com/bminor/binutils-gdb/commit/0873df2aec48685715d2c5041c1f6f4ed43976c1

But why are we passing --synthetic at all on ppc64? This behaviour
seems to come from this commit from 2004:
https://github.com/mpe/linux-fullhistory/commit/0e32679a4ea48a634d94e97355d47512ef14d71f
which states: "On new toolchains we need to use nm --synthetic or we
miss code symbols."

But I only see a couple of missing symbols if I compare the output of
nm with and without --synthetic on a powerpc64 kernel:

$ diff -u <(powerpc-linux-gnu-nm --synthetic vmlinux)
<(powerpc-linux-gnu-nm  vmlinux)
--- /dev/fd/63 2019-08-06 18:48:56.127020621 -0700
+++ /dev/fd/62 2019-08-06 18:48:56.131020636 -0700
@@ -74840,7 +74840,6 @@
 c1901b10 D LZ4_decompress_fast
 c07819a0 T .LZ4_decompress_fast_continue
 c1901b70 D LZ4_decompress_fast_continue
-c07811e0 t .LZ4_decompress_fast_extDict
 c1901b40 d LZ4_decompress_fast_extDict
 c0781960 T .LZ4_decompress_fast_usingDict
 c1901b58 D LZ4_decompress_fast_usingDict
@@ -74856,7 +74855,6 @@
 c1901bd0 D LZ4_decompress_safe_usingDict
 c07822b0 T .LZ4_decompress_safe_withPrefix64k
 c1901b88 D LZ4_decompress_safe_withPrefix64k
-c0780c60 t .LZ4_decompress_safe_withSmallPrefix
 c1901b28 d LZ4_decompress_safe_withSmallPrefix
 c077fbe0 T .LZ4_setStreamDecode
 c1901ac8 D LZ4_setStreamDecode

I guess the problem was worse back in 2004. It almost certainly didn't
involve these particular symbols because they were added in commit
2209fda323e2fd2a2d0885595fd5097717f8d2aa from 2018. So I guess we have
a couple of possible quick fixes (assuming that the Kconfig issue
can't be solved somehow): either stop passing --synthetic on powerpc
and lose a couple of symbols in 64-bit kernels, or start passing it
unconditionally on powerpc (it doesn't seem to make a difference to
the nm output on a ppc64_defconfig kernel with CONFIG_PPC64=n). I'm
cc'ing the powerpc maintainers for their opinion on what to do. While
this is being resolved we should probably back out my patch from
-next.

Peter


[tip:timers/vdso] arm64: vdso: Build vDSO with -ffixed-x18

2019-06-23 Thread tip-bot for Peter Collingbourne
Commit-ID:  98cd3c3f83fbba27a6bacd75ad12e8388a61a32a
Gitweb: https://git.kernel.org/tip/98cd3c3f83fbba27a6bacd75ad12e8388a61a32a
Author: Peter Collingbourne 
AuthorDate: Fri, 21 Jun 2019 10:52:32 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 22 Jun 2019 21:21:06 +0200

arm64: vdso: Build vDSO with -ffixed-x18

The vDSO needs to be built with x18 reserved in order to accommodate
userspace platform ABIs built on top of Linux that use the register
to carry inter-procedural state, as provided for by the AAPCS.
An example of such a platform ABI is the one that will be used by an
upcoming version of Android.

Although this change is currently a no-op due to the fact that the vDSO
is currently implemented in pure assembly on arm64, it is necessary
in order to prepare for using the generic C implementation of the vDSO.

[ tglx: Massaged changelog ]

Signed-off-by: Peter Collingbourne 
Signed-off-by: Vincenzo Frascino 
Signed-off-by: Thomas Gleixner 
Tested-by: Shijith Thotton 
Tested-by: Andre Przywara 
Cc: linux-a...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: linux-kselft...@vger.kernel.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Daniel Lezcano 
Cc: Mark Salyzyn 
Cc: Shuah Khan 
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Cc: Rasmus Villemoes 
Cc: Huw Davies 
Cc: Mark Salyzyn 
Link: https://lkml.kernel.org/r/20190621095252.32307-6-vincenzo.frasc...@arm.com

---
 arch/arm64/kernel/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 3acfc813e966..ec81d28aeb5d 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -20,7 +20,7 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
 ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \
--build-id -n -T
 
-ccflags-y := -fno-common -fno-builtin -fno-stack-protector
+ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18
 ccflags-y += -DDISABLE_BRANCH_PROFILING
 
 VDSO_LDFLAGS := -Bsymbolic