Re: [PATCH RFC v3 28/35] arm64: mte: swap: Handle tag restoring when missing tag storage
On Fri, Feb 2, 2024 at 6:56 AM Alexandru Elisei wrote: > > Hi Peter, > > On Thu, Feb 01, 2024 at 08:02:40PM -0800, Peter Collingbourne wrote: > > On Thu, Jan 25, 2024 at 8:45 AM Alexandru Elisei > > wrote: > > > > > > Linux restores tags when a page is swapped in and there are tags > > > associated > > > with the swap entry which the new page will replace. The saved tags are > > > restored even if the page will not be mapped as tagged, to protect against > > > cases where the page is shared between different VMAs, and is tagged in > > > some, but untagged in others. By using this approach, the process can > > > still > > > access the correct tags following an mprotect(PROT_MTE) on the non-MTE > > > enabled VMA. > > > > > > But this poses a challenge for managing tag storage: in the scenario > > > above, > > > when a new page is allocated to be swapped in for the process where it > > > will > > > be mapped as untagged, the corresponding tag storage block is not > > > reserved. > > > mte_restore_page_tags_by_swp_entry(), when it restores the saved tags, > > > will > > > overwrite data in the tag storage block associated with the new page, > > > leading to data corruption if the block is in use by a process. > > > > > > Get around this issue by saving the tags in a new xarray, this time > > > indexed > > > by the page pfn, and then restoring them when tag storage is reserved for > > > the page. > > > > > > Signed-off-by: Alexandru Elisei > > > --- > > > > > > Changes since rfc v2: > > > > > > * Restore saved tags **before** setting the PG_tag_storage_reserved bit to > > > eliminate a brief window of opportunity where userspace can access > > > uninitialized > > > tags (Peter Collingbourne). > > > > > > arch/arm64/include/asm/mte_tag_storage.h | 8 ++ > > > arch/arm64/include/asm/pgtable.h | 11 +++ > > > arch/arm64/kernel/mte_tag_storage.c | 12 ++- > > > arch/arm64/mm/mteswap.c | 110 +++ > > > 4 files changed, 140 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/arm64/include/asm/mte_tag_storage.h > > > b/arch/arm64/include/asm/mte_tag_storage.h > > > index 50bdae94cf71..40590a8c3748 100644 > > > --- a/arch/arm64/include/asm/mte_tag_storage.h > > > +++ b/arch/arm64/include/asm/mte_tag_storage.h > > > @@ -36,6 +36,14 @@ bool page_is_tag_storage(struct page *page); > > > > > > vm_fault_t handle_folio_missing_tag_storage(struct folio *folio, struct > > > vm_fault *vmf, > > > bool *map_pte); > > > +vm_fault_t mte_try_transfer_swap_tags(swp_entry_t entry, struct page > > > *page); > > > + > > > +void tags_by_pfn_lock(void); > > > +void tags_by_pfn_unlock(void); > > > + > > > +void *mte_erase_tags_for_pfn(unsigned long pfn); > > > +bool mte_save_tags_for_pfn(void *tags, unsigned long pfn); > > > +void mte_restore_tags_for_pfn(unsigned long start_pfn, int order); > > > #else > > > static inline bool tag_storage_enabled(void) > > > { > > > diff --git a/arch/arm64/include/asm/pgtable.h > > > b/arch/arm64/include/asm/pgtable.h > > > index 0174e292f890..87ae59436162 100644 > > > --- a/arch/arm64/include/asm/pgtable.h > > > +++ b/arch/arm64/include/asm/pgtable.h > > > @@ -1085,6 +1085,17 @@ static inline void arch_swap_invalidate_area(int > > > type) > > > mte_invalidate_tags_area_by_swp_entry(type); > > > } > > > > > > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE > > > +#define __HAVE_ARCH_SWAP_PREPARE_TO_RESTORE > > > +static inline vm_fault_t arch_swap_prepare_to_restore(swp_entry_t entry, > > > + struct folio *folio) > > > +{ > > > + if (tag_storage_enabled()) > > > + return 
mte_try_transfer_swap_tags(entry, &folio->page); > > > + return 0; > > > +} > > > +#endif > > > + > > > #define __HAVE_ARCH_SWAP_RESTORE > > > static inline void arch_swap_restore(swp_entry_t entry, struct folio > > > *folio) > > > { > > > diff --git a/arch/arm64/kernel/mte_tag_storage.c > > > b/arch/arm64/kernel/mte_tag_storage.c > > > index afe2bb754879..ac7b9c9c585c
Re: [PATCH RFC v3 28/35] arm64: mte: swap: Handle tag restoring when missing tag storage
On Thu, Jan 25, 2024 at 8:45 AM Alexandru Elisei wrote: > > Linux restores tags when a page is swapped in and there are tags associated > with the swap entry which the new page will replace. The saved tags are > restored even if the page will not be mapped as tagged, to protect against > cases where the page is shared between different VMAs, and is tagged in > some, but untagged in others. By using this approach, the process can still > access the correct tags following an mprotect(PROT_MTE) on the non-MTE > enabled VMA. > > But this poses a challenge for managing tag storage: in the scenario above, > when a new page is allocated to be swapped in for the process where it will > be mapped as untagged, the corresponding tag storage block is not reserved. > mte_restore_page_tags_by_swp_entry(), when it restores the saved tags, will > overwrite data in the tag storage block associated with the new page, > leading to data corruption if the block is in use by a process. > > Get around this issue by saving the tags in a new xarray, this time indexed > by the page pfn, and then restoring them when tag storage is reserved for > the page. > > Signed-off-by: Alexandru Elisei > --- > > Changes since rfc v2: > > * Restore saved tags **before** setting the PG_tag_storage_reserved bit to > eliminate a brief window of opportunity where userspace can access > uninitialized > tags (Peter Collingbourne). > > arch/arm64/include/asm/mte_tag_storage.h | 8 ++ > arch/arm64/include/asm/pgtable.h | 11 +++ > arch/arm64/kernel/mte_tag_storage.c | 12 ++- > arch/arm64/mm/mteswap.c | 110 +++ > 4 files changed, 140 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/mte_tag_storage.h > b/arch/arm64/include/asm/mte_tag_storage.h > index 50bdae94cf71..40590a8c3748 100644 > --- a/arch/arm64/include/asm/mte_tag_storage.h > +++ b/arch/arm64/include/asm/mte_tag_storage.h > @@ -36,6 +36,14 @@ bool page_is_tag_storage(struct page *page); > > vm_fault_t handle_folio_missing_tag_storage(struct folio *folio, struct > vm_fault *vmf, > bool *map_pte); > +vm_fault_t mte_try_transfer_swap_tags(swp_entry_t entry, struct page *page); > + > +void tags_by_pfn_lock(void); > +void tags_by_pfn_unlock(void); > + > +void *mte_erase_tags_for_pfn(unsigned long pfn); > +bool mte_save_tags_for_pfn(void *tags, unsigned long pfn); > +void mte_restore_tags_for_pfn(unsigned long start_pfn, int order); > #else > static inline bool tag_storage_enabled(void) > { > diff --git a/arch/arm64/include/asm/pgtable.h > b/arch/arm64/include/asm/pgtable.h > index 0174e292f890..87ae59436162 100644 > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -1085,6 +1085,17 @@ static inline void arch_swap_invalidate_area(int type) > mte_invalidate_tags_area_by_swp_entry(type); > } > > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE > +#define __HAVE_ARCH_SWAP_PREPARE_TO_RESTORE > +static inline vm_fault_t arch_swap_prepare_to_restore(swp_entry_t entry, > + struct folio *folio) > +{ > + if (tag_storage_enabled()) > + return mte_try_transfer_swap_tags(entry, &folio->page); > + return 0; > +} > +#endif > + > #define __HAVE_ARCH_SWAP_RESTORE > static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio) > { > diff --git a/arch/arm64/kernel/mte_tag_storage.c > b/arch/arm64/kernel/mte_tag_storage.c > index afe2bb754879..ac7b9c9c585c 100644 > --- a/arch/arm64/kernel/mte_tag_storage.c > +++ b/arch/arm64/kernel/mte_tag_storage.c > @@ -567,6 +567,7 @@ int reserve_tag_storage(struct page *page, int order, > gfp_t gfp) > } > } > > +
mte_restore_tags_for_pfn(page_to_pfn(page), order); > page_set_tag_storage_reserved(page, order); > out_unlock: > mutex_unlock(_blocks_lock); > @@ -595,7 +596,8 @@ void free_tag_storage(struct page *page, int order) > struct tag_region *region; > unsigned long page_va; > unsigned long flags; > - int ret; > + void *tags; > + int i, ret; > > ret = tag_storage_find_block(page, _block, ); > if (WARN_ONCE(ret, "Missing tag storage block for pfn 0x%lx", > page_to_pfn(page))) > @@ -605,6 +607,14 @@ void free_tag_storage(struct page *page, int order) > /* Avoid writeback of dirty tag cache lines corrupting data. */ > dcache_inval_tags_poc(page_va, page_va + (PAGE_SIZE << order)); > > + tags_by_pfn_lock(); > + for (i = 0; i < (1 << order)
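The mechanism described in this thread — stashing saved tags keyed by pfn until reserve_tag_storage() can restore them — reduces to an xarray used as a pfn-indexed store. Below is a minimal sketch of that pattern, reusing the helper names declared in the quoted header; mte_restore_page_tags() and mte_free_tag_buf() are existing arm64 MTE helpers, and the locking and gfp choices are simplified relative to the real patch:

#include <linux/xarray.h>
#include <linux/mm.h>

#include <asm/mte.h>

static DEFINE_XARRAY(tags_by_pfn);

bool mte_save_tags_for_pfn(void *tags, unsigned long pfn)
{
	/* Stash the saved tags until tag storage is reserved for this pfn. */
	return !xa_err(xa_store(&tags_by_pfn, pfn, tags, GFP_KERNEL));
}

void mte_restore_tags_for_pfn(unsigned long start_pfn, int order)
{
	unsigned long pfn;
	void *tags;

	for (pfn = start_pfn; pfn < start_pfn + (1UL << order); pfn++) {
		/* xa_erase() returns the old entry, so each stash is restored once. */
		tags = xa_erase(&tags_by_pfn, pfn);
		if (tags) {
			mte_restore_page_tags(page_address(pfn_to_page(pfn)), tags);
			mte_free_tag_buf(tags);
		}
	}
}

Per the v2→v3 changelog above, the restore must run before PG_tag_storage_reserved is set, so userspace can never observe uninitialized tags through a racing fault.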
Re: [PATCH RFC v3 23/35] arm64: mte: Try to reserve tag storage in arch_alloc_page()
On Thu, Jan 25, 2024 at 8:45 AM Alexandru Elisei wrote: > > Reserve tag storage for a page that is being allocated as tagged. This > is a best effort approach, and failing to reserve tag storage is > allowed. > > When all the associated tagged pages have been freed, return the tag > storage pages back to the page allocator, where they can be used again for > data allocations. > > Signed-off-by: Alexandru Elisei > --- > > Changes since rfc v2: > > * Based on rfc v2 patch #16 ("arm64: mte: Manage tag storage on page > allocation"). > * Fixed calculation of the number of associated tag storage blocks (Hyesoo > Yu). > * Tag storage is reserved in arch_alloc_page() instead of > arch_prep_new_page(). > > arch/arm64/include/asm/mte.h | 16 +- > arch/arm64/include/asm/mte_tag_storage.h | 31 +++ > arch/arm64/include/asm/page.h | 5 + > arch/arm64/include/asm/pgtable.h | 19 ++ > arch/arm64/kernel/mte_tag_storage.c | 234 +++ > arch/arm64/mm/fault.c | 7 + > fs/proc/page.c | 1 + > include/linux/kernel-page-flags.h | 1 + > include/linux/page-flags.h | 1 + > include/trace/events/mmflags.h | 3 +- > mm/huge_memory.c | 1 + > 11 files changed, 316 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h > index 8034695b3dd7..6457b7899207 100644 > --- a/arch/arm64/include/asm/mte.h > +++ b/arch/arm64/include/asm/mte.h > @@ -40,12 +40,24 @@ void mte_free_tag_buf(void *buf); > #ifdef CONFIG_ARM64_MTE > > /* track which pages have valid allocation tags */ > -#define PG_mte_tagged PG_arch_2 > +#define PG_mte_tagged PG_arch_2 > /* simple lock to avoid multiple threads tagging the same page */ > -#define PG_mte_lock PG_arch_3 > +#define PG_mte_lock PG_arch_3 > +/* Track if a tagged page has tag storage reserved */ > +#define PG_tag_storage_reserved PG_arch_4 > + > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE > +DECLARE_STATIC_KEY_FALSE(tag_storage_enabled_key); > +extern bool page_tag_storage_reserved(struct page *page); > +#endif > > static inline void set_page_mte_tagged(struct page *page) > { > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE > + /* Open code mte_tag_storage_enabled() */ > + WARN_ON_ONCE(static_branch_likely(&tag_storage_enabled_key) && > + !page_tag_storage_reserved(page)); > +#endif > /* > * Ensure that the tags written prior to this function are visible > * before the page flags update. > diff --git a/arch/arm64/include/asm/mte_tag_storage.h > b/arch/arm64/include/asm/mte_tag_storage.h > index 7b3f6bff8e6f..09f1318d924e 100644 > --- a/arch/arm64/include/asm/mte_tag_storage.h > +++ b/arch/arm64/include/asm/mte_tag_storage.h > @@ -5,6 +5,12 @@ > #ifndef __ASM_MTE_TAG_STORAGE_H > #define __ASM_MTE_TAG_STORAGE_H > > +#ifndef __ASSEMBLY__ > + > +#include > + > +#include > + > #ifdef CONFIG_ARM64_MTE_TAG_STORAGE > > DECLARE_STATIC_KEY_FALSE(tag_storage_enabled_key); > @@ -15,6 +21,15 @@ static inline bool tag_storage_enabled(void) > } > > void mte_init_tag_storage(void); > + > +static inline bool alloc_requires_tag_storage(gfp_t gfp) > +{ > + return gfp & __GFP_TAGGED; > +} > +int reserve_tag_storage(struct page *page, int order, gfp_t gfp); > +void free_tag_storage(struct page *page, int order); > + > +bool page_tag_storage_reserved(struct page *page); > #else > static inline bool tag_storage_enabled(void) > { > @@ -23,6 +38,22 @@ static inline bool tag_storage_enabled(void) > static inline void mte_init_tag_storage(void) > { > } > +static inline bool alloc_requires_tag_storage(struct page *page) This function should take a gfp_t to match the CONFIG_ARM64_MTE_TAG_STORAGE case.
Peter > +{ > + return false; > +} > +static inline int reserve_tag_storage(struct page *page, int order, gfp_t > gfp) > +{ > + return 0; > +} > +static inline void free_tag_storage(struct page *page, int order) > +{ > +} > +static inline bool page_tag_storage_reserved(struct page *page) > +{ > + return true; > +} > #endif /* CONFIG_ARM64_MTE_TAG_STORAGE */ > > +#endif /* !__ASSEMBLY__ */ > #endif /* __ASM_MTE_TAG_STORAGE_H */ > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h > index 88bab032a493..3a656492f34a 100644 > --- a/arch/arm64/include/asm/page.h > +++ b/arch/arm64/include/asm/page.h > @@ -35,6 +35,11 @@ void copy_highpage(struct page *to, struct page *from); > void tag_clear_highpage(struct page *to); > #define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE > > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE > +void arch_alloc_page(struct page *, int order, gfp_t gfp); > +#define HAVE_ARCH_ALLOC_PAGE > +#endif > + > #define clear_user_page(page, vaddr, pg) clear_page(page) > #define copy_user_page(to, from, vaddr, pg) copy_page(to, from) > > diff --git
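Concretely, the !CONFIG_ARM64_MTE_TAG_STORAGE stub Peter is pointing at would need to mirror the gfp_t-taking prototype from the enabled case quoted above, i.e.:

static inline bool alloc_requires_tag_storage(gfp_t gfp)
{
	return false;
}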
Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known
On Thu, Jan 25, 2024 at 8:43 AM Alexandru Elisei wrote: > > arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA. > When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and > the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets > set in vma_alloc_zeroed_movable_folio(). > > Expand this to be more generic by adding an arch hook that modifies the gfp > flags for an allocation when the VMA is known. > > Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO > is also set; from that point of view, the current behaviour is unchanged, > even though the arm64 flag is set in more places. When arm64 gains > support to reuse the tag storage for data allocation, the uses of the > __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try > to reserve the corresponding tag storage for the pages being allocated. > > The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by > the caller; this has been done to keep an architecture from modifying the > flags already set by the core memory management code; this is similar to > how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been > implemented. This can be revisited in the future if there's a need to do > so. > > Signed-off-by: Alexandru Elisei This patch also needs to update the non-CONFIG_NUMA definition of vma_alloc_folio in include/linux/gfp.h to call arch_calc_vma_gfp. See: https://r.android.com/2849146 Peter
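For reference, the shape of the fix being requested — a sketch, not the exact hunk from the linked change: the !CONFIG_NUMA fallback of vma_alloc_folio() in include/linux/gfp.h ignores the VMA entirely, so it needs to OR in the arch hook's flags the same way the NUMA path does:

/* include/linux/gfp.h, !CONFIG_NUMA — sketch */
static inline struct folio *vma_alloc_folio(gfp_t gfp, int order,
		struct vm_area_struct *vma, unsigned long addr, bool hugepage)
{
	/* Let the architecture adjust the flags now that the VMA is known. */
	return folio_alloc(gfp | arch_calc_vma_gfp(vma, gfp), order);
}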
Re: [PATCH RFC v2 20/27] mm: hugepage: Handle huge page fault on access
On Sun, Nov 19, 2023 at 8:59 AM Alexandru Elisei wrote: > > Handle PAGE_FAULT_ON_ACCESS faults for huge pages in a similar way to > regular pages. > > Signed-off-by: Alexandru Elisei > --- > arch/arm64/include/asm/mte_tag_storage.h | 1 + > arch/arm64/include/asm/pgtable.h | 7 ++ > arch/arm64/mm/fault.c| 81 > include/linux/huge_mm.h | 2 + > include/linux/pgtable.h | 5 ++ > mm/huge_memory.c | 4 +- > mm/memory.c | 3 + > 7 files changed, 101 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/mte_tag_storage.h > b/arch/arm64/include/asm/mte_tag_storage.h > index c70ced60a0cd..b97406d369ce 100644 > --- a/arch/arm64/include/asm/mte_tag_storage.h > +++ b/arch/arm64/include/asm/mte_tag_storage.h > @@ -35,6 +35,7 @@ void free_tag_storage(struct page *page, int order); > bool page_tag_storage_reserved(struct page *page); > > vm_fault_t handle_page_missing_tag_storage(struct vm_fault *vmf); > +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf); > #else > static inline bool tag_storage_enabled(void) > { > diff --git a/arch/arm64/include/asm/pgtable.h > b/arch/arm64/include/asm/pgtable.h > index 8cc135f1c112..1704411c096d 100644 > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -477,6 +477,13 @@ static inline vm_fault_t > arch_do_page_fault_on_access(struct vm_fault *vmf) > return handle_page_missing_tag_storage(vmf); > return VM_FAULT_SIGBUS; > } > + > +static inline vm_fault_t arch_do_huge_page_fault_on_access(struct vm_fault > *vmf) > +{ > + if (tag_storage_enabled()) > + return handle_huge_page_missing_tag_storage(vmf); > + return VM_FAULT_SIGBUS; > +} > #endif /* CONFIG_ARCH_HAS_FAULT_ON_ACCESS */ > > #define pmd_present_invalid(pmd) (!!(pmd_val(pmd) & PMD_PRESENT_INVALID)) > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index f5fa583acf18..6730a0812a24 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -1041,6 +1041,87 @@ vm_fault_t handle_page_missing_tag_storage(struct > vm_fault *vmf) > > return 0; > > +out_retry: > + put_page(page); > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) > + vma_end_read(vma); > + if (fault_flag_allow_retry_first(vmf->flags)) { > + err = VM_FAULT_RETRY; > + } else { > + /* Replay the fault. */ > + err = 0; > + } > + return err; > +} > + > +vm_fault_t handle_huge_page_missing_tag_storage(struct vm_fault *vmf) > +{ > + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; > + struct vm_area_struct *vma = vmf->vma; > + pmd_t old_pmd, new_pmd; > + bool writable = false; > + struct page *page; > + vm_fault_t err; > + int ret; > + > + vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); > + if (unlikely(!pmd_same(vmf->orig_pmd, *vmf->pmd))) { > + spin_unlock(vmf->ptl); > + return 0; > + } > + > + old_pmd = vmf->orig_pmd; > + new_pmd = pmd_modify(old_pmd, vma->vm_page_prot); > + > + /* > +* Detect now whether the PMD could be writable; this information > +* is only valid while holding the PT lock. 
> +*/ > + writable = pmd_write(new_pmd); > + if (!writable && vma_wants_manual_pte_write_upgrade(vma) && > + can_change_pmd_writable(vma, vmf->address, new_pmd)) > + writable = true; > + > + page = vm_normal_page_pmd(vma, haddr, new_pmd); > + if (!page) > + goto out_map; > + > + if (!(vma->vm_flags & VM_MTE)) > + goto out_map; > + > + get_page(page); > + vma_set_access_pid_bit(vma); > + > + spin_unlock(vmf->ptl); > + writable = false; > + > + if (unlikely(is_migrate_isolate_page(page))) > + goto out_retry; > + > + ret = reserve_tag_storage(page, HPAGE_PMD_ORDER, > GFP_HIGHUSER_MOVABLE); > + if (ret) > + goto out_retry; > + > + put_page(page); > + > + vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); > + if (unlikely(!pmd_same(old_pmd, *vmf->pmd))) { > + spin_unlock(vmf->ptl); > + return 0; > + } > + > +out_map: > + /* Restore the PMD */ > + new_pmd = pmd_modify(old_pmd, vma->vm_page_prot); > + new_pmd = pmd_mkyoung(new_pmd); > + if (writable) > + new_pmd = pmd_mkwrite(new_pmd, vma); > + set_pmd_at(vma->vm_mm, haddr, vmf->pmd, new_pmd); > + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); > + spin_unlock(vmf->ptl); > + > + return 0; > + > out_retry: > put_page(page); > if (vmf->flags & FAULT_FLAG_VMA_LOCK) > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index
Re: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc()
Hi Alexandru, On Wed, Aug 23, 2023 at 6:16 AM Alexandru Elisei wrote: > > If the source page being migrated has metadata associated with it, make > sure to reserve the metadata storage when choosing a suitable destination > page from the free list. > > Signed-off-by: Alexandru Elisei > --- > mm/compaction.c | 9 + > mm/internal.h | 1 + > 2 files changed, 10 insertions(+) > > diff --git a/mm/compaction.c b/mm/compaction.c > index cc0139fa0cb0..af2ee3085623 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -570,6 +570,7 @@ static unsigned long isolate_freepages_block(struct > compact_control *cc, > bool locked = false; > unsigned long blockpfn = *start_pfn; > unsigned int order; > + int ret; > > /* Strict mode is for isolation, speed is secondary */ > if (strict) > @@ -626,6 +627,11 @@ static unsigned long isolate_freepages_block(struct > compact_control *cc, > > /* Found a free page, will break it into order-0 pages */ > order = buddy_order(page); > + if (metadata_storage_enabled() && cc->reserve_metadata) { > + ret = reserve_metadata_storage(page, order, > cc->gfp_mask); At this point the zone lock is held and preemption is disabled, which makes it invalid to call reserve_metadata_storage. Peter > + if (ret) > + goto isolate_fail; > + } > isolated = __isolate_free_page(page, order); > if (!isolated) > break; > @@ -1757,6 +1763,9 @@ static struct folio *compaction_alloc(struct folio > *src, unsigned long data) > struct compact_control *cc = (struct compact_control *)data; > struct folio *dst; > > + if (metadata_storage_enabled()) > + cc->reserve_metadata = folio_has_metadata(src); > + > if (list_empty(&cc->freepages)) { > isolate_freepages(cc); > > diff --git a/mm/internal.h b/mm/internal.h > index d28ac0085f61..046cc264bfbe 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -492,6 +492,7 @@ struct compact_control { > */ > bool alloc_contig; /* alloc_contig_range allocation */ > bool source_has_metadata; /* source pages have associated > metadata */ > + bool reserve_metadata; > }; > > /* > -- > 2.41.0 >
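The bug Peter is flagging, reduced to a minimal (deliberately wrong) pattern: isolate_freepages_block() holds the zone spinlock with IRQs disabled, while reserve_metadata_storage() takes mutexes and allocates memory, so the call can sleep in atomic context. The names below are stand-ins, not the real compaction.c code:

#include <linux/spinlock.h>
#include <linux/mutex.h>

static DEFINE_SPINLOCK(zone_lock);	/* stands in for zone->lock */
static DEFINE_MUTEX(meta_lock);		/* stands in for the metadata-storage mutex */

static void bad_pattern(void)
{
	unsigned long flags;

	spin_lock_irqsave(&zone_lock, flags);	/* atomic context begins */
	mutex_lock(&meta_lock);			/* may sleep: invalid here */
	mutex_unlock(&meta_lock);
	spin_unlock_irqrestore(&zone_lock, flags);
}

With CONFIG_DEBUG_ATOMIC_SLEEP enabled, this pattern trips the "BUG: sleeping function called from invalid context" check.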
Re: [PATCH] kasan: fix kasan_byte_accessible() to be consistent with actual checks
On Mon, Apr 5, 2021 at 2:53 PM Andrey Konovalov wrote: > > On Mon, Apr 5, 2021 at 11:43 PM Peter Collingbourne wrote: > > > > We can sometimes end up with kasan_byte_accessible() being called > > on non-slab memory. For example ksize() and krealloc() may end up > > calling it on KFENCE allocated memory. In this case the memory will > > be tagged with KASAN_SHADOW_INIT, which a subsequent patch ("kasan: > > initialize shadow to TAG_INVALID for SW_TAGS") will set to the same > > value as KASAN_TAG_INVALID, causing kasan_byte_accessible() to fail > > when called on non-slab memory. > > > > This highlighted the fact that the check in kasan_byte_accessible() > > was inconsistent with checks as implemented for loads and stores > > (kasan_check_range() in SW tags mode and hardware-implemented > > checks in HW tags mode). kasan_check_range() does not have a > > check for KASAN_TAG_INVALID, and instead has a comparison against > > KASAN_SHADOW_START. In HW tags mode, we do not have either, but we > > do set TCR_EL1.TCMA which corresponds with the comparison against > > KASAN_TAG_KERNEL. > > > > Therefore, update kasan_byte_accessible() for both SW and HW tags > > modes to correspond with the respective checks on loads and stores. > > > > Link: > > https://linux-review.googlesource.com/id/Ic6d40803c57dcc6331bd97fbb9a60b0d38a65a36 > > Signed-off-by: Peter Collingbourne > > --- > > mm/kasan/kasan.h | 3 +-- > > mm/kasan/sw_tags.c | 8 +--- > > 2 files changed, 6 insertions(+), 5 deletions(-) > > > > diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h > > index 8c55634d6edd..e18e8da35255 100644 > > --- a/mm/kasan/kasan.h > > +++ b/mm/kasan/kasan.h > > @@ -368,8 +368,7 @@ static inline bool kasan_byte_accessible(const void > > *addr) > > u8 ptr_tag = get_tag(addr); > > u8 mem_tag = hw_get_mem_tag((void *)addr); > > > > - return (mem_tag != KASAN_TAG_INVALID) && > > - (ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag); > > + return ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag; > > } > > > > #else /* CONFIG_KASAN_HW_TAGS */ > > diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c > > index 94c2d33be333..914225eeda99 100644 > > --- a/mm/kasan/sw_tags.c > > +++ b/mm/kasan/sw_tags.c > > @@ -121,10 +121,12 @@ bool kasan_check_range(unsigned long addr, size_t > > size, bool write, > > bool kasan_byte_accessible(const void *addr) > > { > > u8 tag = get_tag(addr); > > - u8 shadow_byte = READ_ONCE(*(u8 > > *)kasan_mem_to_shadow(kasan_reset_tag(addr))); > > + void *untagged_addr = kasan_reset_tag(addr); > > + u8 shadow_byte = READ_ONCE(*(u8 > > *)kasan_mem_to_shadow(untagged_addr)); > > Hi Peter, > > Let's move dereferencing shadow memory past the KASAN_SHADOW_START > check. Otherwise, in case the check is to fail, accessing shadow will > likely crash the kernel. > > Thanks! Makes sense, fixed in v2. Peter
[PATCH v2] kasan: fix kasan_byte_accessible() to be consistent with actual checks
We can sometimes end up with kasan_byte_accessible() being called on non-slab memory. For example ksize() and krealloc() may end up calling it on KFENCE allocated memory. In this case the memory will be tagged with KASAN_SHADOW_INIT, which a subsequent patch ("kasan: initialize shadow to TAG_INVALID for SW_TAGS") will set to the same value as KASAN_TAG_INVALID, causing kasan_byte_accessible() to fail when called on non-slab memory. This highlighted the fact that the check in kasan_byte_accessible() was inconsistent with checks as implemented for loads and stores (kasan_check_range() in SW tags mode and hardware-implemented checks in HW tags mode). kasan_check_range() does not have a check for KASAN_TAG_INVALID, and instead has a comparison against KASAN_SHADOW_START. In HW tags mode, we do not have either, but we do set TCR_EL1.TCMA which corresponds with the comparison against KASAN_TAG_KERNEL. Therefore, update kasan_byte_accessible() for both SW and HW tags modes to correspond with the respective checks on loads and stores. Link: https://linux-review.googlesource.com/id/Ic6d40803c57dcc6331bd97fbb9a60b0d38a65a36 Signed-off-by: Peter Collingbourne --- mm/kasan/kasan.h | 3 +-- mm/kasan/sw_tags.c | 10 +++--- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index 8c55634d6edd..e18e8da35255 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -368,8 +368,7 @@ static inline bool kasan_byte_accessible(const void *addr) u8 ptr_tag = get_tag(addr); u8 mem_tag = hw_get_mem_tag((void *)addr); - return (mem_tag != KASAN_TAG_INVALID) && - (ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag); + return ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag; } #else /* CONFIG_KASAN_HW_TAGS */ diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c index 94c2d33be333..00ae8913fc74 100644 --- a/mm/kasan/sw_tags.c +++ b/mm/kasan/sw_tags.c @@ -121,10 +121,14 @@ bool kasan_check_range(unsigned long addr, size_t size, bool write, bool kasan_byte_accessible(const void *addr) { u8 tag = get_tag(addr); - u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(kasan_reset_tag(addr))); + void *untagged_addr = kasan_reset_tag(addr); + u8 shadow_byte; - return (shadow_byte != KASAN_TAG_INVALID) && - (tag == KASAN_TAG_KERNEL || tag == shadow_byte); + if (untagged_addr < kasan_shadow_to_mem((void *)KASAN_SHADOW_START)) + return false; + + shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(untagged_addr)); + return tag == KASAN_TAG_KERNEL || tag == shadow_byte; } #define DEFINE_HWASAN_LOAD_STORE(size) \ -- 2.31.0.208.g409f899ff0-goog
[PATCH] kasan: fix kasan_byte_accessible() to be consistent with actual checks
We can sometimes end up with kasan_byte_accessible() being called on non-slab memory. For example ksize() and krealloc() may end up calling it on KFENCE allocated memory. In this case the memory will be tagged with KASAN_SHADOW_INIT, which a subsequent patch ("kasan: initialize shadow to TAG_INVALID for SW_TAGS") will set to the same value as KASAN_TAG_INVALID, causing kasan_byte_accessible() to fail when called on non-slab memory. This highlighted the fact that the check in kasan_byte_accessible() was inconsistent with checks as implemented for loads and stores (kasan_check_range() in SW tags mode and hardware-implemented checks in HW tags mode). kasan_check_range() does not have a check for KASAN_TAG_INVALID, and instead has a comparison against KASAN_SHADOW_START. In HW tags mode, we do not have either, but we do set TCR_EL1.TCMA which corresponds with the comparison against KASAN_TAG_KERNEL. Therefore, update kasan_byte_accessible() for both SW and HW tags modes to correspond with the respective checks on loads and stores. Link: https://linux-review.googlesource.com/id/Ic6d40803c57dcc6331bd97fbb9a60b0d38a65a36 Signed-off-by: Peter Collingbourne --- mm/kasan/kasan.h | 3 +-- mm/kasan/sw_tags.c | 8 +--- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index 8c55634d6edd..e18e8da35255 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -368,8 +368,7 @@ static inline bool kasan_byte_accessible(const void *addr) u8 ptr_tag = get_tag(addr); u8 mem_tag = hw_get_mem_tag((void *)addr); - return (mem_tag != KASAN_TAG_INVALID) && - (ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag); + return ptr_tag == KASAN_TAG_KERNEL || ptr_tag == mem_tag; } #else /* CONFIG_KASAN_HW_TAGS */ diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c index 94c2d33be333..914225eeda99 100644 --- a/mm/kasan/sw_tags.c +++ b/mm/kasan/sw_tags.c @@ -121,10 +121,12 @@ bool kasan_check_range(unsigned long addr, size_t size, bool write, bool kasan_byte_accessible(const void *addr) { u8 tag = get_tag(addr); - u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(kasan_reset_tag(addr))); + void *untagged_addr = kasan_reset_tag(addr); + u8 shadow_byte = READ_ONCE(*(u8 *)kasan_mem_to_shadow(untagged_addr)); - return (shadow_byte != KASAN_TAG_INVALID) && - (tag == KASAN_TAG_KERNEL || tag == shadow_byte); + return untagged_addr >= + kasan_shadow_to_mem((void *)KASAN_SHADOW_START) && + (tag == KASAN_TAG_KERNEL || tag == shadow_byte); } #define DEFINE_HWASAN_LOAD_STORE(size) \ -- 2.31.0.208.g409f899ff0-goog
Re: [PATCH] kfence: unpoison pool region before use
On Sat, Apr 3, 2021 at 4:52 PM Andrey Konovalov wrote: > > On Sun, Apr 4, 2021 at 12:31 AM Marco Elver wrote: > > > > However, given the above, I think we need to explain this in the > > commit message (which also makes the dependency between these 2 > > patches clear) and add a comment above the new kasan_unpoison_range(). > > That is, if we still think this is the right fix -- I'm not entirely > > sure it is. > > > > Because what I gather from "kasan: initialize shadow to TAG_INVALID > > for SW_TAGS", is the requirement that "0xFF pointer tag is a match-all > > tag, it doesn't matter what tag the accessed memory has". > > > > While KFENCE memory is accessible through the slab API, and in this > > case ksize() calling kasan_check_byte() leading to a failure, the > > kasan_check_byte() call is part of the public KASAN API. Which means > > that if some subsystem decides to memblock_alloc() some memory, and > > wishes to use kasan_check_byte() on that memory but with an untagged > > pointer, will get the same problem as KFENCE: with generic and HW_TAGS > > mode everything is fine, but with SW_TAGS mode things break. > > It makes sense to allow this function to operate on any kind of > memory, including memory that hasn't been previously marked by KASAN. > > > To me this indicates the fix is not with KFENCE, but should be in > > mm/kasan/sw_tags.c:kasan_byte_accessible(), which should not load the > > shadow when the pointer is untagged. > > The problem isn't in accessing shadow per se. Looking at > kasan_byte_accessible() (in both sw_tags.c and kasan.h), the return > statement there seems just wrong and redundant. The KASAN_TAG_KERNEL > check should come first: > > return tag == KASAN_TAG_KERNEL || (shadow_byte != KASAN_TAG_INVALID && > tag == shadow_byte); > > This way, if the pointer tag is KASAN_TAG_KERNEL, the memory is > accessible no matter what the memory tag is. > > But then the KASAN_TAG_INVALID check isn't needed, as this value is > never assigned to a pointer tag. Which brings us to: > > return tag == KASAN_TAG_KERNEL || tag == shadow_byte; > > Which is essentially the same check that kasan_check_range() performs. > > Although, kasan_check_range() also checks that the shadow is < > KASAN_SHADOW_START. It makes sense to add this check into > kasan_byte_accessible() as well, before accessing shadow. > > Thanks! Okay, if the intent is that kasan_byte_accessible() should work on any memory, not just slab memory, then I agree that it should do the same thing as kasan_check_range(). Peter
Re: [PATCH] kfence: unpoison pool region before use
On Sat, Apr 3, 2021 at 3:03 AM Marco Elver wrote: > > On Sat, 3 Apr 2021 at 07:13, Peter Collingbourne wrote: > > If the memory region allocated by KFENCE had previously been poisoned, > > any validity checks done using kasan_byte_accessible() will fail. Fix > > it by unpoisoning the memory before using it as the pool region. > > > > Link: > > https://linux-review.googlesource.com/id/I0af99e9f1c25eaf7e1ec295836b5d148d76940c5 > > Signed-off-by: Peter Collingbourne > > Thanks, at a high level this seems reasonable, because we always want > to ensure that KFENCE memory remains unpoisoned with KASAN on. FWIW I > subjected a config with KFENCE+KASAN (generic, SW_TAGS, and HW_TAGS) > to syzkaller testing and ran kfence_test: > > Tested-by: Marco Elver > > > However, it is unclear to me under which circumstances we actually > need this, i.e. something would grab some memblock memory, somehow > poison it, and then release the memory back during early boot (note, > kfence_alloc_pool() is called before slab setup). If we can somehow > understand what actually did this, perhaps it'd help tell us if this > actually needs fixing in KFENCE or it's the other thing that needs a > fix. > > Given all this is happening during really early boot, I'd expect no or > very few calls to kasan_poison() until kfence_alloc_pool() is called. > We can probably debug it more by having kasan_poison() do a "if > (!__kfence_pool) dump_stack();" somewhere. Can you try this on the > system where you can repro the problem? I tried this just now on the > latest mainline kernel, and saw 0 calls until kfence_alloc_pool(). I looked into the issue some more, and it turned out that the memory wasn't getting poisoned by kasan_poison() but rather by the calls to kasan_map_populate() in kasan_init_shadow(). Starting with the patch "kasan: initialize shadow to TAG_INVALID for SW_TAGS", KASAN_SHADOW_INIT is set to 0xFE rather than 0xFF, which caused the failure. The Android kernel branch for 5.10 (and the downstream kernel I was working with) already have this patch, but it isn't in the mainline kernel yet. Now that I understand the cause of the issue, I can reproduce it using the KFENCE unit tests on a db845c board, using both the Android 5.10 and mainline branches if I cherry-pick that change. 
Here's an example crash from the unit tests (the failure was originally also observed from ksize in the downstream kernel): [ 46.692195][ T175] BUG: KASAN: invalid-access in test_krealloc+0x1c4/0xf98 [ 46.699282][ T175] Read of size 1 at addr ff80e9e7b000 by task kunit_try_catch/175 [ 46.707400][ T175] Pointer tag: [ff], memory tag: [fe] [ 46.712710][ T175] [ 46.714955][ T175] CPU: 4 PID: 175 Comm: kunit_try_catch Tainted: GB 5.12.0-rc5-mainline-09505-ga2ab5b26d445-dirty #1 [ 46.727193][ T175] Hardware name: Thundercomm Dragonboard 845c (DT) [ 46.733636][ T175] Call trace: [ 46.736841][ T175] dump_backtrace+0x0/0x2f8 [ 46.741295][ T175] show_stack+0x2c/0x3c [ 46.745388][ T175] dump_stack+0x124/0x1bc [ 46.749668][ T175] print_address_description+0x7c/0x308 [ 46.755178][ T175] __kasan_report+0x1a8/0x398 [ 46.759816][ T175] kasan_report+0x50/0x7c [ 46.764103][ T175] __kasan_check_byte+0x3c/0x54 [ 46.768916][ T175] ksize+0x4c/0x94 [ 46.772573][ T175] test_krealloc+0x1c4/0xf98 [ 46.777108][ T175] kunit_try_run_case+0x94/0x1c4 [ 46.781990][ T175] kunit_generic_run_threadfn_adapter+0x30/0x44 [ 46.788196][ T175] kthread+0x20c/0x234 [ 46.792213][ T175] ret_from_fork+0x10/0x30 Since "kasan: initialize shadow to TAG_INVALID for SW_TAGS" hasn't landed in mainline yet, it seems like we should insert this patch before that one rather than adding a Fixes: tag. Peter
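For context, these are the SW_TAGS constants that start colliding once the pending patch lands — a sketch of the relevant definitions, not a literal hunk:

/* mm/kasan/kasan.h, SW_TAGS mode */
#define KASAN_TAG_KERNEL	0xFF	/* tag of untagged/native kernel pointers */
#define KASAN_TAG_INVALID	0xFE	/* tag marking inaccessible memory */

/* Before "kasan: initialize shadow to TAG_INVALID for SW_TAGS":
 * KASAN_SHADOW_INIT == 0xFF, which matches KASAN_TAG_KERNEL, so
 * never-poisoned memory passed the old kasan_byte_accessible() check.
 * After it: KASAN_SHADOW_INIT == KASAN_TAG_INVALID (0xFE), so the
 * KFENCE pool's untouched shadow fails that check — the crash above. */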
[PATCH] kfence: unpoison pool region before use
If the memory region allocated by KFENCE had previously been poisoned, any validity checks done using kasan_byte_accessible() will fail. Fix it by unpoisoning the memory before using it as the pool region. Link: https://linux-review.googlesource.com/id/I0af99e9f1c25eaf7e1ec295836b5d148d76940c5 Signed-off-by: Peter Collingbourne --- mm/kfence/core.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/mm/kfence/core.c b/mm/kfence/core.c index d53c91f881a4..bb22b0cf77aa 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -633,13 +633,19 @@ static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate); void __init kfence_alloc_pool(void) { + void *pool; + if (!kfence_sample_interval) return; - __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE); - - if (!__kfence_pool) + pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE); + if (!pool) { pr_err("failed to allocate pool\n"); + return; + } + + kasan_unpoison_range(pool, KFENCE_POOL_SIZE); + __kfence_pool = pool; } void __init kfence_init(void) -- 2.31.0.208.g409f899ff0-goog
Re: [PATCH v3 12/17] arm64: implement __va_function
On Thu, Mar 25, 2021 at 4:28 PM Sami Tolvanen wrote: > > On Thu, Mar 25, 2021 at 3:38 AM Mark Rutland wrote: > > > > On Tue, Mar 23, 2021 at 01:39:41PM -0700, Sami Tolvanen wrote: > > > With CONFIG_CFI_CLANG, the compiler replaces function addresses in > > > instrumented C code with jump table addresses. This change implements > > > the __va_function() macro, which returns the actual function address > > > instead. > > > > > > Signed-off-by: Sami Tolvanen > > > Reviewed-by: Kees Cook > > > > Is there really no attribute or builtin that can be used to do this > > without assembly? > > I don't think the compiler currently offers anything that could help > us here. Peter, can you think of another way to avoid the function > address to jump table address conversion with > -fno-sanitize-cfi-canonical-jump-tables? No, the assembly seems like the best way at the moment. > > IIUC from other patches the symbol tables will contain the "real" > > non-cfi entry points (unless we explciitly asked to make the jump table > > address canonical), so AFAICT here the compiler should have all the > > necessary information to generate either the CFI or non-CFI entry point > > addresses, even if it doesn't expose an interface for that today. > > > > It'd be a lot nicer if we could get the compiler to do this for us. > > I agree, that would be quite useful in the kernel. Maybe. The kernel's requirements seem quite specialized here though. A normal C or C++ program has little need for the actual entry point, so if you need it you are probably doing something low level enough to require assembly anyway. Peter
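For readers without the series at hand, the arm64 implementation being discussed computes the symbol's address directly in inline assembly, which the compiler cannot rewrite into a jump-table reference. Roughly, reproduced from memory of the posted patch (treat details as approximate):

#include <linux/stringify.h>

#define __va_function(x) ({						\
	void *addr;							\
	asm("adrp %0, " __stringify(x) "\n\t"				\
	    "add  %0, %0, :lo12:" __stringify(x)			\
	    : "=r" (addr));						\
	addr;								\
})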
Re: [PATCH v1] drm/bridge: lt9611: Fix handling of 4k panels
Thanks. Confirmed that this fixes display output for me on a 4K monitor. On Mon, Nov 23, 2020 at 2:46 AM Robert Foss wrote: > > 4k requires two dsi pipes, so don't report MODE_OK when only a > single pipe is configured. But rather report MODE_PANEL to > signal that requirements of the panel are not being met. > > Reported-by: Peter Collingbourne > Suggested-by: Peter Collingbourne > Signed-off-by: Robert Foss > Tested-by: John Stultz Tested-by: Peter Collingbourne > --- > drivers/gpu/drm/bridge/lontium-lt9611.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/bridge/lontium-lt9611.c > b/drivers/gpu/drm/bridge/lontium-lt9611.c > index d734d9402c35..e8eb8deb444b 100644 > --- a/drivers/gpu/drm/bridge/lontium-lt9611.c > +++ b/drivers/gpu/drm/bridge/lontium-lt9611.c > @@ -867,8 +867,14 @@ static enum drm_mode_status > lt9611_bridge_mode_valid(struct drm_bridge *bridge, > const struct > drm_display_mode *mode) > { > struct lt9611_mode *lt9611_mode = lt9611_find_mode(mode); > + struct lt9611 *lt9611 = bridge_to_lt9611(bridge); > > - return lt9611_mode ? MODE_OK : MODE_BAD; > + if (!lt9611_mode) > + return MODE_BAD; > + else if (lt9611_mode->intfs > 1 && !lt9611->dsi1) > + return MODE_PANEL; > + else > + return MODE_OK; > } > > static void lt9611_bridge_pre_enable(struct drm_bridge *bridge) > -- > 2.27.0 >
Re: linux-next: manual merge of the akpm tree with the arm64 tree
On Wed, Nov 25, 2020 at 11:06 PM Stephen Rothwell wrote: > > Hi all, > > Today's linux-next merge of the akpm tree got a conflict in: > > arch/arm64/mm/proc.S > > between commit: > > 49b3cf035edc ("kasan: arm64: set TCR_EL1.TBID1 when enabled") > > from the arm64 tree and commit: > > 68cd215d6529 ("arm64: kasan: allow enabling in-kernel MTE") > > from the akpm tree. > > I fixed it up (I think, see below) and can carry the fix as necessary. > This is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > > -- > Cheers, > Stephen Rothwell > > diff --cc arch/arm64/mm/proc.S > index a0831bf8a018,0d85e6df42bc.. > --- a/arch/arm64/mm/proc.S > +++ b/arch/arm64/mm/proc.S > @@@ -40,9 -40,15 +40,15 @@@ > #define TCR_CACHE_FLAGS TCR_IRGN_WBWA | TCR_ORGN_WBWA > > #ifdef CONFIG_KASAN_SW_TAGS > - #define TCR_KASAN_FLAGS TCR_TBI1 | TCR_TBID1 > -#define TCR_KASAN_SW_FLAGS TCR_TBI1 > ++#define TCR_KASAN_SW_FLAGS TCR_TBI1 | TCR_TBID1 > #else > - #define TCR_KASAN_FLAGS 0 > + #define TCR_KASAN_SW_FLAGS 0 > + #endif > + > + #ifdef CONFIG_KASAN_HW_TAGS > -#define TCR_KASAN_HW_FLAGS SYS_TCR_EL1_TCMA1 | TCR_TBI1 > ++#define TCR_KASAN_HW_FLAGS SYS_TCR_EL1_TCMA1 | TCR_TBI1 | TCR_TBID1 > + #else > + #define TCR_KASAN_HW_FLAGS 0 > #endif > > /* Thanks Stephen, that resolution looks correct to me. Peter
Re: [PATCH v13 19/26] mm: Re-introduce do_mmap_pgoff()
On Fri, Oct 2, 2020 at 8:58 AM Yu, Yu-cheng wrote: > > On 10/1/2020 7:06 PM, Peter Collingbourne wrote: > > On Fri, Sep 25, 2020 at 7:57 AM Yu-cheng Yu wrote: > >> > >> There was no more caller passing vm_flags to do_mmap(), and vm_flags was > >> removed from the function's input by: > >> > >> commit 45e55300f114 ("mm: remove unnecessary wrapper function > >> do_mmap_pgoff()"). > >> > >> There is a new user now. Shadow stack allocation passes VM_SHSTK to > >> do_mmap(). Re-introduce the vm_flags and do_mmap_pgoff(). > > > > I would prefer to change the callers to pass the additional 0 argument > > instead of bringing the wrapper function back, but if we're going to > > bring it back then we should fix the naming (both functions take a > > pgoff argument, so the previous name do_mmap_pgoff() was just plain > > confusing). > > > > Peter > > > > Thanks for your feedback. Here is the updated patch. I will re-send > the whole series later. > > Yu-cheng > > == > > From 6a9f1e6bcdb6e599a44d5f58cf4cebd28c4634a2 Mon Sep 17 00:00:00 2001 > From: Yu-cheng Yu > Date: Wed, 12 Aug 2020 14:01:58 -0700 > Subject: [PATCH 19/26] mm: Re-introduce do_mmap_pgoff() The subject line of the commit message needs to be updated, but aside from that: Reviewed-by: Peter Collingbourne Peter > > There was no more caller passing vm_flags to do_mmap(), and vm_flags was > removed from the function's input by: > > commit 45e55300f114 ("mm: remove unnecessary wrapper function > do_mmap_pgoff()"). > > There is a new user now. Shadow stack allocation passes VM_SHSTK to > do_mmap(). Re-introduce vm_flags to do_mmap(), but without the old wrapper > do_mmap_pgoff(). Instead, fix all callers of the wrapper by passing a zero > vm_flags to do_mmap(). > > Signed-off-by: Yu-cheng Yu > Cc: Peter Collingbourne > Cc: Andrew Morton > Cc: Oleg Nesterov > Cc: linux...@kvack.org > --- > fs/aio.c | 2 +- > include/linux/mm.h | 3 ++- > ipc/shm.c | 2 +- > mm/mmap.c | 10 +- > mm/nommu.c | 4 ++-- > mm/util.c | 2 +- > 6 files changed, 12 insertions(+), 11 deletions(-) > > diff --git a/fs/aio.c b/fs/aio.c > index d5ec30385566..ca8c11665eea 100644 > --- a/fs/aio.c > +++ b/fs/aio.c > @@ -527,7 +527,7 @@ static int aio_setup_ring(struct kioctx *ctx, > unsigned int nr_events) > > ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, > PROT_READ | PROT_WRITE, > -MAP_SHARED, 0, &unused, NULL); > +MAP_SHARED, 0, 0, &unused, NULL); > mmap_write_unlock(mm); > if (IS_ERR((void *)ctx->mmap_base)) { > ctx->mmap_size = 0; > diff --git a/include/linux/mm.h b/include/linux/mm.h > index e09d13699bbe..e020eea33138 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2560,7 +2560,8 @@ extern unsigned long mmap_region(struct file > *file, unsigned long addr, > struct list_head *uf); > extern unsigned long do_mmap(struct file *file, unsigned long addr, > unsigned long len, unsigned long prot, unsigned long flags, > - unsigned long pgoff, unsigned long *populate, struct list_head *uf); > + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, > + struct list_head *uf); > extern int __do_munmap(struct mm_struct *, unsigned long, size_t, > struct list_head *uf, bool downgrade); > extern int do_munmap(struct mm_struct *, unsigned long, size_t, > diff --git a/ipc/shm.c b/ipc/shm.c > index e25c7c6106bc..91474258933d 100644 > --- a/ipc/shm.c > +++ b/ipc/shm.c > @@ -1556,7 +1556,7 @@ long do_shmat(int shmid, char __user *shmaddr, int > shmflg, > goto invalid; > } > > - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); > + addr = do_mmap(file,
addr, size, prot, flags, 0, 0, &populate, NULL); > *raddr = addr; > err = 0; > if (IS_ERR_VALUE(addr)) > diff --git a/mm/mmap.c b/mm/mmap.c > index 574b3f273462..fc04184d2eae 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1365,11 +1365,11 @@ static inline bool file_mmap_ok(struct file > *file, struct inode *inode, > */ > unsigned long do_mmap(struct file *file, unsigned long addr, > unsigned long len, unsigned long prot, > - unsigned long flags, unsigned long pgoff, > - unsigned long *populate, struct list_head *uf) > +
Re: [PATCH v13 19/26] mm: Re-introduce do_mmap_pgoff()
On Fri, Sep 25, 2020 at 7:57 AM Yu-cheng Yu wrote: > > There was no more caller passing vm_flags to do_mmap(), and vm_flags was > removed from the function's input by: > > commit 45e55300f114 ("mm: remove unnecessary wrapper function > do_mmap_pgoff()"). > > There is a new user now. Shadow stack allocation passes VM_SHSTK to > do_mmap(). Re-introduce the vm_flags and do_mmap_pgoff(). I would prefer to change the callers to pass the additional 0 argument instead of bringing the wrapper function back, but if we're going to bring it back then we should fix the naming (both functions take a pgoff argument, so the previous name do_mmap_pgoff() was just plain confusing). Peter
Re: [PATCH] ACPICA: fix UBSAN warning using __builtin_offsetof
On Mon, Jun 1, 2020 at 4:18 PM Nick Desaulniers wrote: > > Will reported UBSAN warnings: > UBSAN: null-ptr-deref in drivers/acpi/acpica/tbfadt.c:459:37 > UBSAN: null-ptr-deref in arch/arm64/kernel/smp.c:596:6 > > Looks like the emulated offsetof macro ACPI_OFFSET is causing these. We > can avoid this by using the compiler builtin, __builtin_offsetof. Would it be better to s/ACPI_OFFSET/offsetof/g the existing users of this macro and remove it? It looks like offsetof is already being used pervasively in the kernel, and its definition comes from <linux/stddef.h>. Peter > The non-kernel runtime of UBSAN would print: > runtime error: member access within null pointer of type > for this macro. > Link: https://lore.kernel.org/lkml/20200521100952.GA5360@willie-the-truck/ > Cc: sta...@vger.kernel.org > Reported-by: Will Deacon > Suggested-by: Ard Biesheuvel > Signed-off-by: Nick Desaulniers > --- > include/acpi/actypes.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/acpi/actypes.h b/include/acpi/actypes.h > index 4defed58ea33..04359c70b198 100644 > --- a/include/acpi/actypes.h > +++ b/include/acpi/actypes.h > @@ -508,7 +508,7 @@ typedef u64 acpi_integer; > > #define ACPI_TO_POINTER(i) ACPI_CAST_PTR (void, (acpi_size) (i)) > #define ACPI_TO_INTEGER(p) ACPI_PTR_DIFF (p, (void *) 0) > -#define ACPI_OFFSET(d, f) ACPI_PTR_DIFF (&(((d *) 0)->f), > (void *) 0) > +#define ACPI_OFFSET(d, f) __builtin_offsetof(d, f) > #define ACPI_PHYSADDR_TO_PTR(i) ACPI_TO_POINTER(i) > #define ACPI_PTR_TO_PHYSADDR(i) ACPI_TO_INTEGER(i) > > -- > 2.27.0.rc2.251.g90737beb825-goog >
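For comparison, the generic definition Peter is referring to already reduces to the same builtin where the compiler provides one, so a tree-wide s/ACPI_OFFSET/offsetof/ would be behavior-preserving. Simplified from include/linux/stddef.h of that era:

#undef offsetof
#ifdef __compiler_offsetof	/* maps to __builtin_offsetof */
#define offsetof(TYPE, MEMBER)	__compiler_offsetof(TYPE, MEMBER)
#else
#define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
#endif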
Re: linux-next: build failure after merge of the arm64 tree
On Wed, Aug 7, 2019 at 8:25 AM Will Deacon wrote: > > Hello Masahiro, > > On Wed, Aug 07, 2019 at 11:43:02PM +0900, Masahiro Yamada wrote: > > On Wed, Aug 7, 2019 at 8:46 PM Will Deacon wrote: > > > On Tue, Aug 06, 2019 at 07:34:36PM -0700, Peter Collingbourne wrote: > > > > We're setting NM to something else based on a config option, which I > > > > presume sets up some sort of circular dependency that confuses > > > > Kconfig. Removing this fragment of the makefile (or appending > > > > --synthetic unconditionally) also makes the problem go away. > > > > Exactly. This makes a circular dependency. > > Kconfig determines the environment variable 'NM' has been changed, > > and re-runs. > > > > > Yes, I think you're right. The lack of something like KBUILD_NMFLAGS means > > > that architectures are forced to override NM entirely if they want to pass > > > any specific options. Making that conditional on a Kconfig option appears > > > to send the entire thing recursive. > > > > Adding KBUILD_NMFLAGS is probably the right thing to do. > > (Is there somebody who wants to volunteer for this?) > > I don't think so ;) We don't do this for many other tools, and there's only > this one case that really seems to require it. > > > But, for this particular case, I vote for > > the entire removal of --synthetic. > > > > This dates back to 2004, and the commit message > > did not clearly explain why it was needed. > > > > The PowerPC maintainers should re-evaluate > > whether or not this flag is necessary. > > > > ppc32 is working without --synthetic, > > so probably ppc64 would be ... > > Again, this is up to the PPC maintainers. > > > > diff --git a/init/Kconfig b/init/Kconfig > > > index d96127ebc44e..a38027a06b79 100644 > > > --- a/init/Kconfig > > > +++ b/init/Kconfig > > > @@ -31,7 +31,7 @@ config CC_HAS_ASM_GOTO > > > def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC)) > > > > > > config TOOLS_SUPPORT_RELR > > > - def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" > > > "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh) > > > + def_bool $(success,env "CC=$(CC)" "LD=$(LD)" > > > "NM=$(CROSS_COMPILE)nm" "OBJCOPY=$(OBJCOPY)" > > > $(srctree)/scripts/tools-support-relr.sh) > > > > > > config CC_HAS_WARN_MAYBE_UNINITIALIZED > > > def_bool $(cc-option,-Wmaybe-uninitialized) > > > > Maybe, > > > > def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "OBJCOPY=$(OBJCOPY)" > > $(srctree)/scripts/tools-support-relr.sh) > > > > will work. > > Oh yes, I prefer that. I've included the updated patch below, and I'll > put it into -next shortly so that we resolve the build breakage. Hopefully > we'll have a better fix once the ozlabs folks wake up. > > > Or more simply > > > > def_bool $(success,$(srctree)/scripts/tools-support-relr.sh) > > > > CC, LD, OBJCOPY, NM are environment variables, > > so I think tools-support-relr.sh can directly use them. > > Maybe, but the other scripts invoked here tend to pass $(CC) through as > an argument, and that helps with your observation below...: > > > However, this bypasses the detection of environment variable changes. > > If a user passes NM= from the command line, Kconfig will no longer > > notice the change of NM. > > ... but since my proposal also breaks this, then I think your idea of just > dropping $NM for now is best. Thanks Will, that sounds good to me as well. I verified that it fixes the hang in ppc64 on my end, and that we can still produce RELR relocations on arm64 by passing in llvm-nm as $NM. 
> Will > > --->8 > > From 71c67a31f09fa8fdd1495dffd96a5f0d4cef2ede Mon Sep 17 00:00:00 2001 > From: Will Deacon > Date: Wed, 7 Aug 2019 12:48:33 +0100 > Subject: [PATCH] init/Kconfig: Fix infinite Kconfig recursion on PPC > > Commit 5cf896fb6be3 ("arm64: Add support for relocating the kernel with > RELR relocations") introduced CONFIG_TOOLS_SUPPORT_RELR, which checks > for RELR support in the toolchain as part of the kernel configuration. > During this procedure, "$(NM)" is invoked to see if it supports the new > relocation format, however PowerPC conditionally overrides this variable > in the architecture Makefile in order to pass '--synthetic' when > targetting PPC64. > > This conditional override causes Kconfig to recurse forev
Re: linux-next: build failure after merge of the arm64 tree
On Tue, Aug 6, 2019 at 4:50 PM Stephen Rothwell wrote: > > Hi all, > > After merging the arm64 tree, today's linux-next build (powerpc > ppc64_defconfig) was just spinning in make - it was executing some scripts, > but it was hard to catch just what. > > Apparently caused by commit > > 5cf896fb6be3 ("arm64: Add support for relocating the kernel with RELR > relocations") > > I have no idea why, but reverting the above commit allows the build > to finish. Okay, I can reproduce with: $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- defconfig *** Default configuration is based on 'ppc64_defconfig' # # No change to .config # $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j72 scripts/kconfig/conf --syncconfig Kconfig scripts/kconfig/conf --syncconfig Kconfig scripts/kconfig/conf --syncconfig Kconfig [...] And confirmed that backing out my patch fixes it. The problem seems to come from the use of $(NM) in the Kconfig file. If I apply this diff: diff --git a/init/Kconfig b/init/Kconfig index d96127ebc44e0..177a95b323230 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -31,7 +31,7 @@ config CC_HAS_ASM_GOTO def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC)) config TOOLS_SUPPORT_RELR - def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh) + def_bool $(success,$(NM)) config CC_HAS_WARN_MAYBE_UNINITIALIZED def_bool $(cc-option,-Wmaybe-uninitialized) I still see the hang. Replacing $(NM) with something else makes it go away. That leads me to ask what is special about $(NM) + powerpc? It turns out to be this fragment of arch/powerpc/Makefile: ifdef CONFIG_PPC64 new_nm := $(shell if $(NM) --help 2>&1 | grep -- '--synthetic' > /dev/null; then echo y; else echo n; fi) ifeq ($(new_nm),y) NM := $(NM) --synthetic endif endif We're setting NM to something else based on a config option, which I presume sets up some sort of circular dependency that confuses Kconfig. Removing this fragment of the makefile (or appending --synthetic unconditionally) also makes the problem go away. We should at least be able to remove the test for a new-enough binutils. According to changes.rst we require binutils 2.21 which was released in 2011, and support for --synthetic was added to binutils in 2004: https://github.com/bminor/binutils-gdb/commit/0873df2aec48685715d2c5041c1f6f4ed43976c1 But why are we passing --synthetic at all on ppc64? This behaviour seems to come from this commit from 2004: https://github.com/mpe/linux-fullhistory/commit/0e32679a4ea48a634d94e97355d47512ef14d71f which states: "On new toolchains we need to use nm --synthetic or we miss code symbols."
But I only see a couple of missing symbols if I compare the output of nm with and without --synthetic on a powerpc64 kernel: $ diff -u <(powerpc-linux-gnu-nm --synthetic vmlinux) <(powerpc-linux-gnu-nm vmlinux) --- /dev/fd/63 2019-08-06 18:48:56.127020621 -0700 +++ /dev/fd/62 2019-08-06 18:48:56.131020636 -0700 @@ -74840,7 +74840,6 @@ c1901b10 D LZ4_decompress_fast c07819a0 T .LZ4_decompress_fast_continue c1901b70 D LZ4_decompress_fast_continue -c07811e0 t .LZ4_decompress_fast_extDict c1901b40 d LZ4_decompress_fast_extDict c0781960 T .LZ4_decompress_fast_usingDict c1901b58 D LZ4_decompress_fast_usingDict @@ -74856,7 +74855,6 @@ c1901bd0 D LZ4_decompress_safe_usingDict c07822b0 T .LZ4_decompress_safe_withPrefix64k c1901b88 D LZ4_decompress_safe_withPrefix64k -c0780c60 t .LZ4_decompress_safe_withSmallPrefix c1901b28 d LZ4_decompress_safe_withSmallPrefix c077fbe0 T .LZ4_setStreamDecode c1901ac8 D LZ4_setStreamDecode I guess the problem was worse back in 2004. It almost certainly didn't involve these particular symbols because they were added in commit 2209fda323e2fd2a2d0885595fd5097717f8d2aa from 2018. So I guess we have a couple of possible quick fixes (assuming that the Kconfig issue can't be solved somehow): either stop passing --synthetic on powerpc and lose a couple of symbols in 64-bit kernels, or start passing it unconditionally on powerpc (it doesn't seem to make a difference to the nm output on a ppc64_defconfig kernel with CONFIG_PPC64=n). I'm cc'ing the powerpc maintainers for their opinion on what to do. While this is being resolved we should probably back out my patch from -next. Peter
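Of the two quick fixes, passing --synthetic unconditionally would shrink the powerpc fragment quoted above to a single line — no $(shell ...) probe and no CONFIG_PPC64 conditional, so Kconfig would always see the same $(NM). A sketch:

# arch/powerpc/Makefile (sketch): binutils >= 2.21 is already required,
# and nm has supported --synthetic since 2004, so drop the probe and
# the config-dependent override that triggers the Kconfig recursion.
NM := $(NM) --synthetic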
[tip:timers/vdso] arm64: vdso: Build vDSO with -ffixed-x18
Commit-ID: 98cd3c3f83fbba27a6bacd75ad12e8388a61a32a Gitweb: https://git.kernel.org/tip/98cd3c3f83fbba27a6bacd75ad12e8388a61a32a Author: Peter Collingbourne AuthorDate: Fri, 21 Jun 2019 10:52:32 +0100 Committer: Thomas Gleixner CommitDate: Sat, 22 Jun 2019 21:21:06 +0200 arm64: vdso: Build vDSO with -ffixed-x18 The vDSO needs to be built with x18 reserved in order to accommodate userspace platform ABIs built on top of Linux that use the register to carry inter-procedural state, as provided for by the AAPCS. An example of such a platform ABI is the one that will be used by an upcoming version of Android. Although this change is currently a no-op due to the fact that the vDSO is currently implemented in pure assembly on arm64, it is necessary in order to prepare for using the generic C implementation of the vDSO. [ tglx: Massaged changelog ] Signed-off-by: Peter Collingbourne Signed-off-by: Vincenzo Frascino Signed-off-by: Thomas Gleixner Tested-by: Shijith Thotton Tested-by: Andre Przywara Cc: linux-a...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-m...@vger.kernel.org Cc: linux-kselft...@vger.kernel.org Cc: Catalin Marinas Cc: Will Deacon Cc: Arnd Bergmann Cc: Russell King Cc: Ralf Baechle Cc: Paul Burton Cc: Daniel Lezcano Cc: Mark Salyzyn Cc: Shuah Khan Cc: Dmitry Safonov <0x7f454...@gmail.com> Cc: Rasmus Villemoes Cc: Huw Davies Cc: Mark Salyzyn Link: https://lkml.kernel.org/r/20190621095252.32307-6-vincenzo.frasc...@arm.com --- arch/arm64/kernel/vdso/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 3acfc813e966..ec81d28aeb5d 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -20,7 +20,7 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso)) ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \ --build-id -n -T -ccflags-y := -fno-common -fno-builtin -fno-stack-protector +ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18 ccflags-y += -DDISABLE_BRANCH_PROFILING VDSO_LDFLAGS := -Bsymbolic
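As an illustration of what the flag protects — a hypothetical userspace-side snippet, not part of the patch: a platform ABI that pins per-thread state in x18, which AAPCS64 designates as the optional platform register, depends on every instruction the thread executes, vDSO included, leaving that register alone:

/* Hypothetical runtime keeping state in the AAPCS64 platform register.
 * Any vDSO code built without -ffixed-x18 could use x18 as a scratch
 * register and silently corrupt this value across a vDSO call. */
register void *platform_state asm("x18");

void *get_platform_state(void)
{
	return platform_state;
}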