[PATCH v2] powerpc/pseries: fix max polling time in plpks_confirm_object_flushed() function
The usleep_range() function takes its input time and range in usec. However, it is currently assumed to be in msec in plpks_confirm_object_flushed(). Fix the total polling time for object flushing from 5 msec to 5 sec.

Reported-by: Nageswara R Sastry
Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Suggested-by: Michael Ellerman
Signed-off-by: Nayna Jain
Tested-by: Nageswara R Sastry
---
v2:
* Updated based on feedback from Michael Ellerman:
  Replaced usleep_range() with fsleep(). Since there is no longer a need
  to specify a range, the sleep time is reverted back to 10 msec.

 arch/powerpc/include/asm/plpks.h       | 5 ++---
 arch/powerpc/platforms/pseries/plpks.c | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/plpks.h b/arch/powerpc/include/asm/plpks.h
index 23b77027c916..7a84069759b0 100644
--- a/arch/powerpc/include/asm/plpks.h
+++ b/arch/powerpc/include/asm/plpks.h
@@ -44,9 +44,8 @@
 #define PLPKS_MAX_DATA_SIZE		4000

 // Timeouts for PLPKS operations
-#define PLPKS_MAX_TIMEOUT		5000 // msec
-#define PLPKS_FLUSH_SLEEP		10 // msec
-#define PLPKS_FLUSH_SLEEP_RANGE		400
+#define PLPKS_MAX_TIMEOUT		(5 * USEC_PER_SEC)
+#define PLPKS_FLUSH_SLEEP		10000 // usec

 struct plpks_var {
 	char *component;

diff --git a/arch/powerpc/platforms/pseries/plpks.c b/arch/powerpc/platforms/pseries/plpks.c
index febe18f251d0..bcfcd5acc5c2 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -415,8 +415,7 @@ static int plpks_confirm_object_flushed(struct label *label,
 			break;
 		}

-		usleep_range(PLPKS_FLUSH_SLEEP,
-			     PLPKS_FLUSH_SLEEP + PLPKS_FLUSH_SLEEP_RANGE);
+		fsleep(PLPKS_FLUSH_SLEEP);
 		timeout = timeout + PLPKS_FLUSH_SLEEP;
 	} while (timeout < PLPKS_MAX_TIMEOUT);
-- 
2.31.1
[powerpc:merge] BUILD SUCCESS d5bdfb09862ffbf009951f56d324129d1efb9de0
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: d5bdfb09862ffbf009951f56d324129d1efb9de0  Automatic merge of 'next' into merge (2024-03-12 09:59)

elapsed time: 2848m

configs tested: 122
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha        allnoconfig                        gcc
alpha        allyesconfig                       gcc
alpha        defconfig                          gcc
arc          allmodconfig                       gcc
arc          allnoconfig                        gcc
arc          allyesconfig                       gcc
arc          defconfig                          gcc
arc          haps_hs_smp_defconfig              gcc
arm          allmodconfig                       gcc
arm          allnoconfig                        clang
arm          allyesconfig                       gcc
arm          defconfig                          clang
arm          netwinder_defconfig                gcc
arm          shmobile_defconfig                 gcc
arm          socfpga_defconfig                  gcc
arm64        allmodconfig                       clang
arm64        allnoconfig                        gcc
arm64        defconfig                          gcc
csky         allmodconfig                       gcc
csky         allnoconfig                        gcc
csky         allyesconfig                       gcc
csky         defconfig                          gcc
hexagon      allmodconfig                       clang
hexagon      allnoconfig                        clang
hexagon      allyesconfig                       clang
hexagon      defconfig                          clang
i386         allmodconfig                       gcc
i386         allnoconfig                        gcc
i386         allyesconfig                       gcc
i386         buildonly-randconfig-001-20240313  gcc
i386         buildonly-randconfig-001-20240314  clang
i386         buildonly-randconfig-002-20240313  gcc
i386         buildonly-randconfig-002-20240314  clang
i386         buildonly-randconfig-003-20240313  clang
i386         buildonly-randconfig-004-20240313  clang
i386         buildonly-randconfig-005-20240313  clang
i386         buildonly-randconfig-006-20240313  gcc
i386         buildonly-randconfig-006-20240314  clang
i386         defconfig                          clang
i386         randconfig-001-20240313            clang
i386         randconfig-002-20240313            clang
i386         randconfig-002-20240314            clang
i386         randconfig-003-20240313            clang
i386         randconfig-004-20240313            gcc
i386         randconfig-004-20240314            clang
i386         randconfig-005-20240313            gcc
i386         randconfig-006-20240313            clang
i386         randconfig-011-20240313            gcc
i386         randconfig-011-20240314            clang
i386         randconfig-012-20240313            clang
i386         randconfig-013-20240313            gcc
i386         randconfig-013-20240314            clang
i386         randconfig-014-20240313            gcc
i386         randconfig-015-20240313            clang
i386         randconfig-015-20240314            clang
i386         randconfig-016-20240313            gcc
i386         randconfig-016-20240314            clang
loongarch    allmodconfig                       gcc
loongarch    allnoconfig                        gcc
loongarch    defconfig                          gcc
m68k         allmodconfig                       gcc
m68k         allnoconfig                        gcc
m68k         allyesconfig                       gcc
m68k         defconfig                          gcc
microblaze   allmodconfig                       gcc
microblaze   allnoconfig                        gcc
microblaze   allyesconfig                       gcc
microblaze   defconfig                          gcc
mips         allnoconfig                        gcc
mips         allyesconfig                       gcc
nios2        allmodconfig                       gcc
nios2        allnoconfig                        gcc
nios2        allyesconfig                       gcc
nios2        defconfig                          gcc
openrisc     allnoconfig                        gcc
openrisc     allyesconfig                       gcc
openrisc     defconfig                          gcc
parisc       allmodconfig                       gcc
parisc       allnoconfig                        gcc
parisc       allyesconfig                       gcc
parisc       defconfig                          gcc
parisc64     defconfig                          gcc
powerpc      allmodconfig                       gcc
powerpc      allnoconfig                        gcc
powerpc      allyesconfig                       clang
powerpc
Re: [PATCH v10 09/12] powerpc: mm: Add common pud_pfn stub for all platforms
On Wed, 2024-03-13 at 11:08 +, Christophe Leroy wrote:
>
> Le 13/03/2024 à 05:21, Rohan McLure a écrit :
> > Prior to this commit, pud_pfn was implemented with BUILD_BUG as the
> > inline function for 64-bit Book3S systems but is never included, as
> > its invocations in generic code are guarded by calls to pud_devmap
> > which return zero on such systems. A future patch will provide
> > support for page table checks, the generic code for which depends on
> > a pud_pfn stub being implemented, even while the patch will not
> > interact with puds directly.
> >
> > Remove the 64-bit Book3S stub and define pud_pfn to warn on all
> > platforms. pud_pfn may be defined properly on a per-platform basis
> > should it grow real usages in future.

Apologies, I don't actually remove the 64-bit Book3S stub, as it
currently correctly reflects how transparent hugepages should work.
Also, the stub that was previously implemented for all platforms was
removed in commit 27af67f35631 ("powerpc/book3s64/mm: enable transparent
pud hugepage").

> Can you please re-explain why that's needed ? I remember we discussed
> it already in the past, but I checked again today and can't see the
> need:
>
> In mm/page_table_check.c, the call to pud_pfn() is gated by a call to
> pud_user_accessible_page(pud). If I look into the arm64 version of
> pud_user_accessible_page(), it depends on pud_leaf(). When pud_leaf()
> is constant 0, pud_user_accessible_page() is always false and the call
> to pud_pfn() should be folded away.

As it will be folded away on non 64-bit Book3S platforms, I could even
replace the WARN_ONCE with a BUILD_BUG for your stated reason. The
__page_table_check_pud_set() function will still be included in the
build and references this routine, so a fallback stub is still
necessary.
> >
> > Signed-off-by: Rohan McLure
> > ---
> >  arch/powerpc/include/asm/pgtable.h | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/pgtable.h
> > b/arch/powerpc/include/asm/pgtable.h
> > index 0c0ffbe7a3b5..13f661831333 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -213,6 +213,20 @@ static inline bool
> > arch_supports_memmap_on_memory(unsigned long vmemmap_size)
> >
> >  #endif /* CONFIG_PPC64 */
> >
> > +/*
> > + * Currently only consumed by page_table_check_pud_{set,clear}. Since clears
> > + * and sets to page table entries at any level are done through
> > + * page_table_check_pte_{set,clear}, provide stub implementation.
> > + */
> > +#ifndef pud_pfn
> > +#define pud_pfn pud_pfn
> > +static inline int pud_pfn(pud_t pud)
> > +{
> > +	WARN_ONCE(1, "pud: platform does not use pud entries directly");
> > +	return 0;
> > +}
> > +#endif
> > +
> >  #endif /* __ASSEMBLY__ */
> >
> >  #endif /* _ASM_POWERPC_PGTABLE_H */
Re: [PATCH v7 1/5] net: wan: Add support for QMC HDLC
Hi Herve,

Herve Codina writes:
> The QMC HDLC driver provides support for HDLC using the QMC (QUICC
> Multichannel Controller) to transfer the HDLC data.
...
> diff --git a/drivers/net/wan/fsl_qmc_hdlc.c b/drivers/net/wan/fsl_qmc_hdlc.c
> new file mode 100644
> index 000000000000..5fd7ed325f5b
> --- /dev/null
> +++ b/drivers/net/wan/fsl_qmc_hdlc.c
> @@ -0,0 +1,419 @@
...
> +static int qmc_hdlc_remove(struct platform_device *pdev)
> +{
> +	struct qmc_hdlc *qmc_hdlc = platform_get_drvdata(pdev);
> +
> +	unregister_hdlc_device(qmc_hdlc->netdev);
> +	free_netdev(qmc_hdlc->netdev);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id qmc_hdlc_id_table[] = {
> +	{ .compatible = "fsl,qmc-hdlc" },
> +	{} /* sentinel */
> +};
> +MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);

This breaks when building as a module (eg. ppc32_allmodconfig):

  In file included from ../include/linux/device/driver.h:21,
                   from ../include/linux/device.h:32,
                   from ../include/linux/dma-mapping.h:8,
                   from ../drivers/net/wan/fsl_qmc_hdlc.c:13:
  ../drivers/net/wan/fsl_qmc_hdlc.c:405:25: error: ‘qmc_hdlc_driver’ undeclared here (not in a function); did you mean ‘qmc_hdlc_probe’?
    405 | MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
        |                         ^~~

IIUIC it should be pointing to the table, not the driver, so:

diff --git a/drivers/net/wan/fsl_qmc_hdlc.c b/drivers/net/wan/fsl_qmc_hdlc.c
index 5fd7ed325f5b..705c3681fb92 100644
--- a/drivers/net/wan/fsl_qmc_hdlc.c
+++ b/drivers/net/wan/fsl_qmc_hdlc.c
@@ -402,7 +402,7 @@ static const struct of_device_id qmc_hdlc_id_table[] = {
 	{ .compatible = "fsl,qmc-hdlc" },
 	{} /* sentinel */
 };
-MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
+MODULE_DEVICE_TABLE(of, qmc_hdlc_id_table);

 static struct platform_driver qmc_hdlc_driver = {
 	.driver = {

Which then builds correctly.

cheers
Re: [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info
"Edgecombe, Rick P" writes:
> On Wed, 2024-03-13 at 06:44 +, Christophe Leroy wrote:
>> I understand from this text that, as agreed, this patch removes the
>> pointless/redundant zero-init of individual members. But it is not
>> what is done, see below ?
>
> Err, right. I think I decided to leave it because it was already acked
> and there wasn't enough discussion on the ack to be sure. I will update
> it.

That's fine by me, you can keep my ack.

cheers
[PATCH 13/13] mm: Document pXd_leaf() API
From: Peter Xu

There's one small section already, but since we're going to remove
pXd_huge(), that comment may start to become obsolete.

Rewrite that section with more information; hopefully with that, the API
is crystal clear on what it implies.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 include/linux/pgtable.h | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..6b0d222a7fad 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1770,11 +1770,25 @@ typedef unsigned int pgtbl_mod_mask;
 #endif

 /*
- * p?d_leaf() - true if this entry is a final mapping to a physical address.
- * This differs from p?d_huge() by the fact that they are always available (if
- * the architecture supports large pages at the appropriate level) even
- * if CONFIG_HUGETLB_PAGE is not defined.
- * Only meaningful when called on a valid entry.
+ * pXd_leaf() is the API to check whether a pgtable entry is a huge page
+ * mapping.  It should work globally across all archs, without any
+ * dependency on CONFIG_* options.  For architectures that do not support
+ * huge mappings on specific levels, below fallbacks will be used.
+ *
+ * A leaf pgtable entry should always imply the following:
+ *
+ * - It is a "present" entry.  IOW, before using this API, please check it
+ *   with pXd_present() first.  NOTE: it may not always mean the "present
+ *   bit" is set.  For example, PROT_NONE entries are always "present".
+ *
+ * - It should _never_ be a swap entry of any type.  Above "present" check
+ *   should have guarded this, but let's be crystal clear on this.
+ *
+ * - It should contain a huge PFN, which points to a huge page larger than
+ *   PAGE_SIZE of the platform.  The PFN format isn't important here.
+ *
+ * - It should cover all kinds of huge mappings (e.g., pXd_trans_huge(),
+ *   pXd_devmap(), or hugetlb mappings).
  */
 #ifndef pgd_leaf
 #define pgd_leaf(x)	false
-- 
2.44.0
[PATCH 12/13] mm/treewide: Remove pXd_huge()
From: Peter Xu

This API is not used anymore, drop it for the whole tree.

Signed-off-by: Peter Xu
---
 arch/arm/mm/Makefile                      |  1 -
 arch/arm/mm/hugetlbpage.c                 | 29 ---
 arch/arm64/mm/hugetlbpage.c               | 10 ---
 arch/loongarch/mm/hugetlbpage.c           | 10 ---
 arch/mips/include/asm/pgtable-32.h        |  2 +-
 arch/mips/include/asm/pgtable-64.h        |  2 +-
 arch/mips/mm/hugetlbpage.c                | 10 ---
 arch/parisc/mm/hugetlbpage.c              | 11 ---
 .../include/asm/book3s/64/pgtable-4k.h    | 10 ---
 .../include/asm/book3s/64/pgtable-64k.h   | 25 ----
 arch/powerpc/include/asm/nohash/pgtable.h | 10 ---
 arch/riscv/mm/hugetlbpage.c               | 10 ---
 arch/s390/mm/hugetlbpage.c                | 10 ---
 arch/sh/mm/hugetlbpage.c                  | 10 ---
 arch/sparc/mm/hugetlbpage.c               | 10 ---
 arch/x86/mm/hugetlbpage.c                 | 16 --
 include/linux/hugetlb.h                   | 24 ---
 17 files changed, 2 insertions(+), 198 deletions(-)
 delete mode 100644 arch/arm/mm/hugetlbpage.c

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 71b858c9b10c..1779e12db085 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -21,7 +21,6 @@ KASAN_SANITIZE_physaddr.o := n
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o

 obj-$(CONFIG_ALIGNMENT_TRAP)	+= alignment.o
-obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_ARM_PV_FIXUP)	+= pv-fixup-asm.o

 obj-$(CONFIG_CPU_ABRT_NOMMU)	+= abort-nommu.o
diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
deleted file mode 100644
index c2fa643f6bb5..
--- a/arch/arm/mm/hugetlbpage.c
+++ /dev/null
@@ -1,29 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * arch/arm/mm/hugetlbpage.c
- *
- * Copyright (C) 2012 ARM Ltd.
- *
- * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-int pud_huge(pud_t pud)
-{
-	return 0;
-}
-
-int pmd_huge(pmd_t pmd)
-{
-	return pmd_leaf(pmd);
-}
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f494fc31201f..ca58210d6c07 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -79,16 +79,6 @@ bool arch_hugetlb_migration_supported(struct hstate *h)
 }
 #endif

-int pmd_huge(pmd_t pmd)
-{
-	return pmd_leaf(pmd);
-}
-
-int pud_huge(pud_t pud)
-{
-	return pud_leaf(pud);
-}
-
 static int find_num_contig(struct mm_struct *mm, unsigned long addr,
			   pte_t *ptep, size_t *pgsize)
 {
diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
index a4e78e74aa21..1c56cb59 100644
--- a/arch/loongarch/mm/hugetlbpage.c
+++ b/arch/loongarch/mm/hugetlbpage.c
@@ -50,16 +50,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
 	return (pte_t *) pmd;
 }

-int pmd_huge(pmd_t pmd)
-{
-	return (pmd_val(pmd) & _PAGE_HUGE) != 0;
-}
-
-int pud_huge(pud_t pud)
-{
-	return (pud_val(pud) & _PAGE_HUGE) != 0;
-}
-
 uint64_t pmd_to_entrylo(unsigned long pmd_val)
 {
 	uint64_t val;
diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
index 0e196650f4f4..92b7591aac2a 100644
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -129,7 +129,7 @@ static inline int pmd_none(pmd_t pmd)
 static inline int pmd_bad(pmd_t pmd)
 {
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
-	/* pmd_huge(pmd) but inline */
+	/* pmd_leaf(pmd) but inline */
 	if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
 		return 0;
 #endif
diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..7c28510b3768 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -245,7 +245,7 @@ static inline int pmd_none(pmd_t pmd)
 static inline int pmd_bad(pmd_t pmd)
 {
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
-	/* pmd_huge(pmd) but inline */
+	/* pmd_leaf(pmd) but inline */
 	if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
 		return 0;
 #endif
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index 7eaff5b07873..0b9e1b59 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -57,13 +57,3 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
 	}
 	return (pte_t *) pmd;
 }
-
-int pmd_huge(pmd_t pmd)
-{
-	return (pmd_val(pmd) & _PAGE_HUGE) != 0;
-}
-
-int pud_huge(pud_t pud)
-{
-	return (pud_val(pud) & _PAGE_HUGE) != 0;
-}
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index a9f7e21f6656..0356199bd9e7 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++
[PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()
From: Peter Xu

Now after we're sure all pXd_huge() definitions are the same as
pXd_leaf(), reuse it.  Luckily, pXd_huge() isn't widely used.

Signed-off-by: Peter Xu
---
 arch/arm/include/asm/pgtable-3level.h | 2 +-
 arch/arm64/include/asm/pgtable.h      | 2 +-
 arch/arm64/mm/hugetlbpage.c           | 4 ++--
 arch/loongarch/mm/hugetlbpage.c       | 2 +-
 arch/mips/mm/tlb-r4k.c                | 2 +-
 arch/powerpc/mm/pgtable_64.c          | 6 +++---
 arch/x86/mm/pgtable.c                 | 4 ++--
 mm/gup.c                              | 4 ++--
 mm/hmm.c                              | 2 +-
 mm/memory.c                           | 2 +-
 10 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index e7aecbef75c9..9e3c44f0aea2 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -190,7 +190,7 @@ static inline pte_t pte_mkspecial(pte_t pte)
 #define pmd_dirty(pmd)		(pmd_isset((pmd), L_PMD_SECT_DIRTY))

 #define pmd_hugewillfault(pmd)	(!pmd_young(pmd) || !pmd_write(pmd))
-#define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
+#define pmd_thp_or_huge(pmd)	(pmd_leaf(pmd) || pmd_trans_huge(pmd))

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !pmd_table(pmd))
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 14d24c357c7a..c4efa47fed5f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -512,7 +512,7 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 	return pmd;
 }

-#define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
+#define pmd_thp_or_huge(pmd)	(pmd_leaf(pmd) || pmd_trans_huge(pmd))

 #define pmd_write(pmd)		pte_write(pmd_pte(pmd))

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 1234bbaef5bf..f494fc31201f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -321,7 +321,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	if (sz != PUD_SIZE && pud_none(pud))
 		return NULL;
 	/* hugepage or swap? */
-	if (pud_huge(pud) || !pud_present(pud))
+	if (pud_leaf(pud) || !pud_present(pud))
 		return (pte_t *)pudp;
 	/* table; check the next level */
@@ -333,7 +333,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	if (!(sz == PMD_SIZE || sz == CONT_PMD_SIZE) && pmd_none(pmd))
 		return NULL;
-	if (pmd_huge(pmd) || !pmd_present(pmd))
+	if (pmd_leaf(pmd) || !pmd_present(pmd))
 		return (pte_t *)pmdp;

 	if (sz == CONT_PTE_SIZE)
diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
index 1e76fcb83093..a4e78e74aa21 100644
--- a/arch/loongarch/mm/hugetlbpage.c
+++ b/arch/loongarch/mm/hugetlbpage.c
@@ -64,7 +64,7 @@ uint64_t pmd_to_entrylo(unsigned long pmd_val)
 	uint64_t val;
 	/* PMD as PTE. Must be huge page */
-	if (!pmd_huge(__pmd(pmd_val)))
+	if (!pmd_leaf(__pmd(pmd_val)))
 		panic("%s", __func__);

 	val = pmd_val ^ _PAGE_HUGE;
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index 4106084e57d7..76f3b9c0a9f0 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -326,7 +326,7 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 		idx = read_c0_index();
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 		/* this could be a huge page  */
-		if (pmd_huge(*pmdp)) {
+		if (pmd_leaf(*pmdp)) {
 			unsigned long lo;
 			write_c0_pagemask(PM_HUGE_MASK);
 			ptep = (pte_t *)pmdp;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 9b99113cb51a..6621cfc3baf8 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -102,7 +102,7 @@ struct page *p4d_page(p4d_t p4d)
 {
 	if (p4d_leaf(p4d)) {
 		if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
-			VM_WARN_ON(!p4d_huge(p4d));
+			VM_WARN_ON(!p4d_leaf(p4d));
 		return pte_page(p4d_pte(p4d));
 	}
 	return virt_to_page(p4d_pgtable(p4d));
@@ -113,7 +113,7 @@ struct page *pud_page(pud_t pud)
 {
 	if (pud_leaf(pud)) {
 		if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
-			VM_WARN_ON(!pud_huge(pud));
+			VM_WARN_ON(!pud_leaf(pud));
 		return pte_page(pud_pte(pud));
 	}
 	return virt_to_page(pud_pgtable(pud));
@@ -132,7 +132,7 @@ struct page *pmd_page(pmd_t pmd)
 	 * enabled so these checks can't be used.
	 */
 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
-		VM_WARN_ON(!(pmd_leaf(pmd) || pmd_huge(pmd)));
+		VM_WARN_ON(!pmd_leaf(pmd));
 	return
[PATCH 10/13] mm/gup: Merge pXd huge mapping checks
From: Peter Xu

Huge mapping checks in GUP are slightly redundant and can be simplified.

pXd_huge() now is the same as pXd_leaf().  pmd_trans_huge() and
pXd_devmap() should both imply pXd_leaf().  Time to merge them into one.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 mm/gup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 802987281b2f..e2415e9789bc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3005,8 +3005,7 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
 		if (!pmd_present(pmd))
 			return 0;

-		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
-			     pmd_devmap(pmd))) {
+		if (unlikely(pmd_leaf(pmd))) {
 			/* See gup_pte_range() */
 			if (pmd_protnone(pmd))
 				return 0;
@@ -3043,7 +3042,7 @@ static int gup_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr, unsigned lo
 		next = pud_addr_end(addr, end);
 		if (unlikely(!pud_present(pud)))
 			return 0;
-		if (unlikely(pud_huge(pud) || pud_devmap(pud))) {
+		if (unlikely(pud_leaf(pud))) {
 			if (!gup_huge_pud(pud, pudp, addr, next, flags,
					  pages, nr))
 				return 0;
@@ -3096,7 +3095,7 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
 			return;
-		if (unlikely(pgd_huge(pgd))) {
+		if (unlikely(pgd_leaf(pgd))) {
 			if (!gup_huge_pgd(pgd, pgdp, addr, next, flags,
					  pages, nr))
 				return;
-- 
2.44.0
[PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
From: Peter Xu

PowerPC book3s 4K mostly has the same definition on both, except
pXd_huge() constantly returns 0 for hash MMUs.  As Michael Ellerman
pointed out [1], it is safe to check _PAGE_PTE on hash MMUs, as the bit
will never be set, so it will keep returning false.

As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to
create such huge mappings for 4K hash MMUs.  Meanwhile, the major
powerpc hugetlb pgtable walker __find_linux_pte() already uses
pXd_leaf() to check hugetlb mappings.

The goal should be that we will have one API pXd_leaf() to detect all
kinds of huge mappings.  AFAICT we need to use the pXd_leaf() impl
(rather than the pXd_huge() ones) to make sure e.g. THPs on hash MMUs
will also return true.

This helps to simplify a follow-up patch to drop pXd_huge() treewide.

NOTE: the *_leaf() definitions need to be moved before the inclusion of
asm/book3s/64/pgtable-4k.h, which defines pXd_huge() with them.

[1] https://lore.kernel.org/r/87v85zo6w7.fsf@mail.lhotse

Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: "Aneesh Kumar K.V"
Cc: "Naveen N. Rao"
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu
---
 .../include/asm/book3s/64/pgtable-4k.h       | 14 ++----------
 arch/powerpc/include/asm/book3s/64/pgtable.h | 27 +++++++-----------
 2 files changed, 14 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
index 48f21820afe2..92545981bb49 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -8,22 +8,12 @@
 #ifdef CONFIG_HUGETLB_PAGE
 static inline int pmd_huge(pmd_t pmd)
 {
-	/*
-	 * leaf pte for huge page
-	 */
-	if (radix_enabled())
-		return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-	return 0;
+	return pmd_leaf(pmd);
 }

 static inline int pud_huge(pud_t pud)
 {
-	/*
-	 * leaf pte for huge page
-	 */
-	if (radix_enabled())
-		return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
-	return 0;
+	return pud_leaf(pud);
 }

 /*
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index df66dce8306f..fd7180fded75 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -262,6 +262,18 @@ extern unsigned long __kernel_io_end;
 extern struct page *vmemmap;
 extern unsigned long pci_io_base;

+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
+{
+	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
 #endif /* __ASSEMBLY__ */

 #include

@@ -1436,20 +1448,5 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
 	return false;
 }

-/*
- * Like pmd_huge(), but works regardless of config options
- */
-#define pmd_leaf pmd_leaf
-static inline bool pmd_leaf(pmd_t pmd)
-{
-	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
-#define pud_leaf pud_leaf
-static inline bool pud_leaf(pud_t pud)
-{
-	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
-}
-
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
-- 
2.44.0
[PATCH 08/13] mm/arm64: Merge pXd_huge() and pXd_leaf() definitions
From: Peter Xu

Unlike most archs, aarch64 defines pXd_huge() and pXd_leaf() slightly
differently.  Redefine pXd_huge() with pXd_leaf().

There used to be two traps for the old aarch64 definitions of these
APIs that I found when reading the code around; they're:

  (1) 4797ec2dc83a ("arm64: fix pud_huge() for 2-level pagetables")
  (2) 23bc8f69f0ec ("arm64: mm: fix p?d_leaf()")

Defining pXd_huge() with the current pXd_leaf() will make sure (2)
isn't a problem (on PROT_NONE checks).  To make sure it also works for
(1), we move the __PAGETABLE_PMD_FOLDED check over to pud_leaf(),
allowing it to constantly return "false" for 2-level pgtables, which
looks even safer as it covers both now.

Cc: Muchun Song
Cc: Mark Salter
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Peter Xu
---
 arch/arm64/include/asm/pgtable.h | 4 ++++
 arch/arm64/mm/hugetlbpage.c      | 8 ++------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 401087e8a43d..14d24c357c7a 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -704,7 +704,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!pud_table(pud))
 #define pud_present(pud)	pte_present(pud_pte(pud))
+#ifndef __PAGETABLE_PMD_FOLDED
 #define pud_leaf(pud)		(pud_present(pud) && !pud_table(pud))
+#else
+#define pud_leaf(pud)		false
+#endif
 #define pud_valid(pud)		pte_valid(pud_pte(pud))
 #define pud_user(pud)		pte_user(pud_pte(pud))
 #define pud_user_exec(pud)	pte_user_exec(pud_pte(pud))
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 0f0e10bb0a95..1234bbaef5bf 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -81,16 +81,12 @@ bool arch_hugetlb_migration_supported(struct hstate *h)

 int pmd_huge(pmd_t pmd)
 {
-	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+	return pmd_leaf(pmd);
 }

 int pud_huge(pud_t pud)
 {
-#ifndef __PAGETABLE_PMD_FOLDED
-	return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
-#else
-	return 0;
-#endif
+	return pud_leaf(pud);
 }

 static int find_num_contig(struct mm_struct *mm, unsigned long addr,
-- 
2.44.0
[PATCH 07/13] mm/arm: Redefine pmd_huge() with pmd_leaf()
From: Peter Xu

Most of the archs already define these two APIs the same way.  ARM is
more complicated in two aspects:

  - For pXd_huge() it's always checking against !PXD_TABLE_BIT, while
    for pXd_leaf() it's always checking against PXD_TYPE_SECT.

  - SECT/TABLE bits are defined differently on 2-level vs. 3-level ARM
    pgtables, which makes the whole thing even harder to follow.

Luckily, the second complexity should be hidden by the pmd_leaf()
implementation against the 2-level vs. 3-level headers.  Invoke
pmd_leaf() directly for pmd_huge(), to remove the first part of the
complexity.  This prepares to drop the pXd_huge() API globally.

While at it, drop the obsolete comment - it's outdated.

Cc: Russell King
Cc: Shawn Guo
Cc: Krzysztof Kozlowski
Cc: Bjorn Andersson
Cc: Arnd Bergmann
Cc: Konrad Dybcio
Cc: Fabio Estevam
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Peter Xu
---
 arch/arm/mm/hugetlbpage.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index dd7a0277c5c0..c2fa643f6bb5 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -18,11 +18,6 @@
 #include
 #include

-/*
- * On ARM, huge pages are backed by pmd's rather than pte's, so we do a lot
- * of type casting from pmd_t * to pte_t *.
- */
-
 int pud_huge(pud_t pud)
 {
 	return 0;
@@ -30,5 +25,5 @@ int pud_huge(pud_t pud)

 int pmd_huge(pmd_t pmd)
 {
-	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+	return pmd_leaf(pmd);
 }
-- 
2.44.0
[PATCH 06/13] mm/arm: Use macros to define pmd/pud helpers
From: Peter Xu

It's already confusing that ARM 2-level vs. 3-level defines the SECT
bit differently on pmds/puds.  Always use a macro, which is much
clearer.

Cc: Russell King
Cc: Shawn Guo
Cc: Krzysztof Kozlowski
Cc: Bjorn Andersson
Cc: Arnd Bergmann
Cc: Konrad Dybcio
Cc: Fabio Estevam
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Peter Xu
---
 arch/arm/include/asm/pgtable-2level.h       | 4 ++--
 arch/arm/include/asm/pgtable-3level-hwdef.h | 1 +
 arch/arm/include/asm/pgtable-3level.h       | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index b0a262566eb9..4245c2e74720 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -213,8 +213,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)

 #define pmd_pfn(pmd)		(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))

-#define pmd_leaf(pmd)		(pmd_val(pmd) & 2)
-#define pmd_bad(pmd)		(pmd_val(pmd) & 2)
+#define pmd_leaf(pmd)		(pmd_val(pmd) & PMD_TYPE_SECT)
+#define pmd_bad(pmd)		pmd_leaf(pmd)
 #define pmd_present(pmd)	(pmd_val(pmd))

 #define copy_pmd(pmdpd,pmdps)		\
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index 2f35b4eddaa8..e7b666cf0060 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -14,6 +14,7 @@
  * + Level 1/2 descriptor
  *   - common
  */
+#define PUD_TABLE_BIT		(_AT(pmdval_t, 1) << 1)
 #define PMD_TYPE_MASK		(_AT(pmdval_t, 3) << 0)
 #define PMD_TYPE_FAULT		(_AT(pmdval_t, 0) << 0)
 #define PMD_TYPE_TABLE		(_AT(pmdval_t, 3) << 0)
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 4b1d9eb3908a..e7aecbef75c9 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -112,7 +112,7 @@
 #ifndef __ASSEMBLY__

 #define pud_none(pud)		(!pud_val(pud))
-#define pud_bad(pud)		(!(pud_val(pud) & 2))
+#define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
 #define pud_present(pud)	(pud_val(pud))
 #define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
						 PMD_TYPE_TABLE)
@@ -137,7 +137,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 	return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
 }

-#define pmd_bad(pmd)		(!(pmd_val(pmd) & 2))
+#define pmd_bad(pmd)		(!(pmd_val(pmd) & PMD_TABLE_BIT))

 #define copy_pmd(pmdpd,pmdps)		\
	do {				\
-- 
2.44.0
[PATCH 05/13] mm/sparc: Change pXd_huge() behavior to exclude swap entries
From: Peter Xu

Please refer to the previous patch on the reasoning for x86.  Now sparc
is the only architecture that will allow swap entries to be reported as
pXd_huge().  After this patch, all architectures should forbid swap
entries in pXd_huge().

Cc: David S. Miller
Cc: Andreas Larsson
Cc: sparcli...@vger.kernel.org
Signed-off-by: Peter Xu
---
 arch/sparc/mm/hugetlbpage.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index b432500c13a5..d31c2cec35c9 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -409,14 +409,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,

 int pmd_huge(pmd_t pmd)
 {
-	return !pmd_none(pmd) &&
-		(pmd_val(pmd) & (_PAGE_VALID|_PAGE_PMD_HUGE)) != _PAGE_VALID;
+	return pmd_leaf(pmd);
 }

 int pud_huge(pud_t pud)
 {
-	return !pud_none(pud) &&
-		(pud_val(pud) & (_PAGE_VALID|_PAGE_PUD_HUGE)) != _PAGE_VALID;
+	return pud_leaf(pud);
 }

 static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
-- 
2.44.0
[PATCH 03/13] mm/gup: Check p4d presence before going on
From: Peter Xu Currently there should be no p4d swap entries, so it may not matter much; however, this may help us rule out swap entries in the pXd_huge() API, which will include p4d_huge(). The p4d_present() checks make it 100% clear that we won't rely on p4d_huge() for swap entries. Signed-off-by: Peter Xu --- mm/gup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 69a777f4fc5c..802987281b2f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -776,7 +776,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma, p4dp = p4d_offset(pgdp, address); p4d = READ_ONCE(*p4dp); - if (p4d_none(p4d)) + if (!p4d_present(p4d)) return no_page_table(vma, flags); BUILD_BUG_ON(p4d_huge(p4d)); if (unlikely(p4d_bad(p4d))) @@ -3069,7 +3069,7 @@ static int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, unsigned lo p4d_t p4d = READ_ONCE(*p4dp); next = p4d_addr_end(addr, end); - if (p4d_none(p4d)) + if (!p4d_present(p4d)) return 0; BUILD_BUG_ON(p4d_huge(p4d)); if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) { -- 2.44.0
[PATCH 04/13] mm/x86: Change pXd_huge() behavior to exclude swap entries
From: Peter Xu This patch partly reverts the commits below: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage") Right now, the pXd_huge() definition across the kernel is unclear. We have two groups that think differently about swap entries: - x86/sparc: allow pXd_huge() to accept swap entries - all the rest: don't allow pXd_huge() to accept swap entries This is confusing. The sparc helpers seem to have been added in 2016, after x86's (2015), so sparc may simply have followed x86's lead. x86 proposed such swap handling in 2015 to resolve hugetlb swap entries hit in GUP, but now GUP guards swap entries with !pXd_present() in all layers, so we should be safe. We should define this API properly, one way or another, rather than keep it defined differently across archs. Gut feeling tells me that pXd_huge() shouldn't include swap entries, and it turns out I am not the only one thinking so; the question was raised when the current pmd_huge() for x86 was proposed by Ville Syrjälä: https://lore.kernel.org/all/y2wq7i4lxh8iu...@intel.com/ "I might also be missing something obvious, but why is it even necessary to treat PRESENT==0+PSE==0 as a huge entry?" It was also questioned when Jason Gunthorpe reviewed the other patchset on swap entry handling: https://lore.kernel.org/all/20240221125753.gq13...@nvidia.com/ Revert its meaning back to the original. This shouldn't cause any functional change, as we should be ready with guards on !pXd_present() explicitly everywhere. Note that I also dropped the "#if CONFIG_PGTABLE_LEVELS > 2"; it was there probably because it was breaking things when 3a194f3f8ad0 was proposed, according to the report here: https://lore.kernel.org/all/y2lyxitkqyajt...@intel.com/ Now we shouldn't need that. Instead of reverting to a raw _PAGE_PSE check, leverage pXd_leaf().
Cc: Naoya Horiguchi Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Signed-off-by: Peter Xu --- arch/x86/mm/hugetlbpage.c | 18 -- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index 5804bbae4f01..8362953a24ce 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -20,29 +20,19 @@ #include /* - * pmd_huge() returns 1 if @pmd is hugetlb related entry, that is normal - * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry. - * Otherwise, returns 0. + * pmd_huge() returns 1 if @pmd is hugetlb related entry. */ int pmd_huge(pmd_t pmd) { - return !pmd_none(pmd) && - (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT; + return pmd_leaf(pmd); } /* - * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal - * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry. - * Otherwise, returns 0. + * pud_huge() returns 1 if @pud is hugetlb related entry. */ int pud_huge(pud_t pud) { -#if CONFIG_PGTABLE_LEVELS > 2 - return !pud_none(pud) && - (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT; -#else - return 0; -#endif + return pud_leaf(pud); } #ifdef CONFIG_HUGETLB_PAGE -- 2.44.0
[PATCH 02/13] mm/gup: Cache p4d in follow_p4d_mask()
From: Peter Xu Add a variable to cache p4d in follow_p4d_mask(). It's good practice to make sure all the following checks have a consistent view of the entry. Signed-off-by: Peter Xu --- mm/gup.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index df83182ec72d..69a777f4fc5c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -772,16 +772,17 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma, unsigned int flags, struct follow_page_context *ctx) { - p4d_t *p4d; + p4d_t *p4dp, p4d; - p4d = p4d_offset(pgdp, address); - if (p4d_none(*p4d)) + p4dp = p4d_offset(pgdp, address); + p4d = READ_ONCE(*p4dp); + if (p4d_none(p4d)) return no_page_table(vma, flags); - BUILD_BUG_ON(p4d_huge(*p4d)); - if (unlikely(p4d_bad(*p4d))) + BUILD_BUG_ON(p4d_huge(p4d)); + if (unlikely(p4d_bad(p4d))) return no_page_table(vma, flags); - return follow_pud_mask(vma, address, p4d, flags, ctx); + return follow_pud_mask(vma, address, p4dp, flags, ctx); } /** -- 2.44.0
[PATCH 00/13] mm/treewide: Remove pXd_huge() API
From: Peter Xu [based on akpm/mm-unstable latest commit 9af2e4c429b5] v1: - Rebase, remove RFC tag - Fixed powerpc patch build issue, enhanced commit message [Michael] - Optimized patches 1 & 3 on the "none || !present" check [Jason] In previous work [1], we removed the pXd_large() API, which is arch specific. This patchset further removes the hugetlb pXd_huge() API. Hugetlb was never special in creating huge mappings when compared with other huge mappings. Having a standalone API just to detect such pgtable entries is more or less redundant, especially after the pXd_leaf() API set is introduced with/without CONFIG_HUGETLB_PAGE. When looking at this problem, a few issues were also exposed showing that we don't have a clear definition of the *_huge() variants of the API. This patchset starts by cleaning up these issues, then replaces all *_huge() users with *_leaf(), then drops all *_huge() code. On x86/sparc, swap entries are reported "true" by pXd_huge(), while on all the other archs they're reported "false" instead. This part is done in patches 1-5, of which I suspect patch 1 can be seen as a bug fix, but I'll leave that to hmm experts to decide. Besides, there are three archs (arm, arm64, powerpc) that have slightly different definitions between the *_huge() vs. *_leaf() variants. I tackled them separately so that it'll be easier for arch experts to chime in when necessary. This part is done in patches 6-9. The final patches 10-13 do the final removal. Since *_leaf() will be the ultimate API in the future, and we seem to have quite some confusion on how the *_huge() APIs can be defined, they also provide a rich comment for the *_leaf() API set to define them properly and avoid future misuse, which hopefully will also help new archs start supporting huge mappings while avoiding traps (like swap entries, or PROT_NONE entry checks). The whole series is only lightly tested on x86, and as usual I don't have the capability to test all the archs it touches.
[1] https://lore.kernel.org/r/20240305043750.93762-1-pet...@redhat.com Peter Xu (13): mm/hmm: Process pud swap entry without pud_huge() mm/gup: Cache p4d in follow_p4d_mask() mm/gup: Check p4d presence before going on mm/x86: Change pXd_huge() behavior to exclude swap entries mm/sparc: Change pXd_huge() behavior to exclude swap entries mm/arm: Use macros to define pmd/pud helpers mm/arm: Redefine pmd_huge() with pmd_leaf() mm/arm64: Merge pXd_huge() and pXd_leaf() definitions mm/powerpc: Redefine pXd_huge() with pXd_leaf() mm/gup: Merge pXd huge mapping checks mm/treewide: Replace pXd_huge() with pXd_leaf() mm/treewide: Remove pXd_huge() mm: Document pXd_leaf() API arch/arm/include/asm/pgtable-2level.h | 4 +-- arch/arm/include/asm/pgtable-3level-hwdef.h | 1 + arch/arm/include/asm/pgtable-3level.h | 6 ++-- arch/arm/mm/Makefile | 1 - arch/arm/mm/hugetlbpage.c | 34 --- arch/arm64/include/asm/pgtable.h | 6 +++- arch/arm64/mm/hugetlbpage.c | 18 ++ arch/loongarch/mm/hugetlbpage.c | 12 +-- arch/mips/include/asm/pgtable-32.h| 2 +- arch/mips/include/asm/pgtable-64.h| 2 +- arch/mips/mm/hugetlbpage.c| 10 -- arch/mips/mm/tlb-r4k.c| 2 +- arch/parisc/mm/hugetlbpage.c | 11 -- .../include/asm/book3s/64/pgtable-4k.h| 20 --- .../include/asm/book3s/64/pgtable-64k.h | 25 -- arch/powerpc/include/asm/book3s/64/pgtable.h | 27 +++ arch/powerpc/include/asm/nohash/pgtable.h | 10 -- arch/powerpc/mm/pgtable_64.c | 6 ++-- arch/riscv/mm/hugetlbpage.c | 10 -- arch/s390/mm/hugetlbpage.c| 10 -- arch/sh/mm/hugetlbpage.c | 10 -- arch/sparc/mm/hugetlbpage.c | 12 --- arch/x86/mm/hugetlbpage.c | 26 -- arch/x86/mm/pgtable.c | 4 +-- include/linux/hugetlb.h | 24 - include/linux/pgtable.h | 24 ++--- mm/gup.c | 24 ++--- mm/hmm.c | 9 ++--- mm/memory.c | 2 +- 29 files changed, 68 insertions(+), 284 deletions(-) delete mode 100644 arch/arm/mm/hugetlbpage.c -- 2.44.0
[PATCH 01/13] mm/hmm: Process pud swap entry without pud_huge()
From: Peter Xu pud_huge() does not return true for swap pud entries on all archs. x86 and sparc (so far) allow it, but all the rest do not accept a swap entry being reported as pud_huge(). So it's not safe to check for swap entries via pud_huge(). Check for swap entries before pud_huge(), which should always be safe. This is the only place in the kernel that (IMHO, wrongly) relies on pud_huge() to return true for pud swap entries. The plan is to clean up pXd_huge() to only report non-swap mappings for all archs. Cc: Alistair Popple Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu --- mm/hmm.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/mm/hmm.c b/mm/hmm.c index 277ddcab4947..c95b9ec5d95f 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -424,7 +424,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, walk->action = ACTION_CONTINUE; pud = READ_ONCE(*pudp); - if (pud_none(pud)) { + if (!pud_present(pud)) { spin_unlock(ptl); return hmm_vma_walk_hole(start, end, -1, walk); } @@ -435,11 +435,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, unsigned long *hmm_pfns; unsigned long cpu_flags; - if (!pud_present(pud)) { - spin_unlock(ptl); - return hmm_vma_walk_hole(start, end, -1, walk); - } - i = (addr - range->start) >> PAGE_SHIFT; npages = (end - addr) >> PAGE_SHIFT; hmm_pfns = &range->hmm_pfns[i]; -- 2.44.0
Re: [PATCH v9 07/10] PCI: dwc: ep: Remove "core_init_notifier" flag
On Mon, Mar 11, 2024 at 10:54:28PM +0100, Niklas Cassel wrote: > On Mon, Mar 11, 2024 at 08:15:59PM +0530, Manivannan Sadhasivam wrote: > > > > > > I would say that it is the following change that breaks things: > > > > > > > - if (!core_init_notifier) { > > > > - ret = pci_epf_test_core_init(epf); > > > > - if (ret) > > > > - return ret; > > > > - } > > > > - > > > > > > Since without this code, pci_epf_test_core_init() will no longer be > > > called, > > > as there is currently no one that calls epf->core_init() for a EPF driver > > > after it has been bound. (For drivers that call dw_pcie_ep_init_notify() > > > in > > > .probe()) > > > > > > > Thanks a lot for testing, Niklas! > > > > > I guess one way to solve this would be for the EPC core to keep track of > > > the current EPC "core state" (up/down). If the core is "up" at EPF .bind() > > > time, notify the EPF driver directly after .bind()? > > > > > > > Yeah, that's a good solution. But I think it would be better if the EPC > > caches > > all events if the EPF drivers are not available and dispatch them once the > > bind > > happens for each EPF driver. Even though INIT_COMPLETE is the only event > > that is > > getting generated before bind() now, IMO it is better to add provision to > > catch > > other events also. > > > > Wdyt? > > I'm not sure. > What if the EPF goes up/down/up, it seems a bit silly to send all those > events to the EPF driver that will alloc+free+alloc. > > Do we know for sure that we will want to store + replay events other than > INIT_COMPLETE? > > And how many events should we store? > > > Until we can think of a good reason which events other than UP/DOWN we > can to store, I think that just storing the state as an integer in > struct pci_epc seems simpler. > Hmm, makes sense. 
> > Or I guess we could continue with a flag in struct pci_epc_features, > like has_perst_notifier, which would then require the EPC driver to > call both epc_notify_core_up() and epc_notify_core_down() when receiving > the PERST deassert/assert. > For a driver without the flag set, the EPC core would call > .epc_notify_core_up() after bind. (And .epc_notify_core_down() would never > be called, or it could call it before unbind().) > That way an EPF driver itself would not need any different handling > (all callbacks would always come, either triggered by an EPC driver that > has PERST GPIO irq, or triggered by the EPC core for a driver that lacks > a PERST GPIO). > For simplicity, I've just used a flag in 'struct pci_epc' to track the core_init and call the callback during bind(). But the series has grown big, so I decided to split it into two. One to address the DBI access issue and also remove the 'core_init_notifier' flag and another one to make EPF drivers more robust to handle the host reboot scenario. - Mani -- மணிவண்ணன் சதாசிவம்
Re: [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info
On Wed, 2024-03-13 at 06:44 +, Christophe Leroy wrote: > I understand from this text that, as agreed, this patch removes the > pointless/redundant zero-init of individual members. But it is not > what > is done, see below ? Err, right. I think I decided to leave it because it was already acked and there wasn't enough discussion on the ack to be sure. I will update it.
[PATCH v1 1/1] powerpc/52xx: Replace of_gpio.h by proper one
of_gpio.h is deprecated and subject to removal. The driver doesn't use it directly; replace it with what is really being used. Signed-off-by: Andy Shevchenko --- arch/powerpc/platforms/52xx/mpc52xx_common.c | 2 -- arch/powerpc/platforms/52xx/mpc52xx_gpt.c| 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/52xx/mpc52xx_common.c b/arch/powerpc/platforms/52xx/mpc52xx_common.c index b4938e344f71..253421ffb4e5 100644 --- a/arch/powerpc/platforms/52xx/mpc52xx_common.c +++ b/arch/powerpc/platforms/52xx/mpc52xx_common.c @@ -12,12 +12,10 @@ #undef DEBUG -#include #include #include #include #include -#include #include #include #include diff --git a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c index 581059527c36..2bd6abcdc113 100644 --- a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c +++ b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c @@ -48,6 +48,7 @@ * the output mode. This driver does not change the output mode setting. */ +#include #include #include #include @@ -56,7 +57,6 @@ #include #include #include -#include #include #include #include -- 2.43.0.rc1.1.gbec44491f096
Re: [PATCH] boot: simple_alloc: check after increasing memory allocation
On Mon, 19 Dec 2022 10:18:16 +0800, Li zeming wrote: > Add a NULL check for the pointer `new` after allocation; this should help with program robustness. > > Applied to powerpc/next. [1/1] boot: simple_alloc: check after increasing memory allocation https://git.kernel.org/powerpc/c/69b0194ccec033c208b071e019032c1919c2822d cheers
Re: [PATCH] powerpc/32: Curb objtool unannotated intra-function call warning
On Thu, 15 Dec 2022 17:22:58 +0530, Sathvika Vasireddy wrote: > objtool throws the following warning: > arch/powerpc/kexec/relocate_32.o: warning: objtool: .text+0x2bc: unannotated > intra-function call > > Fix this warning by annotating intra-function call, using > ANNOTATE_INTRA_FUNCTION_CALL macro, to indicate that the branch target > is valid. > > [...] Applied to powerpc/next. [1/1] powerpc/32: Curb objtool unannotated intra-function call warning https://git.kernel.org/powerpc/c/6035e7e35482653d6d93f35f01e1a320573d58f0 cheers
Re: [PATCH v3] powerpc: macio: Make remove callback of macio driver void returned
On Wed, 01 Feb 2023 22:36:19 +0800, Dawei Li wrote: > Commit fc7a6209d571 ("bus: Make remove callback return void") forces > bus_type::remove to be void-returned; it doesn't make much sense for any > bus-based driver implementing the remove callback to return non-void to > its caller. > > This change is for macio bus based drivers. > > [...] Applied to powerpc/next. [1/1] powerpc: macio: Make remove callback of macio driver void returned https://git.kernel.org/powerpc/c/9db2235326c4b868b6e065dfa3a69011ee570848 cheers
Re: [PATCH] macintosh: adb: make adb_dev_class constant
On Tue, 05 Mar 2024 17:13:48 -0300, Ricardo B. Marliere wrote: > Since commit 43a7206b0963 ("driver core: class: make class_register() take > a const *"), the driver core allows for struct class to be in read-only > memory, so move the adb_dev_class structure to be declared at build time > placing it into read-only memory, instead of having to be dynamically > allocated at boot time. > > > [...] Applied to powerpc/next. [1/1] macintosh: adb: make adb_dev_class constant https://git.kernel.org/powerpc/c/83bc680e87292f78c6e823100e417d58a66dcb06 cheers
Re: [PATCH] powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
On Sat, 27 Jan 2024 11:07:43 -0700, Nathan Chancellor wrote: > arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main > powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an > error when building with clang after a recent change in main: > > error: option '-msoft-float' cannot be specified with '-maltivec' > make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error > 1 > > [...] Applied to powerpc/next. [1/1] powerpc: xor_vmx: Add '-mhard-float' to CFLAGS https://git.kernel.org/powerpc/c/35f20786c481d5ced9283ff42de5c69b65e5ed13 cheers
Re: [PATCH v2 1/5] of: Add of_machine_compatible_match()
On Thu, 14 Dec 2023 21:31:48 +1100, Michael Ellerman wrote: > We have of_machine_is_compatible() to check if a machine is compatible > with a single compatible string. However some code is able to support > multiple compatible boards, and so wants to check for one of many > compatible strings. > > So add of_machine_compatible_match() which takes a NULL terminated > array of compatible strings to check against the root node's > compatible property. > > [...] Applied to powerpc/next. [1/5] of: Add of_machine_compatible_match() https://git.kernel.org/powerpc/c/c029b22f8a98e14988f800d5c0176a9eaec3c8db [2/5] of: Change of_machine_is_compatible() to return bool https://git.kernel.org/powerpc/c/cefdb366dcbe97908b6055595a15bf7689556bf8 [3/5] of: Reimplement of_machine_is_compatible() using of_machine_compatible_match() https://git.kernel.org/powerpc/c/1ac8205f907517a306b661212496fedce79d7cc5 [4/5] powerpc/machdep: Define 'compatibles' property in ppc_md and use it https://git.kernel.org/powerpc/c/28da734d58c8d0113d0ac4f59880d94c9f249564 [5/5] powerpc: Stop using of_root https://git.kernel.org/powerpc/c/2a066ae11861257223500d7515e1541199cb7832 cheers
Re: [PATCH] powerpc/irq: Allow softirq to hardirq stack transition
On Thu, 30 Nov 2023 23:50:45 +1100, Michael Ellerman wrote: > Allow a transition from the softirq stack to the hardirq stack when > handling a hardirq. Doing so means a hardirq received while deep in > softirq processing is less likely to cause a stack overflow of the > softirq stack. > > Previously it wasn't safe to do so because irq_exit() (which initiates > softirq processing) was called on the hardirq stack. > > [...] Applied to powerpc/next. [1/1] powerpc/irq: Allow softirq to hardirq stack transition https://git.kernel.org/powerpc/c/4eb20bf34ea296f648971a8528e32cd80efcbe89 cheers
Re: [PATCH] powerpc/boot: Only free if realloc() succeeds
On Thu, 29 Feb 2024 22:51:49 +1100, Michael Ellerman wrote: > simple_realloc() frees the original buffer (ptr) even if the > reallocation failed. > > Fix it to behave like standard realloc() and only free the original > buffer if the reallocation succeeded. > > > [...] Applied to powerpc/next. [1/1] powerpc/boot: Only free if realloc() succeeds https://git.kernel.org/powerpc/c/f2d5bccaca3e8c09c9b9c8485375f7bdbb2631d2 cheers
Re: [PATCH] powerpc: Add allmodconfig for all 32-bit sub-arches
On Thu, 29 Feb 2024 22:41:08 +1100, Michael Ellerman wrote: > 32-bit powerpc kernels can be built for one of 5 sub-arches, see > Kconfig.cputype: > > PPC_BOOK3S_32: "512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx" > PPC_85xx: "Freescale 85xx" > PPC_8xx: "Freescale 8xx" > 40x: "AMCC 40x" > 44x: "AMCC 44x, 46x or 47x" > > [...] Applied to powerpc/next. [1/1] powerpc: Add allmodconfig for all 32-bit sub-arches https://git.kernel.org/powerpc/c/af1ebca503f4c5bb9345dd251faaa825431ce972 cheers
Re: [PATCH] powerpc: Enable support for 32 bit MSI-X vectors
On Wed, 17 Jan 2024 15:46:32 -0600, Brian King wrote: > Some devices are not capable of addressing 64 bits > via DMA, which includes MSI-X vectors. This allows > us to ensure these devices use MSI-X vectors in > 32 bit space. > > Applied to powerpc/next. [1/1] powerpc: Enable support for 32 bit MSI-X vectors https://git.kernel.org/powerpc/c/b997bf240ebdfb36de5a138e94b77c3228507f07 cheers
Re: [PATCH] powerpc/kprobes: Handle error returned by set_memory_rox()
On Fri, 16 Feb 2024 11:13:28 +0100, Christophe Leroy wrote: > set_memory_rox() can fail. > > In case it fails, free allocated memory and return NULL. > > Applied to powerpc/next. [1/1] powerpc/kprobes: Handle error returned by set_memory_rox() https://git.kernel.org/powerpc/c/f7f18e30b468458b2611ca65d745b50edcda9f43 cheers
Re: [PATCH] powerpc/85xx: Make some pic_init functions static
On Thu, 29 Feb 2024 22:42:16 +1100, Michael Ellerman wrote: > These functions can all be static, make them so, which also fixes no > previous prototype warnings. > > Applied to powerpc/next. [1/1] powerpc/85xx: Make some pic_init functions static https://git.kernel.org/powerpc/c/3f9f3557aca2bc5335747f0ac613661fb573be54 cheers
Re: [PATCH 1/5] powerpc/64s: Move dcbt/dcbtst sequence into a macro
On Thu, 29 Feb 2024 23:25:17 +1100, Michael Ellerman wrote: > There's an almost identical code sequence to specify load/store access > hints in __copy_tofrom_user_power7(), copypage_power7() and > memcpy_power7(). > > Move the sequence into a common macro, which is passed the registers to > use as they differ slightly. > > [...] Applied to powerpc/next. [1/5] powerpc/64s: Move dcbt/dcbtst sequence into a macro https://git.kernel.org/powerpc/c/8488cdcb00fd5f238754005a43a3a7445860d344 [2/5] powerpc/64s: Use .machine power4 around dcbt https://git.kernel.org/powerpc/c/4e284e38ed586edeb8bdb2b0c544273a7f72021c [3/5] powerpc/fsl: Fix mfpmr build errors with newer binutils https://git.kernel.org/powerpc/c/5f491356b7149564ab22323ccce79c8d595bfd0c [4/5] powerpc/fsl: Modernise mt/mfpmr https://git.kernel.org/powerpc/c/f01dbd73ccf122486ad4b52e74f5505985dd6af4 [5/5] powerpc: Remove cpu-as-y completely https://git.kernel.org/powerpc/c/ca3d3aa14e7673f1b15e862b71998a4664d50ebe cheers
Re: [PATCH 1/3] powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc.
On Tue, 05 Mar 2024 23:34:08 +1100, Michael Ellerman wrote: > Move the prototypes into mpc10x.h which is included by all the relevant > C files, fixes: > > arch/powerpc/platforms/embedded6xx/ls_uart.c:59:6: error: no previous > prototype for 'avr_uart_configure' > arch/powerpc/platforms/embedded6xx/ls_uart.c:82:6: error: no previous > prototype for 'avr_uart_send' > > > [...] Applied to powerpc/next. [1/3] powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc. https://git.kernel.org/powerpc/c/20933531be0577cdd782216858c26150dbc7936f [2/3] powerpc/amigaone: Make several functions static https://git.kernel.org/powerpc/c/e8b1ce0e287fd1493334f3435d763aecd517afd9 [3/3] powerpc/4xx: Fix warp_gpio_leds build failure https://git.kernel.org/powerpc/c/5b9e00a6004cf837c43fdb8d5290df619de78024 cheers
Re: (subset) [PATCH 1/3] powerpc/64s: Fix get_hugepd_cache_index() build failure
On Wed, 06 Mar 2024 23:58:51 +1100, Michael Ellerman wrote: > With CONFIG_BUG=n, the 64-bit Book3S build fails with: > > arch/powerpc/include/asm/book3s/64/pgtable-64k.h: In function > 'get_hugepd_cache_index': > arch/powerpc/include/asm/book3s/64/pgtable-64k.h:51:1: error: no return > statement in function returning non-void > > Currently the body of the function is just BUG(), so when CONFIG_BUG=n > it is an empty function, leading to the error. > > [...] Patches 1 & 2 applied to powerpc/next. [1/3] powerpc/64s: Fix get_hugepd_cache_index() build failure https://git.kernel.org/powerpc/c/329105ce53437ff64b29f6c429dfe5dc2aa7b676 [2/3] powerpc/83xx: Fix build failure with FPU=n https://git.kernel.org/powerpc/c/c2e5d70cf05b48bfbd5b6625bbd0ec3052cecd5d cheers
Re: [PATCH] powerpc/pseries: Fix potential memleak in papr_get_attr()
On Thu, 08 Dec 2022 21:34:49 +0800, Qiheng Lin wrote: > `buf` is allocated in papr_get_attr(), and krealloc() of `buf` > could fail. We need to free the original `buf` in the case of failure. > > Applied to powerpc/next. [1/1] powerpc/pseries: Fix potential memleak in papr_get_attr() https://git.kernel.org/powerpc/c/cda9c0d556283e2d4adaa9960b2dc19b16156bae cheers
Re: [PATCH 1/2] powerpc: Refactor __kernel_map_pages()
On Fri, 16 Feb 2024 11:17:33 +0100, Christophe Leroy wrote: > __kernel_map_pages() is almost identical for PPC32 and RADIX. > > Refactor it. > > On PPC32 it is not needed for KFENCE, but to keep it simple > just make it similar to PPC64. > > [...] Applied to powerpc/next. [1/2] powerpc: Refactor __kernel_map_pages() https://git.kernel.org/powerpc/c/3c8016e681c5e0f5f3ad15edb4569727cd32eaff [2/2] powerpc: Don't ignore errors from set_memory_{n}p() in __kernel_map_pages() https://git.kernel.org/powerpc/c/9cbacb834b4afcb55eb8ac5115fa82fc7ede5c83 cheers
Re: [PATCH v2] powerpc/mm: Code cleanup for __hash_page_thp
On Fri, 01 Mar 2024 16:58:34 +0800, Kunwu Chan wrote: > This code has been commented out since commit 6d492ecc6489 > ("powerpc/THP: Add code to handle HPTE faults for hugepages"), > about 11 years ago. > > If there are no plans to enable this code in the future, > we can remove the dead code and replace it with a comment > explaining what the dead code was trying to say. > > [...] Applied to powerpc/next. [1/1] powerpc/mm: Code cleanup for __hash_page_thp https://git.kernel.org/powerpc/c/d9cf600ecb7b053345aa76c1988cf374260cfdaf cheers
Re: [PATCH v2] powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks
On Thu, 29 Feb 2024 17:58:47 +0530, Kajol Jain wrote: > Running event > hv_gpci/dispatch_timebase_by_processor_processor_time_in_timebase_cycles,phys_processor_idx=0/ > in one of the system throws below error: > > ---Logs--- > # perf list | grep > hv_gpci/dispatch_timebase_by_processor_processor_time_in_timebase_cycles > > hv_gpci/dispatch_timebase_by_processor_processor_time_in_timebase_cycles,phys_processor_idx=?/[Kernel > PMU event] > > [...] Applied to powerpc/next. [1/1] powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks https://git.kernel.org/powerpc/c/ad86d7ee43b22aa2ed60fb982ae94b285c1be671 cheers
Re: [PATCH] powerpc/trace: Restrict hash_fault trace event to HASH MMU
On Fri, 16 Feb 2024 10:46:43 +0100, Christophe Leroy wrote: > 'perf list' on powerpc 8xx shows an event named "1:hash_fault". > > This event is pointless because trace_hash_fault() is called only > from mm/book3s64/hash_utils.c > > Only define it when CONFIG_PPC_64S_HASH_MMU is selected. > > [...] Applied to powerpc/next. [1/1] powerpc/trace: Restrict hash_fault trace event to HASH MMU https://git.kernel.org/powerpc/c/9e00743aba832f3f30ecb017d3345baf1f372140 cheers
Re: [PATCH] powerpc: Use user_mode() macro when possible
On Fri, 16 Feb 2024 11:10:36 +0100, Christophe Leroy wrote: > There is a nice macro to check user mode. > > Use it instead of open-coding an AND with MSR_PR, to increase > readability and avoid having to comment what that AND is for. > > Applied to powerpc/next. [1/1] powerpc: Use user_mode() macro when possible https://git.kernel.org/powerpc/c/d5835fb60bad641dbae64fe30c02f10857bf4647 cheers
Re: [PATCH] powerpc: Implement set_memory_rox()
On Fri, 16 Feb 2024 11:12:05 +0100, Christophe Leroy wrote: > Same as x86 and s390, add set_memory_rox() to avoid doing > one pass with set_memory_ro() and a second pass with set_memory_x(). > > See commit 60463628c9e0 ("x86/mm: Implement native set_memory_rox()") > and commit 22e99fa56443 ("s390/mm: implement set_memory_rox()") for > more information. > > [...] Applied to powerpc/next. [1/1] powerpc: Implement set_memory_rox() https://git.kernel.org/powerpc/c/09ca1b11716f96461a4675eb0418d5cb97687389 cheers
Re: [RFC PATCH 3/3] pseries/iommu: Enable DDW for VFIO TCE create
Hi Shivaprasad, Shivaprasad G Bhat writes: > The commit 9d67c9433509 ("powerpc/iommu: Add \"borrowing\" > iommu_table_group_ops") implemented the "borrow" mechanism for > the pSeries SPAPR TCE. It did implement this support partially > that it left out creating the DDW if not present already. > > The patch here attempts to fix the missing gaps. > - Expose the DDW info to user by collecting it during probe. > - Create the window and the iommu table if not present during >VFIO_SPAPR_TCE_CREATE. > - Remove and recreate the window if the pageshift and window sizes >do not match. > - Restore the original window in enable_ddw() if the user had >created/modified the DDW. As there is preference for DIRECT mapping >on the host driver side, the user created window is removed. > > The changes work only for the non-SRIOV-VF scenarios for PEs having > 2 DMA windows. This crashes on powernv. Full log at https://github.com/linuxppc/linux-snowpatch/actions/runs/8253875566/job/22577897225. [0.958561][T1] pci_bus 0002:01: Configuring PE for bus [0.959699][T1] pci 0002:01 : [PE# fd] Secondary bus 0x0001 associated with PE#fd [0.961692][T1] pci 0002:01:00.0: Configured PE#fd [0.962424][T1] pci 0002:01 : [PE# fd] Setting up 32-bit TCE table at 0..8000 [0.966424][T1] IOMMU table initialized, virtual merging enabled [0.967544][T1] pci 0002:01 : [PE# fd] Setting up window#0 0.. pg=1 [0.969362][T1] pci 0002:01 : [PE# fd] Enabling 64-bit DMA bypass [0.971386][T1] pci 0002:01:00.0: Adding to iommu group 0 [0.973481][T1] BUG: Unable to handle kernel instruction fetch (NULL pointer?) [0.974388][T1] Faulting instruction address: 0x [0.975578][T1] Oops: Kernel access of bad area, sig: 11 [#1] [0.976476][T1] LE PAGE_SIZE=64K MMU=Hash SMP ERROR: Error: saw oops/warning etc. 
while expecting NR_CPUS=2048 NUMA PowerNV [0.97][T1] Modules linked in: [0.978570][T1] CPU: 1 PID: 1 Comm: swapper/1 Not tainted 6.8.0-rc6-g80dcb4e6d0aa #1 [0.979766][T1] Hardware name: IBM PowerNV (emulated by qemu) POWER8 0x4d0200 opal:v6.8-104-g820d43c0 PowerNV [0.981197][T1] NIP: LR: c005653c CTR: [0.982221][T1] REGS: c3687420 TRAP: 0480 Not tainted (6.8.0-rc6-g80dcb4e6d0aa) [0.983400][T1] MSR: 90009033 CR: 44004422 XER: [0.984742][T1] CFAR: c0056538 IRQMASK: 0 [0.984742][T1] GPR00: c0056520 c36876c0 c15b9800 c363ae58 [0.984742][T1] GPR04: c352f0a0 c26d4748 0001 [0.984742][T1] GPR08: c2716668 0003 8000 [0.984742][T1] GPR12: c2be c00110cc [0.984742][T1] GPR16: [0.984742][T1] GPR20: 0001 [0.984742][T1] GPR24: c14681d8 c3068a00 0001 [0.984742][T1] GPR28: c3068a00 c363ae58 c352f0a0 [0.994647][T1] NIP [] 0x0 [0.995699][T1] LR [c005653c] spapr_tce_platform_iommu_attach_dev+0x74/0xc8 [0.997399][T1] Call Trace: [0.997897][T1] [c36876c0] [c0056514] spapr_tce_platform_iommu_attach_dev+0x4c/0xc8 (unreliable) [0.999383][T1] [c3687700] [c0b383dc] __iommu_attach_device+0x44/0xfc [1.000476][T1] [c3687730] [c0b38574] __iommu_device_set_domain+0xe0/0x170 [1.001728][T1] [c36877c0] [c0b3869c] __iommu_group_set_domain_internal+0x98/0x1c0 [1.003014][T1] [c3687820] [c0b3bb10] iommu_setup_default_domain+0x544/0x650 [1.004306][T1] [c36878e0] [c0b3d3b4] __iommu_probe_device+0x5b0/0x604 [1.005500][T1] [c3687950] [c0b3d454] iommu_probe_device+0x4c/0xb0 [1.006563][T1] [c3687990] [c005648c] iommu_add_device+0x3c/0x78 [1.007590][T1] [c36879b0] [c00db920] pnv_pci_ioda_dma_dev_setup+0x168/0x73c [1.008918][T1] [c3687a60] [c00729f4] pcibios_bus_add_device+0x80/0x328 [1.010077][T1] [c3687ac0] [c0a49fa0] pci_bus_add_device+0x30/0x11c [1.011169][T1] [c3687b30] [c0a4a0e4] pci_bus_add_devices+0x58/0xb4 [1.012230][T1] [c3687b70] [c0a4a118] pci_bus_add_devices+0x8c/0xb4 [1.013301][T1] [c3687bb0] [c201a3c8] pcibios_init+0xd8/0x140 [1.014314][T1] [c3687c30] [c0010d58] do_one_initcall+0x80/0x2f8 [
Re: [PATCH v10 11/12] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages
Le 13/03/2024 à 05:21, Rohan McLure a écrit :
> In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
> is intended to be instrumented by the page table check facility. There
> are however several other routines that constitute the API for setting
> page table entries, including set_pmd_at() among others. Such routines
> are themselves implemented in terms of set_ptes_at().
>
> A future patch providing support for page table checking on powerpc
> must take care to avoid duplicate calls to
> page_table_check_p{te,md,ud}_set(). Allow for assignment of pte entries
> without instrumentation through the set_pte_at_unchecked() routine
> introduced in this patch.
>
> Cause API-facing routines that call set_pte_at() to instead call
> set_pte_at_unchecked(), which will remain uninstrumented by page
> table check. set_ptes() is itself implemented by calls to
> __set_pte_at(), so this eliminates redundant code.
>
> Also prefer set_pte_at_unchecked() in early-boot usages which should not be
> instrumented.
>
> Signed-off-by: Rohan McLure
> ---
> v9: New patch
> v10: don't reuse __set_pte_at(), as that will not apply filters. Instead
> use new set_pte_at_unchecked().

Are filters needed at all in those use cases?
> --- > arch/powerpc/include/asm/pgtable.h | 2 ++ > arch/powerpc/mm/book3s64/hash_pgtable.c | 2 +- > arch/powerpc/mm/book3s64/pgtable.c | 6 +++--- > arch/powerpc/mm/book3s64/radix_pgtable.c | 8 > arch/powerpc/mm/nohash/book3e_pgtable.c | 2 +- > arch/powerpc/mm/pgtable.c| 7 +++ > arch/powerpc/mm/pgtable_32.c | 2 +- > 7 files changed, 19 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/include/asm/pgtable.h > b/arch/powerpc/include/asm/pgtable.h > index 3741a63fb82e..6ff1d8cfa216 100644 > --- a/arch/powerpc/include/asm/pgtable.h > +++ b/arch/powerpc/include/asm/pgtable.h > @@ -44,6 +44,8 @@ struct mm_struct; > void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, > pte_t pte, unsigned int nr); > #define set_ptes set_ptes > +void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr, > + pte_t *ptep, pte_t pte); > #define update_mmu_cache(vma, addr, ptep) \ > update_mmu_cache_range(NULL, vma, addr, ptep, 1) > > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c > b/arch/powerpc/mm/book3s64/hash_pgtable.c > index 988948d69bc1..871472f99a01 100644 > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c > @@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long > pa, pgprot_t prot) > ptep = pte_alloc_kernel(pmdp, ea); > if (!ptep) > return -ENOMEM; > - set_pte_at(_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot)); > + set_pte_at_unchecked(_mm, ea, ptep, pfn_pte(pa >> > PAGE_SHIFT, prot)); > } else { > /* >* If the mm subsystem is not fully up, we cannot create a > diff --git a/arch/powerpc/mm/book3s64/pgtable.c > b/arch/powerpc/mm/book3s64/pgtable.c > index 3438ab72c346..25082ab6018b 100644 > --- a/arch/powerpc/mm/book3s64/pgtable.c > +++ b/arch/powerpc/mm/book3s64/pgtable.c > @@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr, > WARN_ON(!(pmd_large(pmd))); > #endif > trace_hugepage_set_pmd(addr, pmd_val(pmd)); > - return set_pte_at(mm, addr, 
pmdp_ptep(pmdp), pmd_pte(pmd)); > + return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd)); > } > > void set_pud_at(struct mm_struct *mm, unsigned long addr, > @@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr, > WARN_ON(!(pud_large(pud))); > #endif > trace_hugepage_set_pud(addr, pud_val(pud)); > - return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud)); > + return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud)); > } > > static void do_serialize(void *arg) > @@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, > unsigned long addr, > if (radix_enabled()) > return radix__ptep_modify_prot_commit(vma, addr, > ptep, old_pte, pte); > - set_pte_at(vma->vm_mm, addr, ptep, pte); > + set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte); > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index 46fa46ce6526..c661e42bb2f1 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, > unsigned long pa, > ptep = pte_offset_kernel(pmdp, ea); > > set_the_pte: > - set_pte_at(_mm, ea, ptep, pfn_pte(pfn, flags)); > + set_pte_at_unchecked(_mm, ea, ptep, pfn_pte(pfn,
Re: [PATCH v10 10/12] powerpc: mm: Implement *_user_accessible_page() for ptes
Le 13/03/2024 à 05:21, Rohan McLure a écrit :
> Page table checking depends on architectures providing an
> implementation of p{te,md,ud}_user_accessible_page. With
> refactorisations made on powerpc/mm, the pte_access_permitted() and
> similar methods verify whether a userland page is accessible with the
> required permissions.
>
> Since page table checking is the only user of
> p{te,md,ud}_user_accessible_page(), implement these for all platforms,
> using some of the same preliminary checks taken by pte_access_permitted()
> on that platform.
>
> Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
> pte_user() is no longer required to be present on all platforms as it
> may be equivalent to or implied by pte_read(). Hence implementations are
> specialised.
>
> Signed-off-by: Rohan McLure
> ---
> v9: New implementation
> v10: Let book3s/64 use pte_user(), but otherwise default other platforms
> to using the address provided with the call to infer whether it is a
> user page or not.
pmd/pud variants will warn on all other platforms, as > they should not be used for user page mappings > --- > arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++ > arch/powerpc/include/asm/pgtable.h | 26 > 2 files changed, 45 insertions(+) > > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h > b/arch/powerpc/include/asm/book3s/64/pgtable.h > index 382724c5e872..ca765331e21d 100644 > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > @@ -538,6 +538,12 @@ static inline bool pte_access_permitted(pte_t pte, bool > write) > return arch_pte_access_permitted(pte_val(pte), write, 0); > } > > +#define pte_user_accessible_page pte_user_accessible_page > +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr) > +{ > + return pte_present(pte) && pte_user(pte); > +} > + > /* >* Conversion functions: convert a page and protection to a page entry, >* and a page entry and page directory to the page they refer to. > @@ -881,6 +887,7 @@ static inline int pud_present(pud_t pud) > > extern struct page *pud_page(pud_t pud); > extern struct page *pmd_page(pmd_t pmd); > + Garbage ? > static inline pte_t pud_pte(pud_t pud) > { > return __pte_raw(pud_raw(pud)); > @@ -926,6 +933,12 @@ static inline bool pud_access_permitted(pud_t pud, bool > write) > return pte_access_permitted(pud_pte(pud), write); > } > > +#define pud_user_accessible_page pud_user_accessible_page > +static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr) > +{ > + return pte_user_accessible_page(pud_pte(pud), addr); > +} > + If I understand what is done on arm64, you should first check pud_leaf(). Then this function could be common to all powerpc platforms, only pte_user_accessible_page() would be platform specific. 
> #define __p4d_raw(x)((p4d_t) { __pgd_raw(x) }) > static inline __be64 p4d_raw(p4d_t x) > { > @@ -1091,6 +1104,12 @@ static inline bool pmd_access_permitted(pmd_t pmd, > bool write) > return pte_access_permitted(pmd_pte(pmd), write); > } > > +#define pmd_user_accessible_page pmd_user_accessible_page > +static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr) > +{ > + return pte_user_accessible_page(pmd_pte(pmd), addr); > +} Same, pmd_leaf() should be checked. > + > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot); > extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot); > diff --git a/arch/powerpc/include/asm/pgtable.h > b/arch/powerpc/include/asm/pgtable.h > index 13f661831333..3741a63fb82e 100644 > --- a/arch/powerpc/include/asm/pgtable.h > +++ b/arch/powerpc/include/asm/pgtable.h > @@ -227,6 +227,32 @@ static inline int pud_pfn(pud_t pud) > } > #endif > > +#ifndef pte_user_accessible_page > +#define pte_user_accessible_page pte_user_accessible_page > +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr) > +{ > + return pte_present(pte) && !is_kernel_addr(addr); > +} > +#endif I would prefer to see one version in asm/book3s/32/pgtable.h and one in asm/nohash/pgtable.h and then avoid this game with ifdefs. > + > +#ifndef pmd_user_accessible_page > +#define pmd_user_accessible_page pmd_user_accessible_page > +static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr) > +{ > + WARN_ONCE(1, "pmd: platform does not use pmd entries directly"); > + return false; > +} > +#endif Also check pmd_leaf() and this function on all platforms. > + > +#ifndef pud_user_accessible_page > +#define pud_user_accessible_page pud_user_accessible_page > +static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr) > +{ > + WARN_ONCE(1, "pud: platform does not use pud entries directly"); > + return false; > +} Also check pud_leaf() and this function on all
Re: [PATCH v10 09/12] powerpc: mm: Add common pud_pfn stub for all platforms
Le 13/03/2024 à 05:21, Rohan McLure a écrit : > Prior to this commit, pud_pfn was implemented with BUILD_BUG as the inline > function for 64-bit Book3S systems but is never included, as its > invocations in generic code are guarded by calls to pud_devmap which return > zero on such systems. A future patch will provide support for page table > checks, the generic code for which depends on a pud_pfn stub being > implemented, even while the patch will not interact with puds directly. > > Remove the 64-bit Book3S stub and define pud_pfn to warn on all > platforms. pud_pfn may be defined properly on a per-platform basis > should it grow real usages in future. Can you please re-explain why that's needed ? I remember we discussed it already in the past, but I checked again today and can't see the need: In mm/page_table_check.c, the call to pud_pfn() is gated by a call to pud_user_accessible_page(pud). If I look into arm64 version of pud_user_accessible_page(), it depends on pud_leaf(). When pud_leaf() is constant 0, pud_user_accessible_page() is always false and the call to pud_pfn() should be folded away. > > Signed-off-by: Rohan McLure > --- > arch/powerpc/include/asm/pgtable.h | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/arch/powerpc/include/asm/pgtable.h > b/arch/powerpc/include/asm/pgtable.h > index 0c0ffbe7a3b5..13f661831333 100644 > --- a/arch/powerpc/include/asm/pgtable.h > +++ b/arch/powerpc/include/asm/pgtable.h > @@ -213,6 +213,20 @@ static inline bool > arch_supports_memmap_on_memory(unsigned long vmemmap_size) > > #endif /* CONFIG_PPC64 */ > > +/* > + * Currently only consumed by page_table_check_pud_{set,clear}. Since clears > + * and sets to page table entries at any level are done through > + * page_table_check_pte_{set,clear}, provide stub implementation. 
> + */ > +#ifndef pud_pfn > +#define pud_pfn pud_pfn > +static inline int pud_pfn(pud_t pud) > +{ > + WARN_ONCE(1, "pud: platform does not use pud entries directly"); > + return 0; > +} > +#endif > + > #endif /* __ASSEMBLY__ */ > > #endif /* _ASM_POWERPC_PGTABLE_H */
Re: [PATCH v10 08/12] powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf
Hi, Le 13/03/2024 à 05:21, Rohan McLure a écrit : > Replace occurrences of p{u,m,4}d_is_leaf with p{u,m,4}_leaf, as the > latter is the name given to checking that a higher-level entry in > multi-level paging contains a page translation entry (pte) throughout > all other archs. There's already an equivalent commit in mm-stable, that will likely go into v6.9: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-stable=bd18b688220c7225fb50498dabd9f9d0c9988e67 > > Reviewed-by: Christophe Leroy > Signed-off-by: Rohan McLure > --- > v9: No longer required in order to implement page table check, just a > refactor. > v10: Fix more occurances, and just delete p{u,m,4}_is_leaf() stubs as > equivalent p{u,m,4}_leaf() stubs already exist. > --- > arch/powerpc/include/asm/book3s/64/pgtable.h | 10 > arch/powerpc/include/asm/pgtable.h | 24 > arch/powerpc/kvm/book3s_64_mmu_radix.c | 12 +- > arch/powerpc/mm/book3s64/radix_pgtable.c | 14 ++-- > arch/powerpc/mm/pgtable.c| 6 ++--- > arch/powerpc/mm/pgtable_64.c | 6 ++--- > arch/powerpc/xmon/xmon.c | 6 ++--- > 7 files changed, 26 insertions(+), 52 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h > b/arch/powerpc/include/asm/book3s/64/pgtable.h > index 62c43d3d80ec..382724c5e872 100644 > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > @@ -1443,16 +1443,14 @@ static inline bool is_pte_rw_upgrade(unsigned long > old_val, unsigned long new_va > /* >* Like pmd_huge() and pmd_large(), but works regardless of config options >*/ > -#define pmd_is_leaf pmd_is_leaf > -#define pmd_leaf pmd_is_leaf > -static inline bool pmd_is_leaf(pmd_t pmd) > +#define pmd_leaf pmd_leaf > +static inline bool pmd_leaf(pmd_t pmd) > { > return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE)); > } > > -#define pud_is_leaf pud_is_leaf > -#define pud_leaf pud_is_leaf > -static inline bool pud_is_leaf(pud_t pud) > +#define pud_leaf pud_leaf > +static inline bool 
pud_leaf(pud_t pud) > { > return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE)); > } > diff --git a/arch/powerpc/include/asm/pgtable.h > b/arch/powerpc/include/asm/pgtable.h > index 9224f23065ff..0c0ffbe7a3b5 100644 > --- a/arch/powerpc/include/asm/pgtable.h > +++ b/arch/powerpc/include/asm/pgtable.h > @@ -180,30 +180,6 @@ static inline void pte_frag_set(mm_context_t *ctx, void > *p) > } > #endif > > -#ifndef pmd_is_leaf > -#define pmd_is_leaf pmd_is_leaf > -static inline bool pmd_is_leaf(pmd_t pmd) > -{ > - return false; > -} > -#endif > - > -#ifndef pud_is_leaf > -#define pud_is_leaf pud_is_leaf > -static inline bool pud_is_leaf(pud_t pud) > -{ > - return false; > -} > -#endif > - > -#ifndef p4d_is_leaf > -#define p4d_is_leaf p4d_is_leaf > -static inline bool p4d_is_leaf(p4d_t p4d) > -{ > - return false; > -} > -#endif > - > #define pmd_pgtable pmd_pgtable > static inline pgtable_t pmd_pgtable(pmd_t pmd) > { > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c > b/arch/powerpc/kvm/book3s_64_mmu_radix.c > index 4a1abb9f7c05..408d98f8a514 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > @@ -503,7 +503,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t > *pmd, bool full, > for (im = 0; im < PTRS_PER_PMD; ++im, ++p) { > if (!pmd_present(*p)) > continue; > - if (pmd_is_leaf(*p)) { > + if (pmd_leaf(*p)) { > if (full) { > pmd_clear(p); > } else { > @@ -532,7 +532,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t > *pud, > for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) { > if (!pud_present(*p)) > continue; > - if (pud_is_leaf(*p)) { > + if (pud_leaf(*p)) { > pud_clear(p); > } else { > pmd_t *pmd; > @@ -635,12 +635,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, > pte_t pte, > new_pud = pud_alloc_one(kvm->mm, gpa); > > pmd = NULL; > - if (pud && pud_present(*pud) && !pud_is_leaf(*pud)) > + if (pud && pud_present(*pud) && !pud_leaf(*pud)) > pmd = pmd_offset(pud, gpa); > else if (level <= 1) > 
new_pmd = kvmppc_pmd_alloc(); > > - if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd))) > + if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd))) > new_ptep = kvmppc_pte_alloc(); > > /* Check if we might have been invalidated; let the guest retry if so */ > @@ -658,7 +658,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, > pte_t pte, > new_pud = NULL; > } > pud
Re: [PATCH net v5 1/2] soc: fsl: qbman: Always disable interrupts when taking cgr_lock
Hello:

This series was applied to netdev/net.git (main)
by David S. Miller :

On Mon, 11 Mar 2024 12:38:29 -0400 you wrote:
> smp_call_function_single disables IRQs when executing the callback. To
> prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere.
> This is already done by qman_update_cgr and qman_delete_cgr; fix the
> other lockers.
>
> Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
> CC: sta...@vger.kernel.org
> Signed-off-by: Sean Anderson
> Reviewed-by: Camelia Groza
> Tested-by: Vladimir Oltean
>
> [...]

Here is the summary with links:
  - [net,v5,1/2] soc: fsl: qbman: Always disable interrupts when taking cgr_lock
    https://git.kernel.org/netdev/net/c/584c2a9184a3
  - [net,v5,2/2] soc: fsl: qbman: Use raw spinlock for cgr_lock
    https://git.kernel.org/netdev/net/c/fbec4e7fed89

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH 00/14] Add support for suppressing warning backtraces
Thanks!

Acked-by: Dan Carpenter

regards,
dan carpenter
[PATCH] KVM: PPC: Book3S HV nestedv2: Cancel pending HDEC exception
This reverts commit 180c6b072bf360b686e53d893d8dcf7dbbaec6bb ("KVM: PPC:
Book3S HV nestedv2: Do not cancel pending decrementer exception"), which
prevented cancelling a pending HDEC exception for nestedv2 KVM guests. That
was done to avoid the overhead of an H_GUEST_GET_STATE hcall to read the
'HDEC expiry TB' register, which was higher than the cost of handling extra
decrementer exceptions.

The overhead of reading the 'HDEC expiry TB' register has recently been
mitigated by the L0 hypervisor (PowerVM), which now puts the value of this
register in the L2 guest-state output buffer on trap to L1. From there the
value is cached and made available in kvmhv_run_single_vcpu() to compare
against the host (L1) timebase and cancel the pending hypervisor
decrementer exception if needed.

Fixes: 180c6b072bf3 ("KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer exception")
Signed-off-by: Vaibhav Jain
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0b921704da45..e47b954ce266 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4856,7 +4856,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
	 * entering a nested guest in which case the decrementer is now owned
	 * by L2 and the L1 decrementer is provided in hdec_expires
	 */
-	if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
+	if (kvmppc_core_pending_dec(vcpu) &&
	    ((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
	     (trap == BOOK3S_INTERRUPT_SYSCALL &&
	      kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
--
2.44.0
Re: kexec verbose dumps with 6.8 [was: [PATCH v4 1/7] kexec_file: add kexec_file flag to control debug printing]
On 03/13/24 at 06:58am, Jiri Slaby wrote: > Hi, > > > On 13. 03. 24, 1:48, Baoquan He wrote: > > Hi Jiri, > > > > On 03/12/24 at 10:58am, Jiri Slaby wrote: > > > On 13. 12. 23, 6:57, Baoquan He wrote: > > ... snip... > > > > --- a/include/linux/kexec.h > > > > +++ b/include/linux/kexec.h > > > ... > > > > @@ -500,6 +500,13 @@ static inline int > > > > crash_hotplug_memory_support(void) { return 0; } > > > >static inline unsigned int crash_get_elfcorehdr_size(void) { return > > > > 0; } > > > >#endif > > > > +extern bool kexec_file_dbg_print; > > > > + > > > > +#define kexec_dprintk(fmt, ...) > > > > \ > > > > + printk("%s" fmt,\ > > > > + kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG, \ > > > > + ##__VA_ARGS__) > > > > > > This means you dump it _always_. Only with different levels. > > > > It dumped always too with pr_debug() before, I just add a switch to > > control it's pr_info() or pr_debug(). > > Not really, see below. > > > > > > > And without any prefix whatsoever, so people see bloat like this in their > > > log now: > > > [ +0.01] 1000-0009 (1) > > > [ +0.02] 7f96d000-7f97efff (3) > > > [ +0.02] 0080-00807fff (4) > > > [ +0.01] 0080b000-0080bfff (4) > > > [ +0.02] 0081-008f (4) > > > [ +0.01] 7f97f000-7f9fefff (4) > > > [ +0.01] 7ff0-7fff (4) > > > [ +0.02] -0fff (2) > > > > On which arch are you seeing this? There should be one line above these > > range printing to tell what they are, like: > > > > E820 memmap: > > Ah this is there too. It's a lot of output, so I took it out of context, > apparently. > > > -0009a3ff (1) > > 0009a400-0009 (2) > > 000e-000f (2) > > 0010-6ff83fff (1) > > 6ff84000-7ac50fff (2) > > It should all be prefixed like kdump: or kexec: in any way. I can reproduce it now on fedora. OK, I will add kexec or something similar to prefix. Thanks. > > > > without actually knowing what that is. > > > > > > There should be nothing logged if that is not asked for and especially if > > > kexec load went fine, right? > > > > Right. 
Before this patch, those pr_debug() were already there. You need > > enable them to print out like add '#define DEBUG' in *.c file, or enable > > the dynamic debugging of the file or function. > > I think it's perfectly fine for DEBUG builds to print this out. And many > (all major?) distros use dyndbg, so it used to print nothing by default. > > > With this patch applied, > > you only need specify '-d' when you execute kexec command with > > kexec_file load interface, like: > > > > kexec -s -l -d /boot/vmlinuz-.img --initrd xxx.img --reuse-cmdline > > Perhaps our (SUSE) tooling passes -d? But I am seeing this every time I > boot. > > No, it does not seem so: > load.sh[915]: Starting kdump kernel load; kexec cmdline: /sbin/kexec -p > /var/lib/kdump/kernel --append=" loglevel=7 console=tty0 console=ttyS0 > video=1920x1080,1024x768,800x600 oops=panic > lsm=lockdown,capability,integrity,selinux sysrq=yes reset_devices > acpi_no_memhotplug cgroup_disable=memory nokaslr numa=off irqpoll nr_cpus=1 > root=kdump rootflags=bind rd.udev.children-max=8 disable_cpu_apicid=0 > panic=1" --initrd=/var/lib/kdump/initrd -a > > > For kexec_file load, it is not logging if not specifying '-d', unless > > you take way to make pr_debug() work in that file. > > So is -d detection malfunctioning under some circumstances? > > > > Can this be redesigned, please? > > > > Sure, after making clear what's going on with this, I will try. > > > > > > > > Actually what was wrong on the pr_debug()s? Can you simply turn them on > > > from > > > the kernel when -d is passed to kexec instead of all this? > > > > Joe suggested this during v1 reviewing: > > https://lore.kernel.org/all/1e7863ec4e4ab10b84fd0e64f30f8464d2e484a3.ca...@perches.com/T/#u > > > > > > > > ... 
> > > > --- a/kernel/kexec_core.c > > > > +++ b/kernel/kexec_core.c > > > > @@ -52,6 +52,8 @@ atomic_t __kexec_lock = ATOMIC_INIT(0); > > > >/* Flag to indicate we are going to kexec a new kernel */ > > > >bool kexec_in_progress = false; > > > > +bool kexec_file_dbg_print; > > > > > > Ugh, and a global flag for this? > > > > Yeah, kexec_file_dbg_print records if '-d' is specified when 'kexec' > > command executed. Anything wrong with the global flag? > > Global variables are frowned upon. To cite coding style: unless you > **really** need them. Here, it looks like you do not. I see your point, will consider and change. Thanks again.
Re: [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info
Le 12/03/2024 à 23:28, Rick Edgecombe a écrit :
> Future changes will need to add a new member to struct
> vm_unmapped_area_info. This would cause trouble for any call site that
> doesn't initialize the struct. Currently every caller sets each member
> manually, so if new members are added they will be uninitialized and the
> core code parsing the struct will see garbage in the new member.
>
> It could be possible to initialize the new member manually to 0 at each
> call site. This and a couple other options were discussed, and a working
> consensus (see links) was that in general the best way to accomplish this
> would be via static initialization with designated member initializers.
> Having some struct vm_unmapped_area_info instances not zero initialized
> will put those sites at risk of feeding garbage into vm_unmapped_area() if
> the convention is to zero initialize the struct and any new member addition
> misses a call site that initializes each member manually.
>
> It could be possible to leave the code mostly untouched, and just change
> the line:
> struct vm_unmapped_area_info info
> to:
> struct vm_unmapped_area_info info = {};
>
> However, that would leave cleanup for the members that are manually set
> to zero, as it would no longer be required.
>
> So to reduce the chance of bugs via uninitialized members, instead
> simply continue the process to initialize the struct this way tree wide.
> This will zero any unspecified members. Move the member initializers to the
> struct declaration when they are known at that time. Leave the members out
> that were manually initialized to zero, as this would be redundant for
> designated initializers.

I understand from this text that, as agreed, this patch removes the
pointless/redundant zero-init of individual members. But that is not what
is done, see below?
> > Signed-off-by: Rick Edgecombe > Acked-by: Michael Ellerman > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: Christophe Leroy > Cc: Aneesh Kumar K.V > Cc: Naveen N. Rao > Cc: linuxppc-dev@lists.ozlabs.org > Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t > Link: > https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/ > --- > v3: > - Fixed spelling errors in log > - Be consistent about field vs member in log > > Hi, > > This patch was split and refactored out of a tree-wide change [0] to just > zero-init each struct vm_unmapped_area_info. The overall goal of the > series is to help shadow stack guard gaps. Currently, there is only one > arch with shadow stacks, but two more are in progress. It is compile tested > only. > > There was further discussion that this method of initializing the structs > while nice in some ways has a greater risk of introducing bugs in some of > the more complicated callers. Since this version was reviewed my arch > maintainers already, leave it as was already acknowledged. 
> > Thanks, > > Rick > > [0] > https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgeco...@intel.com/ > --- > arch/powerpc/mm/book3s64/slice.c | 23 --- > 1 file changed, 12 insertions(+), 11 deletions(-) > > diff --git a/arch/powerpc/mm/book3s64/slice.c > b/arch/powerpc/mm/book3s64/slice.c > index c0b58afb9a47..6c7ac8c73a6c 100644 > --- a/arch/powerpc/mm/book3s64/slice.c > +++ b/arch/powerpc/mm/book3s64/slice.c > @@ -282,12 +282,12 @@ static unsigned long slice_find_area_bottomup(struct > mm_struct *mm, > { > int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT); > unsigned long found, next_end; > - struct vm_unmapped_area_info info; > - > - info.flags = 0; > - info.length = len; > - info.align_mask = PAGE_MASK & ((1ul << pshift) - 1); > - info.align_offset = 0; > + struct vm_unmapped_area_info info = { > + .flags = 0, Please remove zero-init as agreed and explained in the commit message > + .length = len, > + .align_mask = PAGE_MASK & ((1ul << pshift) - 1), > + .align_offset = 0 Same here. > + }; > /* >* Check till the allow max value for this mmap request >*/ > @@ -326,13 +326,14 @@ static unsigned long slice_find_area_topdown(struct > mm_struct *mm, > { > int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT); > unsigned long found, prev; > - struct vm_unmapped_area_info info; > + struct vm_unmapped_area_info info = { > + .flags = VM_UNMAPPED_AREA_TOPDOWN, > + .length = len, > + .align_mask = PAGE_MASK & ((1ul << pshift) - 1), > + .align_offset = 0 Same here. > + }; > unsigned long min_addr = max(PAGE_SIZE, mmap_min_addr); > > - info.flags = VM_UNMAPPED_AREA_TOPDOWN; > - info.length = len; > - info.align_mask = PAGE_MASK & ((1ul << pshift) - 1); > - info.align_offset = 0; > /* >* If we are trying to allocate above