Re: [PATCH 1/2] powerpc/perf: Fix the threshold compare group constraint for power10
> On 06-May-2022, at 11:40 AM, Kajol Jain wrote:
>
> Thresh compare bits for an event are used to program the thresh compare
> field in Monitor Mode Control Register A (MMCRA: 8-18 bits for power10).
> When scheduling events as a group, all events in that group should
> match the value in the threshold bits. Otherwise event open for the sibling
> events should fail. But in the current code, in case the thresh compare bits
> are not valid, we are not failing in the group_constraint function, which can
> result in invalid group scheduling.
>
> Fix the issue by returning -1 in case the event is a threshold event and the
> threshold compare value is not valid in the group_constraint function.
>
> The patch also fixes the p10_thresh_cmp_val function to return -1
> in case the threshold bits are not valid, and changes the corresponding check
> in the is_thresh_cmp_valid function to return false only when the thresh_cmp
> value is less than 0.
>
> Thresh control bits in the event code are used to program the thresh_ctl
> field in Monitor Mode Control Register A (MMCRA: 48-55). In the example below,
> the scheduling of group events PM_MRK_INST_CMPL (3534401e0) and
> PM_THRESH_MET (34340101ec) is expected to fail as both events
> request different thresh control bits.
> > Result before the patch changes: > > [command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1 > > Performance counter stats for 'sleep 1': > > 8,482 r35340401e0 > 0 r34340101ec > > 1.001474838 seconds time elapsed > > 0.001145000 seconds user > 0.0 seconds sys > > Result after the patch changes: > > [command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1 > > Performance counter stats for 'sleep 1': > > r35340401e0 > r34340101ec > > 1.001499607 seconds time elapsed > > 0.000204000 seconds user > 0.00076 seconds sys > > Fixes: 82d2c16b350f7 ("powerpc/perf: Adds support for programming > of Thresholding in P10") > Signed-off-by: Kajol Jain Reviewed-by: Athira Rajeev Thanks Athira > --- > arch/powerpc/perf/isa207-common.c | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/perf/isa207-common.c > b/arch/powerpc/perf/isa207-common.c > index a74d382ecbb7..013b06af6fe6 100644 > --- a/arch/powerpc/perf/isa207-common.c > +++ b/arch/powerpc/perf/isa207-common.c > @@ -108,7 +108,7 @@ static void mmcra_sdar_mode(u64 event, unsigned long > *mmcra) > *mmcra |= MMCRA_SDAR_MODE_TLB; > } > > -static u64 p10_thresh_cmp_val(u64 value) > +static int p10_thresh_cmp_val(u64 value) > { > int exp = 0; > u64 result = value; > @@ -139,7 +139,7 @@ static u64 p10_thresh_cmp_val(u64 value) >* exponent is also zero. 
>*/ > if (!(value & 0xC0) && exp) > - result = 0; > + result = -1; > else > result = (exp << 8) | value; > } > @@ -187,7 +187,7 @@ static bool is_thresh_cmp_valid(u64 event) > unsigned int cmp, exp; > > if (cpu_has_feature(CPU_FTR_ARCH_31)) > - return p10_thresh_cmp_val(event) != 0; > + return p10_thresh_cmp_val(event) >= 0; > > /* >* Check the mantissa upper two bits are not zero, unless the > @@ -502,7 +502,8 @@ int isa207_get_constraint(u64 event, unsigned long > *maskp, unsigned long *valp, > value |= CNST_THRESH_CTL_SEL_VAL(event >> > EVENT_THRESH_SHIFT); > mask |= p10_CNST_THRESH_CMP_MASK; > value |= > p10_CNST_THRESH_CMP_VAL(p10_thresh_cmp_val(event_config1)); > - } > + } else if (event_is_threshold(event)) > + return -1; > } else if (cpu_has_feature(CPU_FTR_ARCH_300)) { > if (event_is_threshold(event) && is_thresh_cmp_valid(event)) { > mask |= CNST_THRESH_MASK; > -- > 2.31.1 >
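The validity rule touched by the hunks above can be sketched in isolation. This is only a user-space illustration under stated assumptions, not the kernel function: the helper names are hypothetical, and the mantissa and exponent are taken as already computed rather than derived from the event's config1 value.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's p10_thresh_cmp_val(): combine an
 * already computed mantissa and exponent into a thresh_cmp value.
 * Returns -1 when the pair is invalid, mirroring the check in the diff.
 */
static int thresh_cmp_encode(uint32_t mantissa, uint32_t exp)
{
	/* The upper two mantissa bits may be zero only if the exponent is zero. */
	if (!(mantissa & 0xC0) && exp)
		return -1;

	return (int)((exp << 8) | mantissa);
}

/* With a signed return, 0 stays a usable encoding and only < 0 means invalid. */
static int thresh_cmp_is_valid(uint32_t mantissa, uint32_t exp)
{
	return thresh_cmp_encode(mantissa, exp) >= 0;
}
```

Returning a negative sentinel is what lets the caller tell "invalid" apart from an encoding that happens to be 0, which a "!= 0" test cannot do.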
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 08/05/2022 at 15:09, Baolin Wang wrote:
>
>
> On 5/8/2022 7:09 PM, Muchun Song wrote:
>> On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:
>>> It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
>>> table when unmapping or migrating a hugetlb page, and will change
>>> to use huge_ptep_clear_flush() instead in the following patches.
>>>
>>> So this is a preparation patch, which changes the
>>> huge_ptep_clear_flush()
>>> to return the original pte to help to nuke a hugetlb page table.
>>>
>>> Signed-off-by: Baolin Wang
>>> Acked-by: Mike Kravetz
>>
>> Reviewed-by: Muchun Song
>
> Thanks for reviewing.
>
>>
>> But one nit below:
>>
>> [...]
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 8605d7e..61a21af 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct
>>> *mm, struct vm_area_struct *vma,
>>>           ClearHPageRestoreReserve(new_page);
>>>           /* Break COW or unshare */
>>> -        huge_ptep_clear_flush(vma, haddr, ptep);
>>> +        (void)huge_ptep_clear_flush(vma, haddr, ptep);
>>
>> Why add a "(void)" here? Is there any warning if no "(void)"?
>> IIUC, I think we can remove this, right?
>
> I did not meet any warning without the casting, but this is per Mike's
> comment[1] to make the code consistent with other functions casting to
> void type explicitly in hugetlb.c file.
>
> [1]
> https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/
>

As far as I understand, Mike said that the cast should be accompanied by a
big fat comment explaining why we ignore the value returned by
huge_ptep_clear_flush().

By the way, huge_ptep_clear_flush() is not declared 'must_check', so this
cast is just visual pollution and should be removed.

In the meantime, the comment suggested by Mike should be added instead.

Christophe
[PATCH v3 21/25] powerpc/ftrace: Don't use copy_from_kernel_nofault() in module_trampoline_target()
module_trampoline_target() is quite a hot path used when
activating/deactivating the function tracer.

Avoid the heavy copy_from_kernel_nofault() by doing four calls
to copy_inst_from_kernel_nofault().

Use __copy_inst_from_kernel_nofault() for the last 3 calls. The first call
is made to copy_inst_from_kernel_nofault() to check that the address is
within kernel space. There is no risk of wrapping past the top of kernel
space because the last page is never mapped, so if the address is in the
last page the first copy will fail and the other ones will never be
performed.

Also make it notrace, just like all functions that call it.

Signed-off-by: Christophe Leroy
---
v3: Use ppc_inst_t to fix sparse warnings and split trampoline verification in one line per instruction.
---
 arch/powerpc/kernel/module_32.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index a0432ef46967..715a42f383d0 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -289,23 +289,32 @@ int apply_relocate_add(Elf32_Shdr *sechdrs,
 }
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-int module_trampoline_target(struct module *mod, unsigned long addr,
-			     unsigned long *target)
+notrace int module_trampoline_target(struct module *mod, unsigned long addr,
+				     unsigned long *target)
 {
-	unsigned int jmp[4];
+	ppc_inst_t jmp[4];
 
 	/* Find where the trampoline jumps to */
-	if (copy_from_kernel_nofault(jmp, (void *)addr, sizeof(jmp)))
+	if (copy_inst_from_kernel_nofault(jmp, (void *)addr))
+		return -EFAULT;
+	if (__copy_inst_from_kernel_nofault(jmp + 1, (void *)addr + 4))
+		return -EFAULT;
+	if (__copy_inst_from_kernel_nofault(jmp + 2, (void *)addr + 8))
+		return -EFAULT;
+	if (__copy_inst_from_kernel_nofault(jmp + 3, (void *)addr + 12))
 		return -EFAULT;
 
 	/* verify that this is what we expect it to be */
-	if ((jmp[0] & 0xffff0000) != PPC_RAW_LIS(_R12, 0) ||
-	    (jmp[1] & 0xffff0000) != PPC_RAW_ADDI(_R12, _R12, 0) ||
-	    jmp[2] != PPC_RAW_MTCTR(_R12) ||
-	    jmp[3] != PPC_RAW_BCTR())
+	if ((ppc_inst_val(jmp[0]) & 0xffff0000) != PPC_RAW_LIS(_R12, 0))
+		return -EINVAL;
+	if ((ppc_inst_val(jmp[1]) & 0xffff0000) != PPC_RAW_ADDI(_R12, _R12, 0))
+		return -EINVAL;
+	if (ppc_inst_val(jmp[2]) != PPC_RAW_MTCTR(_R12))
+		return -EINVAL;
+	if (ppc_inst_val(jmp[3]) != PPC_RAW_BCTR())
 		return -EINVAL;
 
-	addr = (jmp[1] & 0xffff) | ((jmp[0] & 0xffff) << 16);
+	addr = (ppc_inst_val(jmp[1]) & 0xffff) | ((ppc_inst_val(jmp[0]) & 0xffff) << 16);
 	if (addr & 0x8000)
 		addr -= 0x10000;
 
-- 
2.35.1
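For readers who want to see the trampoline check and the target reconstruction outside the kernel, here is a minimal user-space sketch. The function names and the RAW_* constants are illustrative stand-ins (assumed to be the encodings of "lis r12,0", "addi r12,r12,0", "mtctr r12" and "bctr"), not the kernel macros themselves.

```c
#include <stdint.h>

/* Assumed raw encodings of the four trampoline instructions (r12 based). */
#define RAW_LIS_R12		0x3d800000u	/* lis   r12, 0      */
#define RAW_ADDI_R12_R12	0x398c0000u	/* addi  r12, r12, 0 */
#define RAW_MTCTR_R12		0x7d8903a6u	/* mtctr r12         */
#define RAW_BCTR		0x4e800420u	/* bctr              */

/* Return 1 if the four words look like the expected lis/addi/mtctr/bctr. */
static int is_trampoline(uint32_t i0, uint32_t i1, uint32_t i2, uint32_t i3)
{
	if ((i0 & 0xffff0000u) != RAW_LIS_R12)
		return 0;
	if ((i1 & 0xffff0000u) != RAW_ADDI_R12_R12)
		return 0;
	return i2 == RAW_MTCTR_R12 && i3 == RAW_BCTR;
}

/* Rebuild the 32-bit target from the two 16-bit immediates. */
static uint32_t lis_addi_target(uint32_t lis, uint32_t addi)
{
	uint32_t addr = (addi & 0xffff) | ((lis & 0xffff) << 16);

	/* addi sign-extends its immediate, so undo the carry into the high half. */
	if (addr & 0x8000)
		addr -= 0x10000;
	return addr;
}
```

For example, a trampoline built for target 0x12348000 carries 0x1235 in the lis and 0x8000 in the addi, and the adjustment above brings the result back to 0x12348000.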
[PATCH v3 23/25] powerpc/modules: Use PPC_LI macros instead of opencoding
Use PPC_LI_MASK and PPC_LI() instead of opencoding.

Signed-off-by: Christophe Leroy
---
v2: Use PPC_LI() and PPC_LI_MASK
---
 arch/powerpc/kernel/module_32.c | 11 ++++-------
 arch/powerpc/kernel/module_64.c |  3 +--
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index 715a42f383d0..3d47e9853f3e 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -256,9 +256,8 @@ int apply_relocate_add(Elf32_Shdr *sechdrs,
 				 value, (uint32_t)location);
 			pr_debug("Location before: %08X.\n",
 			       *(uint32_t *)location);
-			value = (*(uint32_t *)location & ~0x03fffffc)
-				| ((value - (uint32_t)location)
-				   & 0x03fffffc);
+			value = (*(uint32_t *)location & ~PPC_LI_MASK) |
+				PPC_LI(value - (uint32_t)location);
 
 			if (patch_instruction(location, ppc_inst(value)))
 				return -EFAULT;
@@ -266,10 +265,8 @@ int apply_relocate_add(Elf32_Shdr *sechdrs,
 			pr_debug("Location after: %08X.\n",
 			       *(uint32_t *)location);
 			pr_debug("ie. jump to %08X+%08X = %08X\n",
-			       *(uint32_t *)location & 0x03fffffc,
-			       (uint32_t)location,
-			       (*(uint32_t *)location & 0x03fffffc)
-			       + (uint32_t)location);
+				 *(uint32_t *)PPC_LI((uint32_t)location), (uint32_t)location,
+				 (*(uint32_t *)PPC_LI((uint32_t)location)) + (uint32_t)location);
 			break;
 
 		case R_PPC_REL32:
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index c1d87937b962..4c844198185e 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -653,8 +653,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			}
 
 			/* Only replace bits 2 through 26 */
-			value = (*(uint32_t *)location & ~0x03fffffc)
-				| (value & 0x03fffffc);
+			value = (*(uint32_t *)location & ~PPC_LI_MASK) | PPC_LI(value);
 
 			if (patch_instruction((u32 *)location, ppc_inst(value)))
 				return -EFAULT;
-- 
2.35.1
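As a rough illustration of what the REL24 relocation does with these macros, the sketch below patches the LI field of a branch word in user space. LI_MASK and LI() are local stand-ins, assumed to match PPC_LI_MASK (0x03fffffc) and PPC_LI(); relocate_rel24() is a hypothetical name.

```c
#include <stdint.h>

/* Local stand-ins, assumed to mirror PPC_LI_MASK and PPC_LI(). */
#define LI_MASK	0x03fffffcu
#define LI(v)	((uint32_t)(v) & LI_MASK)

/*
 * Keep the opcode and the AA/LK bits of the existing instruction and
 * replace only the LI field with the offset from location to target.
 */
static uint32_t relocate_rel24(uint32_t insn, uint32_t target, uint32_t location)
{
	return (insn & ~LI_MASK) | LI(target - location);
}
```

A "bl" at 0x2000 retargeted to 0x1000 needs an offset of -0x1000, so 0x48000001 becomes 0x4bfff001 while the link bit is preserved.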
[PATCH v3 25/25] powerpc/opcodes: Remove unused PPC_INST_XXX macros
The following PPC_INST_XXX macros are not used anymore outside ppc-opcode.h: - PPC_INST_LD - PPC_INST_STD - PPC_INST_ADDIS - PPC_INST_ADD - PPC_INST_DIVD Remove them. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/ppc-opcode.h | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 9ca8996ee1cd..b9d6f95b66e9 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -285,11 +285,6 @@ #define PPC_INST_TRECHKPT 0x7c0007dd #define PPC_INST_TRECLAIM 0x7c00075d #define PPC_INST_TSR 0x7c0005dd -#define PPC_INST_LD0xe800 -#define PPC_INST_STD 0xf800 -#define PPC_INST_ADDIS 0x3c00 -#define PPC_INST_ADD 0x7c000214 -#define PPC_INST_DIVD 0x7c0003d2 #define PPC_INST_BRANCH_COND 0x4080 /* Prefixes */ @@ -462,10 +457,10 @@ (0x10c7 | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | __PPC_RC21) #define PPC_RAW_VCMPEQUB_RC(vrt, vra, vrb) \ (0x1006 | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | __PPC_RC21) -#define PPC_RAW_LD(r, base, i) (PPC_INST_LD | ___PPC_RT(r) | ___PPC_RA(base) | IMM_DS(i)) +#define PPC_RAW_LD(r, base, i) (0xe800 | ___PPC_RT(r) | ___PPC_RA(base) | IMM_DS(i)) #define PPC_RAW_LWZ(r, base, i)(0x8000 | ___PPC_RT(r) | ___PPC_RA(base) | IMM_L(i)) #define PPC_RAW_LWZX(t, a, b) (0x7c2e | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b)) -#define PPC_RAW_STD(r, base, i)(PPC_INST_STD | ___PPC_RS(r) | ___PPC_RA(base) | IMM_DS(i)) +#define PPC_RAW_STD(r, base, i)(0xf800 | ___PPC_RS(r) | ___PPC_RA(base) | IMM_DS(i)) #define PPC_RAW_STDCX(s, a, b) (0x7c0001ad | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b)) #define PPC_RAW_LFSX(t, a, b) (0x7c00042e | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b)) #define PPC_RAW_STFSX(s, a, b) (0x7c00052e | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b)) @@ -476,8 +471,8 @@ #define PPC_RAW_ADDE(t, a, b) (0x7c000114 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b)) #define PPC_RAW_ADDZE(t, a)(0x7c000194 | 
___PPC_RT(t) | ___PPC_RA(a)) #define PPC_RAW_ADDME(t, a)(0x7c0001d4 | ___PPC_RT(t) | ___PPC_RA(a)) -#define PPC_RAW_ADD(t, a, b) (PPC_INST_ADD | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b)) -#define PPC_RAW_ADD_DOT(t, a, b) (PPC_INST_ADD | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | 0x1) +#define PPC_RAW_ADD(t, a, b) (0x7c000214 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b)) +#define PPC_RAW_ADD_DOT(t, a, b) (0x7c000214 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | 0x1) #define PPC_RAW_ADDC(t, a, b) (0x7c14 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b)) #define PPC_RAW_ADDC_DOT(t, a, b) (0x7c14 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | 0x1) #define PPC_RAW_NOP() PPC_RAW_ORI(0, 0, 0) -- 2.35.1
[PATCH v3 22/25] powerpc/inst: Remove PPC_INST_BRANCH
Convert last users of PPC_INST_BRANCH to PPC_RAW_BRANCH()

And remove PPC_INST_BRANCH.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/ppc-opcode.h | 3 +--
 arch/powerpc/lib/feature-fixups.c     | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 3e9aa96ae74b..1871a86c5436 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -290,7 +290,6 @@
 #define PPC_INST_ADDIS			0x3c000000
 #define PPC_INST_ADD			0x7c000214
 #define PPC_INST_DIVD			0x7c0003d2
-#define PPC_INST_BRANCH			0x48000000
 #define PPC_INST_BL			0x48000001
 #define PPC_INST_BRANCH_COND		0x40800000
 
@@ -575,7 +574,7 @@
 #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
 #define PPC_RAW_EIEIO()			(0x7c0006ac)
 
-#define PPC_RAW_BRANCH(addr)		(PPC_INST_BRANCH | ((addr) & 0x03fffffc))
+#define PPC_RAW_BRANCH(offset)		(0x48000000 | PPC_LI(offset))
 #define PPC_RAW_BL(offset)		(0x48000001 | PPC_LI(offset))
 
 /* Deal with instructions that older assemblers aren't aware of */
diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index 343a78826035..993d3f31832a 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -451,7 +451,7 @@ static int __do_rfi_flush_fixups(void *data)
 
 	if (types & L1D_FLUSH_FALLBACK)
 		/* b .+16 to fallback flush */
-		instrs[0] = PPC_INST_BRANCH | 16;
+		instrs[0] = PPC_RAW_BRANCH(16);
 
 	i = 0;
 	if (types & L1D_FLUSH_ORI) {
-- 
2.35.1
[PATCH v3 24/25] powerpc/inst: Remove PPC_INST_BL
Convert last users of PPC_INST_BL to PPC_RAW_BL()

And remove PPC_INST_BL.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 -
 arch/powerpc/net/bpf_jit.h            | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 1871a86c5436..9ca8996ee1cd 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -290,7 +290,6 @@
 #define PPC_INST_ADDIS			0x3c000000
 #define PPC_INST_ADD			0x7c000214
 #define PPC_INST_DIVD			0x7c0003d2
-#define PPC_INST_BL			0x48000001
 #define PPC_INST_BRANCH_COND		0x40800000
 
 /* Prefixes */
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 80d973da9093..a4f7880f959d 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -35,7 +35,7 @@
 	} while (0)
 
 /* bl (unconditional 'branch' with link) */
-#define PPC_BL(dest)	EMIT(PPC_INST_BL | (((dest) - (unsigned long)(image + ctx->idx)) & 0x03fffffc))
+#define PPC_BL(dest)	EMIT(PPC_RAW_BL((dest) - (unsigned long)(image + ctx->idx)))
 
 /* "cond" here covers BO:BI fields. */
 #define PPC_BCC_SHORT(cond, dest)					      \
-- 
2.35.1
[PATCH v3 05/25] powerpc/code-patching: Inline create_branch()
create_branch() is a good candidate for inlining because:
- Flags can be folded in.
- Range tests are likely to be already done.

Hence reducing the create_branch() to only a set of instructions.

So inline it.

It improves ftrace activation by 10%.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/code-patching.h | 22 ++++++++++++++++++++--
 arch/powerpc/lib/code-patching.c         | 20 --------------------
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index e7c5df50cb4e..4260e89f62b1 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -49,8 +49,26 @@ static inline bool is_offset_in_cond_branch_range(long offset)
 	return offset >= -0x8000 && offset <= 0x7fff && !(offset & 0x3);
 }
 
-int create_branch(ppc_inst_t *instr, const u32 *addr,
-		  unsigned long target, int flags);
+static inline int create_branch(ppc_inst_t *instr, const u32 *addr,
+				unsigned long target, int flags)
+{
+	long offset;
+
+	*instr = ppc_inst(0);
+	offset = target;
+	if (! (flags & BRANCH_ABSOLUTE))
+		offset = offset - (unsigned long)addr;
+
+	/* Check we can represent the target in the instruction format */
+	if (!is_offset_in_branch_range(offset))
+		return 1;
+
+	/* Mask out the flags and target, so they don't step on each other. */
+	*instr = ppc_inst(0x48000000 | (flags & 0x3) | (offset & 0x03FFFFFC));
+
+	return 0;
+}
+
 int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 		       unsigned long target, int flags);
 int patch_branch(u32 *addr, unsigned long target, int flags);
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 58262c7e447c..7adbdb05fee7 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -230,26 +230,6 @@ bool is_conditional_branch(ppc_inst_t instr)
 }
 NOKPROBE_SYMBOL(is_conditional_branch);
 
-int create_branch(ppc_inst_t *instr, const u32 *addr,
-		  unsigned long target, int flags)
-{
-	long offset;
-
-	*instr = ppc_inst(0);
-	offset = target;
-	if (! (flags & BRANCH_ABSOLUTE))
-		offset = offset - (unsigned long)addr;
-
-	/* Check we can represent the target in the instruction format */
-	if (!is_offset_in_branch_range(offset))
-		return 1;
-
-	/* Mask out the flags and target, so they don't step on each other. */
-	*instr = ppc_inst(0x48000000 | (flags & 0x3) | (offset & 0x03FFFFFC));
-
-	return 0;
-}
-
 int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 		       unsigned long target, int flags)
 {
-- 
2.35.1
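To make the effect of create_branch() easier to try out, here is a small user-space approximation. It is a sketch only: plain uint32_t stands in for ppc_inst_t, the names make_branch() and branch_word() are hypothetical, and the constants assume the unconditional branch opcode 0x48000000 with a 0x03fffffc LI mask.

```c
#include <stdint.h>

#define BRANCH_SET_LINK	0x1
#define BRANCH_ABSOLUTE	0x2

/* Offset must fit the signed, word-aligned 26-bit range of an I-form branch. */
static int offset_in_branch_range(long offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/* Store the branch word in *instr and return 0, or return 1 if unreachable. */
static int make_branch(uint32_t *instr, unsigned long addr,
		       unsigned long target, int flags)
{
	long offset = (long)target;

	*instr = 0;
	if (!(flags & BRANCH_ABSOLUTE))
		offset = (long)(target - addr);

	if (!offset_in_branch_range(offset))
		return 1;

	/* Mask flags and offset so they cannot overlap. */
	*instr = 0x48000000u | (uint32_t)(flags & 0x3) |
		 ((uint32_t)offset & 0x03fffffcu);
	return 0;
}

/* Convenience wrapper for quick checks: returns 0 when out of range. */
static uint32_t branch_word(unsigned long addr, unsigned long target, int flags)
{
	uint32_t instr;

	return make_branch(&instr, addr, target, flags) ? 0 : instr;
}
```

When flags is a compile-time constant, as it is at the call sites, an inlined version lets the compiler fold the flag test and the mask into a few instructions, which is the point made in the commit message.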
[PATCH v3 03/25] powerpc/code-patching: Inline is_offset_in_{cond}_branch_range()
The tests in is_offset_in_branch_range() and is_offset_in_cond_branch_range()
are simple and worth inlining.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/code-patching.h | 29 +++++++++++++++++++++++++++--
 arch/powerpc/lib/code-patching.c         | 27 ---------------------------
 2 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 409483b2d0ce..e7c5df50cb4e 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,8 +22,33 @@
 #define BRANCH_SET_LINK	0x1
 #define BRANCH_ABSOLUTE	0x2
 
-bool is_offset_in_branch_range(long offset);
-bool is_offset_in_cond_branch_range(long offset);
+/*
+ * Powerpc branch instruction is :
+ *
+ *  0         6                 30   31
+ *  +---------+----------------+---+---+
+ *  | opcode  |     LI         |AA |LK |
+ *  +---------+----------------+---+---+
+ *  Where AA = 0 and LK = 0
+ *
+ * LI is a signed 24 bits integer. The real branch offset is computed
+ * by: imm32 = SignExtend(LI:'0b00', 32);
+ *
+ * So the maximum forward branch should be:
+ *   (0x007fffff << 2) = 0x01fffffc =  0x1fffffc
+ * The maximum backward branch should be:
+ *   (0xff800000 << 2) = 0xfe000000 = -0x2000000
+ */
+static inline bool is_offset_in_branch_range(long offset)
+{
+	return (offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3));
+}
+
+static inline bool is_offset_in_cond_branch_range(long offset)
+{
+	return offset >= -0x8000 && offset <= 0x7fff && !(offset & 0x3);
+}
+
 int create_branch(ppc_inst_t *instr, const u32 *addr,
 		  unsigned long target, int flags);
 int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 00c68e7fb11e..58262c7e447c 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -208,33 +208,6 @@ int patch_branch(u32 *addr, unsigned long target, int flags)
 	return patch_instruction(addr, instr);
 }
 
-bool is_offset_in_branch_range(long offset)
-{
-	/*
-	 * Powerpc branch instruction is :
-	 *
-	 *  0         6                 30   31
-	 *  +---------+----------------+---+---+
-	 *  | opcode  |     LI         |AA |LK |
-	 *  +---------+----------------+---+---+
-	 *  Where AA = 0 and LK = 0
-	 *
-	 * LI is a signed 24 bits integer. The real branch offset is computed
-	 * by: imm32 = SignExtend(LI:'0b00', 32);
-	 *
-	 * So the maximum forward branch should be:
-	 *   (0x007fffff << 2) = 0x01fffffc =  0x1fffffc
-	 * The maximum backward branch should be:
-	 *   (0xff800000 << 2) = 0xfe000000 = -0x2000000
-	 */
-	return (offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3));
-}
-
-bool is_offset_in_cond_branch_range(long offset)
-{
-	return offset >= -0x8000 && offset <= 0x7fff && !(offset & 0x3);
-}
-
 /*
  * Helper to check if a given instruction is a conditional branch
  * Derived from the conditional checks in analyse_instr()
-- 
2.35.1
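The SignExtend(LI:'0b00', 32) step in the comment can be checked with a few lines of user-space C. This is an illustrative sketch, with branch_offset() a hypothetical helper and the LI field assumed to occupy mask 0x03fffffc of the instruction word.

```c
#include <stdint.h>

/* Recover the signed byte offset encoded in an I-form branch word. */
static long branch_offset(uint32_t insn)
{
	/* LI with the two implied low zero bits: a 26-bit signed value. */
	long li = (long)(insn & 0x03fffffcu);

	/* Bit 25 is the sign bit of that 26-bit value. */
	if (li & 0x02000000)
		li -= 0x04000000;
	return li;
}
```

Decoding the largest positive field gives 0x1fffffc and the most negative gives -0x2000000, which are the two limits the range check uses.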
[PATCH v3 07/25] powerpc/ftrace: Use patch_instruction() return directly
Instead of returning -EPERM when patch_instruction() fails, just return what patch_instruction returns. That simplifies ftrace_modify_code(): 0: 94 21 ff c0 stwur1,-64(r1) 4: 93 e1 00 3c stw r31,60(r1) 8: 7c 7f 1b 79 mr. r31,r3 c: 40 80 00 30 bge 3c 10: 93 c1 00 38 stw r30,56(r1) 14: 7c 9e 23 78 mr r30,r4 18: 7c a4 2b 78 mr r4,r5 1c: 80 bf 00 00 lwz r5,0(r31) 20: 7c 1e 28 40 cmplw r30,r5 24: 40 82 00 34 bne 58 28: 83 c1 00 38 lwz r30,56(r1) 2c: 7f e3 fb 78 mr r3,r31 30: 83 e1 00 3c lwz r31,60(r1) 34: 38 21 00 40 addir1,r1,64 38: 48 00 00 00 b 38 38: R_PPC_REL24 patch_instruction Before: 0: 94 21 ff c0 stwur1,-64(r1) 4: 93 e1 00 3c stw r31,60(r1) 8: 7c 7f 1b 79 mr. r31,r3 c: 40 80 00 4c bge 58 10: 93 c1 00 38 stw r30,56(r1) 14: 7c 9e 23 78 mr r30,r4 18: 7c a4 2b 78 mr r4,r5 1c: 80 bf 00 00 lwz r5,0(r31) 20: 7c 08 02 a6 mflrr0 24: 90 01 00 44 stw r0,68(r1) 28: 7c 1e 28 40 cmplw r30,r5 2c: 40 82 00 48 bne 74 30: 7f e3 fb 78 mr r3,r31 34: 48 00 00 01 bl 34 34: R_PPC_REL24 patch_instruction 38: 80 01 00 44 lwz r0,68(r1) 3c: 20 63 00 00 subfic r3,r3,0 40: 83 c1 00 38 lwz r30,56(r1) 44: 7c 63 19 10 subfe r3,r3,r3 48: 7c 08 03 a6 mtlrr0 4c: 83 e1 00 3c lwz r31,60(r1) 50: 38 21 00 40 addir1,r1,64 54: 4e 80 00 20 blr It improves ftrace activation/deactivation duration by about 3%. Modify patch_instruction() return on failure to -EPERM in order to match with ftrace expectations. Other users of patch_instruction() do not care about the exact error value returned. 
Signed-off-by: Christophe Leroy --- v2: Make patch_instruction() return -EPERM in case of failure --- arch/powerpc/kernel/trace/ftrace.c | 5 + arch/powerpc/lib/code-patching.c | 2 +- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index 98e82fa4980f..1b05d33f96c6 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -78,10 +78,7 @@ ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_t new) } /* replace the text with the new text */ - if (patch_instruction((u32 *)ip, new)) - return -EPERM; - - return 0; + return patch_instruction((u32 *)ip, new); } /* diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 7adbdb05fee7..cd25c07df23c 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -32,7 +32,7 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr return 0; failed: - return -EFAULT; + return -EPERM; } int raw_patch_instruction(u32 *addr, ppc_inst_t instr) -- 2.35.1
[PATCH v3 08/25] powerpc: Add CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2
At the time being, we use CONFIG_CPU_LITTLE_ENDIAN and CONFIG_CPU_BIG_ENDIAN to pass -mabi=elfv1 or elfv2 to compiler, then define a PPC64_ELF_ABI_v1 or PPC64_ELF_ABI_v2 macro in asm/types.h based on _CALL_ELF define set by the compiler. Make it more straight forward with a CONFIG option that is directly usable. Signed-off-by: Christophe Leroy --- arch/powerpc/Makefile | 10 +- arch/powerpc/boot/Makefile | 2 ++ arch/powerpc/platforms/Kconfig.cputype | 6 ++ 3 files changed, 13 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index eb541e730d3c..1ba98be84101 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -89,10 +89,10 @@ endif ifdef CONFIG_PPC64 ifndef CONFIG_CC_IS_CLANG -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1) -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mcall-aixdesc) -aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1) -aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2 +cflags-$(CONFIG_PPC64_ELF_ABI_V1) += $(call cc-option,-mabi=elfv1) +cflags-$(CONFIG_PPC64_ELF_ABI_V1) += $(call cc-option,-mcall-aixdesc) +aflags-$(CONFIG_PPC64_ELF_ABI_V1) += $(call cc-option,-mabi=elfv1) +aflags-$(CONFIG_PPC64_ELF_ABI_V2) += -mabi=elfv2 endif endif @@ -141,7 +141,7 @@ endif CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no) ifndef CONFIG_CC_IS_CLANG -ifdef CONFIG_CPU_LITTLE_ENDIAN +ifdef CONFIG_PPC64_ELF_ABI_V2 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc)) AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2) else diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 4b4827c475c6..b6d4fe04c594 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -49,6 +49,8 @@ ifdef CONFIG_CPU_BIG_ENDIAN BOOTCFLAGS += -mbig-endian else BOOTCFLAGS += -mlittle-endian +endif +ifdef CONFIG_PPC64_ELF_ABI_V2 BOOTCFLAGS += $(call cc-option,-mabi=elfv2) endif diff --git 
a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index e2e1fec91c6e..9bfcf972d21d 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -556,6 +556,12 @@ config CPU_LITTLE_ENDIAN endchoice +config PPC64_ELF_ABI_V1 + def_bool PPC64 && CPU_BIG_ENDIAN + +config PPC64_ELF_ABI_V2 + def_bool PPC64 && CPU_LITTLE_ENDIAN + config PPC64_BOOT_WRAPPER def_bool n depends on CPU_LITTLE_ENDIAN -- 2.35.1
[PATCH v3 11/25] powerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64
Since c93d4f6ecf4b ("powerpc/ftrace: Add module_trampoline_target() for PPC32"), __ftrace_make_nop() for PPC32 is very similar to the one for PPC64. Same for __ftrace_make_call(). Make them common. Signed-off-by: Christophe Leroy --- v2: - Fixed comment to -mprofile-kernel versus -mkernel_profile - Replaced a couple of #ifdef with CONFIG_PPC64_ELF_ABI_V1 as suggested by Naveen. --- arch/powerpc/kernel/trace/ftrace.c | 108 +++-- 1 file changed, 8 insertions(+), 100 deletions(-) diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index 0b199fc9cfd3..531da4d93c58 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -114,7 +114,6 @@ static unsigned long find_bl_target(unsigned long ip, ppc_inst_t op) } #ifdef CONFIG_MODULES -#ifdef CONFIG_PPC64 static int __ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr) @@ -154,10 +153,11 @@ __ftrace_make_nop(struct module *mod, return -EINVAL; } -#ifdef CONFIG_MPROFILE_KERNEL - /* When using -mkernel_profile there is no load to jump over */ + /* When using -mprofile-kernel or PPC32 there is no load to jump over */ pop = ppc_inst(PPC_RAW_NOP()); +#ifdef CONFIG_PPC64 +#ifdef CONFIG_MPROFILE_KERNEL if (copy_inst_from_kernel_nofault(, (void *)(ip - 4))) { pr_err("Fetching instruction at %lx failed.\n", ip - 4); return -EFAULT; @@ -201,6 +201,7 @@ __ftrace_make_nop(struct module *mod, return -EINVAL; } #endif /* CONFIG_MPROFILE_KERNEL */ +#endif /* PPC64 */ if (patch_instruction((u32 *)ip, pop)) { pr_err("Patching NOP failed.\n"); @@ -209,48 +210,6 @@ __ftrace_make_nop(struct module *mod, return 0; } - -#else /* !PPC64 */ -static int -__ftrace_make_nop(struct module *mod, - struct dyn_ftrace *rec, unsigned long addr) -{ - ppc_inst_t op; - unsigned long ip = rec->ip; - unsigned long tramp, ptr; - - if (copy_from_kernel_nofault(, (void *)ip, MCOUNT_INSN_SIZE)) - return -EFAULT; - - /* Make sure that that this is still a 24bit jump */ - if 
(!is_bl_op(op)) { - pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op)); - return -EINVAL; - } - - /* lets find where the pointer goes */ - tramp = find_bl_target(ip, op); - - /* Find where the trampoline jumps to */ - if (module_trampoline_target(mod, tramp, )) { - pr_err("Failed to get trampoline target\n"); - return -EFAULT; - } - - if (ptr != addr) { - pr_err("Trampoline location %08lx does not match addr\n", - tramp); - return -EINVAL; - } - - op = ppc_inst(PPC_RAW_NOP()); - - if (patch_instruction((u32 *)ip, op)) - return -EPERM; - - return 0; -} -#endif /* PPC64 */ #endif /* CONFIG_MODULES */ static unsigned long find_ftrace_tramp(unsigned long ip) @@ -437,13 +396,12 @@ int ftrace_make_nop(struct module *mod, } #ifdef CONFIG_MODULES -#ifdef CONFIG_PPC64 /* * Examine the existing instructions for __ftrace_make_call. * They should effectively be a NOP, and follow formal constraints, * depending on the ABI. Return false if they don't. */ -#ifndef CONFIG_MPROFILE_KERNEL +#ifdef CONFIG_PPC64_ELF_ABI_V1 static int expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1) { @@ -465,7 +423,7 @@ expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1) static int expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1) { - /* look for patched "NOP" on ppc64 with -mprofile-kernel */ + /* look for patched "NOP" on ppc64 with -mprofile-kernel or ppc32 */ if (!ppc_inst_equal(op0, ppc_inst(PPC_RAW_NOP( return 0; return 1; @@ -484,8 +442,10 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) if (copy_inst_from_kernel_nofault(op, ip)) return -EFAULT; +#ifdef CONFIG_PPC64_ELF_ABI_V1 if (copy_inst_from_kernel_nofault(op + 1, ip + 4)) return -EFAULT; +#endif if (!expected_nop_sequence(ip, op[0], op[1])) { pr_err("Unexpected call sequence at %p: %s %s\n", @@ -531,58 +491,6 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) return 0; } - -#else /* !CONFIG_PPC64: */ -static int -__ftrace_make_call(struct dyn_ftrace *rec, 
unsigned long addr) -{ - int err; - ppc_inst_t op; - u32 *ip = (u32 *)rec->ip; - struct module *mod = rec->arch.mod; - unsigned long tramp; - - /* read where this goes */ - if (copy_inst_from_kernel_nofault(, ip)) - return -EFAULT; - - /* It should be pointing to a nop */ - if (!ppc_inst_equal(op,
[PATCH v3 02/25] powerpc/ftrace: Remove redundant create_branch() calls
Since commit d5937db114e4 ("powerpc/code-patching: Fix patch_branch()
return on out-of-range failure") patch_branch() fails with -ERANGE
when trying to branch out of range.

No need to perform the test twice. Remove redundant create_branch()
calls.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/ftrace.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 7a266fd469b7..3ce3697e8a7c 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -301,7 +301,6 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 	int i;
 	ppc_inst_t op;
 	unsigned long ptr;
-	ppc_inst_t instr;
 	static unsigned long ftrace_plt_tramps[NUM_FTRACE_TRAMPS];
 
 	/* Is this a known long jump tramp? */
@@ -344,12 +343,6 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 #else
 	ptr = ppc_global_function_entry((void *)ftrace_caller);
 #endif
-	if (create_branch(&instr, (void *)tramp, ptr, 0)) {
-		pr_debug("%ps is not reachable from existing mcount tramp\n",
-				(void *)ptr);
-		return -1;
-	}
-
 	if (patch_branch((u32 *)tramp, ptr, 0)) {
 		pr_debug("REL24 out of range!\n");
 		return -1;
@@ -490,7 +483,6 @@ static int
 __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	ppc_inst_t op[2];
-	ppc_inst_t instr;
 	void *ip = (void *)rec->ip;
 	unsigned long entry, ptr, tramp;
 	struct module *mod = rec->arch.mod;
@@ -539,12 +531,6 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 		return -EINVAL;
 	}
 
-	/* Ensure branch is within 24 bits */
-	if (create_branch(&instr, ip, tramp, BRANCH_SET_LINK)) {
-		pr_err("Branch out of range\n");
-		return -EINVAL;
-	}
-
 	if (patch_branch(ip, tramp, BRANCH_SET_LINK)) {
 		pr_err("REL24 out of range!\n");
 		return -EINVAL;
@@ -770,12 +756,6 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		return -EINVAL;
 	}
 
-	/* Ensure branch is within 24 bits */
-	if (create_branch(&instr, (u32 *)ip, tramp, BRANCH_SET_LINK)) {
-		pr_err("Branch out of range\n");
-		return -EINVAL;
-	}
-
 	if (patch_branch((u32 *)ip, tramp, BRANCH_SET_LINK)) {
 		pr_err("REL24 out of range!\n");
 		return -EINVAL;
-- 
2.35.1
[PATCH v3 01/25] powerpc/ftrace: Refactor prepare_ftrace_return()
When we have CONFIG_DYNAMIC_FTRACE_WITH_ARGS,
prepare_ftrace_return() is called by ftrace_graph_func()
otherwise prepare_ftrace_return() is called from assembly.

Refactor prepare_ftrace_return() into a static
__prepare_ftrace_return() that will be called by both
prepare_ftrace_return() and ftrace_graph_func().

It will allow GCC to fold __prepare_ftrace_return() inside
ftrace_graph_func().

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/ftrace.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 4ee04aacf9f1..7a266fd469b7 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -939,8 +939,8 @@ int ftrace_disable_ftrace_graph_caller(void)
  * Hook the return address and push it in the stack of return addrs
  * in current thread info. Return the address we want to divert to.
  */
-unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
-						unsigned long sp)
+static unsigned long
+__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp)
 {
 	unsigned long return_hooker;
 	int bit;
@@ -969,7 +969,13 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-	fregs->regs.link = prepare_ftrace_return(parent_ip, ip, fregs->regs.gpr[1]);
+	fregs->regs.link = __prepare_ftrace_return(parent_ip, ip, fregs->regs.gpr[1]);
+}
+#else
+unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
+				    unsigned long sp)
+{
+	return __prepare_ftrace_return(parent, ip, sp);
 }
 #endif
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-- 
2.35.1
[PATCH v3 10/25] powerpc: Finalise cleanup around ABI use
Now that we have CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2, get rid of all indirect detection of ABI version. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig| 2 +- arch/powerpc/Makefile | 2 +- arch/powerpc/include/asm/types.h| 8 arch/powerpc/kernel/fadump.c| 13 - arch/powerpc/kernel/ptrace/ptrace.c | 6 -- arch/powerpc/net/bpf_jit_comp64.c | 4 ++-- 6 files changed, 12 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 174edabb74fa..5514fed3f072 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -208,7 +208,7 @@ config PPC select HAVE_EFFICIENT_UNALIGNED_ACCESS if !(CPU_LITTLE_ENDIAN && POWER7_CPU) select HAVE_FAST_GUP select HAVE_FTRACE_MCOUNT_RECORD - select HAVE_FUNCTION_DESCRIPTORSif PPC64 && !CPU_LITTLE_ENDIAN + select HAVE_FUNCTION_DESCRIPTORSif PPC64_ELF_ABI_V1 select HAVE_FUNCTION_ERROR_INJECTION select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACER diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 1ba98be84101..8bd3b631f094 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -213,7 +213,7 @@ CHECKFLAGS += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__ ifdef CONFIG_CPU_BIG_ENDIAN CHECKFLAGS += -D__BIG_ENDIAN__ else -CHECKFLAGS += -D__LITTLE_ENDIAN__ -D_CALL_ELF=2 +CHECKFLAGS += -D__LITTLE_ENDIAN__ endif ifdef CONFIG_476FPE_ERR46 diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h index 84078c28c1a2..93157a661dcc 100644 --- a/arch/powerpc/include/asm/types.h +++ b/arch/powerpc/include/asm/types.h @@ -11,14 +11,6 @@ #include -#ifdef __powerpc64__ -#if defined(_CALL_ELF) && _CALL_ELF == 2 -#define PPC64_ELF_ABI_v2 1 -#else -#define PPC64_ELF_ABI_v1 1 -#endif -#endif /* __powerpc64__ */ - #ifndef __ASSEMBLY__ typedef __vector128 vector128; diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 65562c4a0a69..5f7224d66586 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ 
-968,11 +968,14 @@ static int fadump_init_elfcore_header(char *bufp) elf->e_entry = 0; elf->e_phoff = sizeof(struct elfhdr); elf->e_shoff = 0; -#if defined(_CALL_ELF) - elf->e_flags = _CALL_ELF; -#else - elf->e_flags = 0; -#endif + + if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)) + elf->e_flags = 2; + else if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1)) + elf->e_flags = 1; + else + elf->e_flags = 0; + elf->e_ehsize = sizeof(struct elfhdr); elf->e_phentsize = sizeof(struct elf_phdr); elf->e_phnum = 0; diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c index 9fbe155a9bd0..4d2dc22d4a2d 100644 --- a/arch/powerpc/kernel/ptrace/ptrace.c +++ b/arch/powerpc/kernel/ptrace/ptrace.c @@ -444,10 +444,4 @@ void __init pt_regs_check(void) * real registers. */ BUILD_BUG_ON(PT_DSCR < sizeof(struct user_pt_regs) / sizeof(unsigned long)); - -#ifdef CONFIG_PPC64_ELF_ABI_V1 - BUILD_BUG_ON(!IS_ENABLED(CONFIG_HAVE_FUNCTION_DESCRIPTORS)); -#else - BUILD_BUG_ON(IS_ENABLED(CONFIG_HAVE_FUNCTION_DESCRIPTORS)); -#endif } diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index d7b42f45669e..594c54931e20 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -126,7 +126,7 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx) { int i; - if (__is_defined(CONFIG_PPC64_ELF_ABI_V2)) + if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)) EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc))); /* @@ -266,7 +266,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o int b2p_index = bpf_to_ppc(BPF_REG_3); int bpf_tailcall_prologue_size = 8; - if (__is_defined(CONFIG_PPC64_ELF_ABI_V2)) + if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)) bpf_tailcall_prologue_size += 4; /* skip past the toc load */ /* -- 2.35.1
[PATCH v3 14/25] powerpc/ftrace: Remove ftrace_plt_tramps[]
ftrace_plt_tramps table is never filled so it is useless.

Remove it.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/ftrace.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index f89bcaa5f0fc..010a8c7ff4ac 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -250,7 +250,6 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 	int i;
 	ppc_inst_t op;
 	unsigned long ptr;
-	static unsigned long ftrace_plt_tramps[NUM_FTRACE_TRAMPS];
 
 	/* Is this a known long jump tramp? */
 	for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
@@ -259,13 +258,6 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 		else if (ftrace_tramps[i] == tramp)
 			return 0;
 
-	/* Is this a known plt tramp? */
-	for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
-		if (!ftrace_plt_tramps[i])
-			break;
-		else if (ftrace_plt_tramps[i] == tramp)
-			return -1;
-
 	/* New trampoline -- read where this goes */
 	if (copy_inst_from_kernel_nofault(&op, (void *)tramp)) {
 		pr_debug("Fetching opcode failed.\n");
-- 
2.35.1
[PATCH v3 06/25] powerpc/ftrace: Inline ftrace_modify_code()
Inlining ftrace_modify_code() increases the size of the ftrace code
a bit, but brings a 5% improvement on ftrace activation.

Usually in C files we let gcc decide what to do, but here
it really helps to 'help' gcc decide to inline, though we
don't want to force it with an __always_inline, which would be
too much for CONFIG_CC_OPTIMIZE_FOR_SIZE.

Signed-off-by: Christophe Leroy
---
v2: More explanation in commit message
---
 arch/powerpc/kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 41c45b9c7f39..98e82fa4980f 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -53,7 +53,7 @@ ftrace_call_replace(unsigned long ip, unsigned long addr, int link)
 	return op;
 }
 
-static int
+static inline int
 ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_t new)
 {
 	ppc_inst_t replaced;
-- 
2.35.1
[PATCH v3 00/25] powerpc: ftrace optimisation and cleanup and more [v3]
This series provides optimisation and cleanup of ftrace on powerpc.

With this series ftrace activation is about 20% faster on an 8xx.

At the end of the series come additional cleanups around ppc-opcode,
that would likely conflict with this series if posted separately.

Change since v2:
- The only change in v3 is in patch 21, to fix sparse problems reported by the Robot.

Main changes since v1 (details after each individual patch description):
- Added 3 patches (8, 9, 10) that convert PPC64_ELF_ABI_v{1/2} macros by CONFIG_PPC64_ELF_ABI_V{1/2}
- Taken comments from Naveen

Christophe Leroy (25):
  powerpc/ftrace: Refactor prepare_ftrace_return()
  powerpc/ftrace: Remove redundant create_branch() calls
  powerpc/code-patching: Inline is_offset_in_{cond}_branch_range()
  powerpc/ftrace: Use is_offset_in_branch_range()
  powerpc/code-patching: Inline create_branch()
  powerpc/ftrace: Inline ftrace_modify_code()
  powerpc/ftrace: Use patch_instruction() return directly
  powerpc: Add CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2
  powerpc: Replace PPC64_ELF_ABI_v{1/2} by CONFIG_PPC64_ELF_ABI_V{1/2}
  powerpc: Finalise cleanup around ABI use
  powerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64
  powerpc/ftrace: Don't include ftrace.o for CONFIG_FTRACE_SYSCALLS
  powerpc/ftrace: Use CONFIG_FUNCTION_TRACER instead of CONFIG_DYNAMIC_FTRACE
  powerpc/ftrace: Remove ftrace_plt_tramps[]
  powerpc/ftrace: Use BRANCH_SET_LINK instead of value 1
  powerpc/ftrace: Use PPC_RAW_xxx() macros instead of opencoding.
  powerpc/ftrace: Use size macro instead of opencoding
  powerpc/ftrace: Simplify expected_nop_sequence()
  powerpc/ftrace: Minimise number of #ifdefs
  powerpc/inst: Add __copy_inst_from_kernel_nofault()
  powerpc/ftrace: Don't use copy_from_kernel_nofault() in module_trampoline_target()
  powerpc/inst: Remove PPC_INST_BRANCH
  powerpc/modules: Use PPC_LI macros instead of opencoding
  powerpc/inst: Remove PPC_INST_BL
  powerpc/opcodes: Remove unused PPC_INST_XXX macros

 arch/powerpc/Kconfig                     |   2 +-
 arch/powerpc/Makefile                    |  12 +-
 arch/powerpc/boot/Makefile               |   2 +
 arch/powerpc/include/asm/code-patching.h |  65 +++-
 arch/powerpc/include/asm/ftrace.h        |   4 +-
 arch/powerpc/include/asm/inst.h          |  13 +-
 arch/powerpc/include/asm/linkage.h       |   2 +-
 arch/powerpc/include/asm/module.h        |   2 -
 arch/powerpc/include/asm/ppc-opcode.h    |  22 +-
 arch/powerpc/include/asm/ppc_asm.h       |   4 +-
 arch/powerpc/include/asm/ptrace.h        |   2 +-
 arch/powerpc/include/asm/sections.h      |  24 +-
 arch/powerpc/include/asm/types.h         |   8 -
 arch/powerpc/kernel/fadump.c             |  13 +-
 arch/powerpc/kernel/head_64.S            |   2 +-
 arch/powerpc/kernel/interrupt_64.S       |   2 +-
 arch/powerpc/kernel/kprobes.c            |   6 +-
 arch/powerpc/kernel/misc_64.S            |   2 +-
 arch/powerpc/kernel/module.c             |   4 +-
 arch/powerpc/kernel/module_32.c          |  38 ++-
 arch/powerpc/kernel/module_64.c          |   7 +-
 arch/powerpc/kernel/ptrace/ptrace.c      |   6 -
 arch/powerpc/kernel/trace/Makefile       |   5 +-
 arch/powerpc/kernel/trace/ftrace.c       | 375 +++
 arch/powerpc/kvm/book3s_interrupts.S     |   2 +-
 arch/powerpc/kvm/book3s_rmhandlers.S     |   2 +-
 arch/powerpc/lib/code-patching.c         |  49 +--
 arch/powerpc/lib/feature-fixups.c        |   2 +-
 arch/powerpc/net/bpf_jit.h               |   4 +-
 arch/powerpc/net/bpf_jit_comp.c          |   2 +-
 arch/powerpc/net/bpf_jit_comp64.c        |   4 +-
 arch/powerpc/platforms/Kconfig.cputype   |   6 +
 32 files changed, 271 insertions(+), 422 deletions(-)

-- 
2.35.1
[PATCH v3 18/25] powerpc/ftrace: Simplify expected_nop_sequence()
Avoid ifdefs around expected_nop_sequence().

While at it make it a bool.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/ftrace.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 346b5485e7ef..c34cb394f8a8 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -390,24 +390,14 @@ int ftrace_make_nop(struct module *mod,
  * They should effectively be a NOP, and follow formal constraints,
  * depending on the ABI. Return false if they don't.
  */
-#ifdef CONFIG_PPC64_ELF_ABI_V1
-static int
-expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1)
-{
-	if (!ppc_inst_equal(op0, ppc_inst(PPC_RAW_BRANCH(8))) ||
-	    !ppc_inst_equal(op1, ppc_inst(PPC_INST_LD_TOC)))
-		return 0;
-	return 1;
-}
-#else
-static int
-expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1)
+static bool expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1)
 {
-	if (!ppc_inst_equal(op0, ppc_inst(PPC_RAW_NOP())))
-		return 0;
-	return 1;
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1))
+		return ppc_inst_equal(op0, ppc_inst(PPC_RAW_BRANCH(8))) &&
+		       ppc_inst_equal(op1, ppc_inst(PPC_INST_LD_TOC));
+	else
+		return ppc_inst_equal(op0, ppc_inst(PPC_RAW_NOP()));
 }
-#endif
 
 static int
 __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
-- 
2.35.1
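For readers unfamiliar with the IS_ENABLED() idiom used in this patch, here is a minimal user-space sketch of the idea. It is not the kernel's implementation: DEMO_IS_ENABLED() is a simplified stand-in that only handles options defined to 1 or not defined at all, CONFIG_DEMO_ABI_V1 is a hypothetical option, and the instruction words are hard-coded (0x60000000 for nop, 0x48000008 for "b +8", 0xe8410028 for "ld r2,40(r1)"). The point is that both branches stay visible to the compiler, and the unused one is dropped as dead code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option for this demo; remove it to take the other branch. */
#define CONFIG_DEMO_ABI_V1 1

/*
 * Simplified stand-in for IS_ENABLED(): expands to 1 when the option is
 * defined to 1, and to 0 when it is not defined at all.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED_3(arg1_or_junk) DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED_2(val) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED_2(option)

#define DEMO_NOP       0x60000000u /* ori r0,r0,0 */
#define DEMO_BRANCH_8  0x48000008u /* b +8 */
#define DEMO_LD_TOC_V1 0xe8410028u /* ld r2,40(r1) */

/* Both branches are parsed and type-checked; only one survives optimisation. */
static bool demo_expected_nop_sequence(uint32_t op0, uint32_t op1)
{
	if (DEMO_IS_ENABLED(CONFIG_DEMO_ABI_V1))
		return op0 == DEMO_BRANCH_8 && op1 == DEMO_LD_TOC_V1;
	else
		return op0 == DEMO_NOP;
}
```

Compared with #ifdef, a typo in the disabled branch still breaks the build, which is one reason this form is often preferred.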
[PATCH v3 12/25] powerpc/ftrace: Don't include ftrace.o for CONFIG_FTRACE_SYSCALLS
Since commit 7bea7ac0ca01 ("powerpc/syscalls: Fix syscall tracing")
ftrace.o is not needed anymore for CONFIG_FTRACE_SYSCALLS.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 542aa7a8b2b4..fc32ec30b297 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -17,7 +17,6 @@ endif
 obj-$(CONFIG_FUNCTION_TRACER)		+= ftrace_low.o
 obj-$(CONFIG_DYNAMIC_FTRACE)		+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)	+= ftrace.o
-obj-$(CONFIG_FTRACE_SYSCALLS)		+= ftrace.o
 obj-$(CONFIG_TRACING)			+= trace_clock.o
 
 obj-$(CONFIG_PPC64)			+= $(obj64-y)
-- 
2.35.1
[PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs
A lot of #ifdefs can be replaced by IS_ENABLED() Do so. This requires to have kernel_toc_addr() defined at all time as well as PPC_INST_LD_TOC and PPC_INST_STD_LR. Signed-off-by: Christophe Leroy --- v2: Moved the setup of pop outside of the big if()/else() in __ftrace_make_nop() --- arch/powerpc/include/asm/code-patching.h | 2 - arch/powerpc/include/asm/module.h| 2 - arch/powerpc/include/asm/sections.h | 24 +-- arch/powerpc/kernel/trace/ftrace.c | 182 +++ 4 files changed, 103 insertions(+), 107 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 8b1a10868275..3f881548fb61 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -217,7 +217,6 @@ static inline unsigned long ppc_kallsyms_lookup_name(const char *name) return addr; } -#ifdef CONFIG_PPC64 /* * Some instruction encodings commonly used in dynamic ftracing * and function live patching. @@ -234,6 +233,5 @@ static inline unsigned long ppc_kallsyms_lookup_name(const char *name) /* usually preceded by a mflr r0 */ #define PPC_INST_STD_LRPPC_RAW_STD(_R0, _R1, PPC_LR_STKOFF) -#endif /* CONFIG_PPC64 */ #endif /* _ASM_POWERPC_CODE_PATCHING_H */ diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h index 857d9ff24295..09e2ffd360bb 100644 --- a/arch/powerpc/include/asm/module.h +++ b/arch/powerpc/include/asm/module.h @@ -41,9 +41,7 @@ struct mod_arch_specific { #ifdef CONFIG_DYNAMIC_FTRACE unsigned long tramp; -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS unsigned long tramp_regs; -#endif #endif /* List of BUG addresses, source line numbers and filenames */ diff --git a/arch/powerpc/include/asm/sections.h b/arch/powerpc/include/asm/sections.h index 8be2c491c733..6980eaeb16fe 100644 --- a/arch/powerpc/include/asm/sections.h +++ b/arch/powerpc/include/asm/sections.h @@ -29,18 +29,6 @@ extern char start_virt_trampolines[]; extern char end_virt_trampolines[]; #endif -/* - * This assumes 
the kernel is never compiled -mcmodel=small or - * the total .toc is always less than 64k. - */ -static inline unsigned long kernel_toc_addr(void) -{ - unsigned long toc_ptr; - - asm volatile("mr %0, 2" : "=r" (toc_ptr)); - return toc_ptr; -} - static inline int overlaps_interrupt_vector_text(unsigned long start, unsigned long end) { @@ -60,5 +48,17 @@ static inline int overlaps_kernel_text(unsigned long start, unsigned long end) #endif +/* + * This assumes the kernel is never compiled -mcmodel=small or + * the total .toc is always less than 64k. + */ +static inline unsigned long kernel_toc_addr(void) +{ + unsigned long toc_ptr; + + asm volatile("mr %0, 2" : "=r" (toc_ptr)); + return toc_ptr; +} + #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_SECTIONS_H */ diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index c34cb394f8a8..5e7a4ed7ad22 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -150,26 +150,39 @@ __ftrace_make_nop(struct module *mod, return -EINVAL; } - /* When using -mprofile-kernel or PPC32 there is no load to jump over */ - pop = ppc_inst(PPC_RAW_NOP()); + if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) { + if (copy_inst_from_kernel_nofault(, (void *)(ip - 4))) { + pr_err("Fetching instruction at %lx failed.\n", ip - 4); + return -EFAULT; + } -#ifdef CONFIG_PPC64 -#ifdef CONFIG_MPROFILE_KERNEL - if (copy_inst_from_kernel_nofault(, (void *)(ip - 4))) { - pr_err("Fetching instruction at %lx failed.\n", ip - 4); - return -EFAULT; - } + /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */ + if (!ppc_inst_equal(op, ppc_inst(PPC_RAW_MFLR(_R0))) && + !ppc_inst_equal(op, ppc_inst(PPC_INST_STD_LR))) { + pr_err("Unexpected instruction %s around bl _mcount\n", + ppc_inst_as_str(op)); + return -EINVAL; + } + } else if (IS_ENABLED(CONFIG_PPC64)) { + /* +* Check what is in the next instruction. We can see ld r2,40(r1), but +* on first pass after boot we will see mflr r0. 
+*/ + if (copy_inst_from_kernel_nofault(, (void *)(ip + 4))) { + pr_err("Fetching op failed.\n"); + return -EFAULT; + } - /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */ - if (!ppc_inst_equal(op, ppc_inst(PPC_RAW_MFLR(_R0))) && - !ppc_inst_equal(op, ppc_inst(PPC_INST_STD_LR))) { - pr_err("Unexpected
[PATCH v3 17/25] powerpc/ftrace: Use size macro instead of opencoding
0x80000000 is SZ_2G. Use it.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index ac3f97dd1729..346b5485e7ef 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -741,7 +741,7 @@ int __init ftrace_dyn_arch_init(void)
 #endif
 	long reladdr = addr - kernel_toc_addr();
 
-	if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) {
+	if (reladdr >= SZ_2G || reladdr < -SZ_2G) {
 		pr_err("Address of %ps out of range of kernel_toc.\n",
 				(void *)addr);
 		return -1;
-- 
2.35.1
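The range test in this patch checks that a TOC-relative offset fits in a signed 32-bit displacement, that is, in [-2 GiB, 2 GiB). The sketch below restates that check in user-space C. DEMO_SZ_2G and the function names are hypothetical, and the constant is deliberately given a signed 64-bit type: a plain 0x80000000 literal is unsigned where int is 32 bits, so negating it would not yield a negative number. How the real SZ_2G macro is typed is not shown in this thread, so treat the casts here as an assumption of the sketch rather than a statement about the kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2 GiB as a signed 64-bit constant (hypothetical name, not the kernel macro). */
#define DEMO_SZ_2G ((int64_t)0x80000000LL)

/* True when reladdr fits in a signed 32-bit displacement: [-2 GiB, 2 GiB). */
static bool demo_toc_offset_in_range(int64_t reladdr)
{
	return reladdr >= -DEMO_SZ_2G && reladdr < DEMO_SZ_2G;
}

/* The open-coded form of the same test, with explicitly signed literals. */
static bool demo_toc_offset_in_range_old(int64_t reladdr)
{
	return !(reladdr > 0x7FFFFFFFLL || reladdr < -0x80000000LL);
}
```

Both helpers accept exactly the same set of offsets; the boundary values are the ones worth testing.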
[PATCH v3 16/25] powerpc/ftrace: Use PPC_RAW_xxx() macros instead of opencoding.
PPC_RAW_xxx() macros are self explanatory and less error prone
than open coding.

Use them in ftrace.c

Signed-off-by: Christophe Leroy
---
v2:
- Replaced PPC_INST_OFFSET24_MASK by PPC_LI_MASK and added PPC_LI().
- Fix ADDI instead of ADDIS
---
 arch/powerpc/include/asm/ppc-opcode.h |  5 +++++
 arch/powerpc/kernel/trace/ftrace.c    | 32 +++++++++++---------------------
 2 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 82f1f0041c6f..3e9aa96ae74b 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -352,6 +352,10 @@
 #define PPC_HIGHER(v)	(((v) >> 32) & 0xffff)
 #define PPC_HIGHEST(v)	(((v) >> 48) & 0xffff)
 
+/* LI Field */
+#define PPC_LI_MASK	0x03fffffc
+#define PPC_LI(v)	((v) & PPC_LI_MASK)
+
 /*
  * Only use the larx hint bit on 64bit CPUs. e500v1/v2 based CPUs will treat a
  * larx with EH set as an illegal instruction.
@@ -572,6 +576,7 @@
 #define PPC_RAW_EIEIO()			(0x7c0006ac)
 
 #define PPC_RAW_BRANCH(addr)		(PPC_INST_BRANCH | ((addr) & 0x03fffffc))
+#define PPC_RAW_BL(offset)		(0x48000001 | PPC_LI(offset))
 
 /* Deal with instructions that older assemblers aren't aware of */
 #define	PPC_BCCTR_FLUSH		stringify_in_c(.long PPC_INST_BCCTR_FLUSH)
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index c4a68340a351..ac3f97dd1729 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -90,19 +90,19 @@ static int test_24bit_addr(unsigned long ip, unsigned long addr)
 
 static int is_bl_op(ppc_inst_t op)
 {
-	return (ppc_inst_val(op) & 0xfc000003) == 0x48000001;
+	return (ppc_inst_val(op) & ~PPC_LI_MASK) == PPC_RAW_BL(0);
 }
 
 static int is_b_op(ppc_inst_t op)
 {
-	return (ppc_inst_val(op) & 0xfc000003) == 0x48000000;
+	return (ppc_inst_val(op) & ~PPC_LI_MASK) == PPC_RAW_BRANCH(0);
 }
 
 static unsigned long find_bl_target(unsigned long ip, ppc_inst_t op)
 {
 	int offset;
 
-	offset = (ppc_inst_val(op) & 0x03fffffc);
+	offset = PPC_LI(ppc_inst_val(op));
 	/* make it signed */
 	if (offset & 0x02000000)
 		offset |= 0xfe000000;
@@ -182,7 +182,7 @@ __ftrace_make_nop(struct module *mod,
 	 * Use a b +8 to jump over the load.
 	 */
 
-	pop = ppc_inst(PPC_INST_BRANCH | 8);	/* b +8 */
+	pop = ppc_inst(PPC_RAW_BRANCH(8));	/* b +8 */
 
 	/*
 	 * Check what is in the next instruction. We can see ld r2,40(r1), but
@@ -394,17 +394,8 @@ int ftrace_make_nop(struct module *mod,
 static int
 expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1)
 {
-	/*
-	 * We expect to see:
-	 *
-	 * b +8
-	 * ld r2,XX(r1)
-	 *
-	 * The load offset is different depending on the ABI. For simplicity
-	 * just mask it out when doing the compare.
-	 */
-	if (!ppc_inst_equal(op0, ppc_inst(0x48000008)) ||
-	    (ppc_inst_val(op1) & 0xffff0000) != 0xe8410000)
+	if (!ppc_inst_equal(op0, ppc_inst(PPC_RAW_BRANCH(8))) ||
+	    !ppc_inst_equal(op1, ppc_inst(PPC_INST_LD_TOC)))
 		return 0;
 	return 1;
 }
@@ -412,7 +403,6 @@ expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1)
 static int
 expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1)
 {
-	/* look for patched "NOP" on ppc64 with -mprofile-kernel or ppc32 */
 	if (!ppc_inst_equal(op0, ppc_inst(PPC_RAW_NOP())))
 		return 0;
 	return 1;
@@ -738,11 +728,11 @@ int __init ftrace_dyn_arch_init(void)
 	int i;
 	unsigned int *tramp[] = { ftrace_tramp_text, ftrace_tramp_init };
 	u32 stub_insns[] = {
-		0xe98d0000 | PACATOC,	/* ld      r12,PACATOC(r13)	*/
-		0x3d8c0000,		/* addis   r12,r12,<high>	*/
-		0x398c0000,		/* addi    r12,r12,<low>	*/
-		0x7d8903a6,		/* mtctr   r12			*/
-		0x4e800420,		/* bctr				*/
+		PPC_RAW_LD(_R12, _R13, PACATOC),
+		PPC_RAW_ADDIS(_R12, _R12, 0),
+		PPC_RAW_ADDI(_R12, _R12, 0),
+		PPC_RAW_MTCTR(_R12),
+		PPC_RAW_BCTR()
 	};
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 	unsigned long addr = ppc_global_function_entry((void *)ftrace_regs_caller);
-- 
2.35.1
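As an illustration of what the LI mask in this patch is doing, here is a user-space sketch of classifying and decoding I-form branches. The DEMO_ names are hypothetical stand-ins for PPC_LI_MASK, PPC_LI(), PPC_RAW_BRANCH() and PPC_RAW_BL(); only the bit layout (opcode 18 in the top six bits, a 24-bit word displacement in LI, then the AA and LK bits) is taken from the instruction format. The sign extension is written with 64-bit arithmetic instead of the `offset |= 0xfe000000` form used on an int in ftrace.c; the result is the same.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_LI_MASK     0x03fffffcu                   /* LI field of an I-form branch */
#define DEMO_LI(v)       ((v) & DEMO_LI_MASK)
#define DEMO_RAW_B(off)  (0x48000000u | DEMO_LI(off))  /* b  off */
#define DEMO_RAW_BL(off) (0x48000001u | DEMO_LI(off))  /* bl off */

/* A 'bl' is opcode 18 with AA clear and LK set, whatever the displacement. */
static bool demo_is_bl_op(uint32_t insn)
{
	return (insn & ~DEMO_LI_MASK) == DEMO_RAW_BL(0);
}

/* A 'b' is opcode 18 with both AA and LK clear. */
static bool demo_is_b_op(uint32_t insn)
{
	return (insn & ~DEMO_LI_MASK) == DEMO_RAW_B(0);
}

/* Recover the target: sign-extend the 26-bit displacement and add it to ip. */
static uint64_t demo_find_bl_target(uint64_t ip, uint32_t insn)
{
	uint32_t li = DEMO_LI(insn);
	int64_t offset = li;

	if (li & 0x02000000u)		/* sign bit of the 26-bit field */
		offset -= 0x04000000;	/* subtract 2^26 to make it negative */

	return ip + (uint64_t)offset;
}
```

For example, DEMO_RAW_BL(-8) encodes to 0x4bfffff9, and decoding it at ip 0x1000 gives back 0xff8.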
[PATCH v3 20/25] powerpc/inst: Add __copy_inst_from_kernel_nofault()
On the same model as get_user() versus __get_user(),
introduce __copy_inst_from_kernel_nofault() which doesn't
check address.

To be used by callers that have already checked that the address
is a kernel address.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/inst.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index 80b6d74146c6..b49aae9f6f27 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -158,13 +158,10 @@ static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], ppc_inst_t x)
 	__str;				\
 })
 
-static inline int copy_inst_from_kernel_nofault(ppc_inst_t *inst, u32 *src)
+static inline int __copy_inst_from_kernel_nofault(ppc_inst_t *inst, u32 *src)
 {
 	unsigned int val, suffix;
 
-	if (unlikely(!is_kernel_addr((unsigned long)src)))
-		return -ERANGE;
-
 /* See https://github.com/ClangBuiltLinux/linux/issues/1521 */
 #if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 14
 	val = suffix = 0;
@@ -181,4 +178,12 @@ static inline int copy_inst_from_kernel_nofault(ppc_inst_t *inst, u32 *src)
 	return -EFAULT;
 }
 
+static inline int copy_inst_from_kernel_nofault(ppc_inst_t *inst, u32 *src)
+{
+	if (unlikely(!is_kernel_addr((unsigned long)src)))
+		return -ERANGE;
+
+	return __copy_inst_from_kernel_nofault(inst, src);
+}
+
 #endif /* _ASM_POWERPC_INST_H */
-- 
2.35.1
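The checked/unchecked split described in this patch can be sketched outside the kernel as follows. Everything here is hypothetical scaffolding: demo_text stands in for kernel text, demo_is_trusted_addr() stands in for is_kernel_addr(), and a plain memcpy() replaces the fault-tolerant read, so the sketch shows only the layering, not the nofault machinery.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "trusted" region standing in for kernel text in this demo. */
static uint32_t demo_text[4] = { 0x60000000u, 0x48000008u, 0xe8410028u, 0x4e800020u };

/* Stand-in for is_kernel_addr(): only addresses inside demo_text are accepted. */
static int demo_is_trusted_addr(const uint32_t *src)
{
	uintptr_t p = (uintptr_t)src;
	uintptr_t start = (uintptr_t)demo_text;
	uintptr_t end = (uintptr_t)(demo_text + 4);

	return p >= start && p < end;
}

/* Unchecked variant: the caller has already validated src. */
static int demo_copy_inst_unchecked(uint32_t *inst, const uint32_t *src)
{
	memcpy(inst, src, sizeof(*inst));
	return 0;
}

/* Checked variant: validate the address, then delegate to the unchecked one. */
static int demo_copy_inst(uint32_t *inst, const uint32_t *src)
{
	if (!demo_is_trusted_addr(src))
		return -ERANGE;

	return demo_copy_inst_unchecked(inst, src);
}
```

A caller that walks a range it has already validated can use the unchecked variant in its loop and skip the per-access test; everyone else keeps calling the checked one.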
[PATCH v3 13/25] powerpc/ftrace: Use CONFIG_FUNCTION_TRACER instead of CONFIG_DYNAMIC_FTRACE
Since commit 0c0c52306f47 ("powerpc: Only support DYNAMIC_FTRACE not static"), CONFIG_DYNAMIC_FTRACE is always selected when CONFIG_FUNCTION_TRACER is selected. To avoid confusion and have the reader wonder what's happen when CONFIG_FUNCTION_TRACER is selected and CONFIG_DYNAMIC_FTRACE is not, use CONFIG_FUNCTION_TRACER in ifdefs instead of CONFIG_DYNAMIC_FTRACE. As CONFIG_FUNCTION_GRAPH_TRACER depends on CONFIG_FUNCTION_TRACER, ftrace.o doesn't need to appear for both symbols in Makefile. Then as ftrace.o is built only when CONFIG_FUNCTION_TRACER is selected ifdef CONFIG_FUNCTION_TRACER is not needed in ftrace.c, and since it implies CONFIG_DYNAMIC_FTRACE, CONFIG_DYNAMIC_FTRACE is not needed in ftrace.c Signed-off-by: Christophe Leroy --- v2: Limit the change to the content of arch/powerpc/kernel/trace as suggested by Naveen. --- arch/powerpc/kernel/trace/Makefile | 4 +--- arch/powerpc/kernel/trace/ftrace.c | 4 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile index fc32ec30b297..af8527538fe4 100644 --- a/arch/powerpc/kernel/trace/Makefile +++ b/arch/powerpc/kernel/trace/Makefile @@ -14,9 +14,7 @@ obj64-$(CONFIG_FUNCTION_TRACER) += ftrace_mprofile.o else obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_pg.o endif -obj-$(CONFIG_FUNCTION_TRACER) += ftrace_low.o -obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o -obj-$(CONFIG_FUNCTION_GRAPH_TRACER)+= ftrace.o +obj-$(CONFIG_FUNCTION_TRACER) += ftrace_low.o ftrace.o obj-$(CONFIG_TRACING) += trace_clock.o obj-$(CONFIG_PPC64)+= $(obj64-y) diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index 531da4d93c58..f89bcaa5f0fc 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -28,9 +28,6 @@ #include #include - -#ifdef CONFIG_DYNAMIC_FTRACE - /* * We generally only have a single long_branch tramp and at most 2 or 3 plt * tramps generated. But, we don't use the plt tramps currently. 
We also allot @@ -783,7 +780,6 @@ int __init ftrace_dyn_arch_init(void) return 0; } #endif -#endif /* CONFIG_DYNAMIC_FTRACE */ #ifdef CONFIG_FUNCTION_GRAPH_TRACER -- 2.35.1
[PATCH v3 15/25] powerpc/ftrace: Use BRANCH_SET_LINK instead of value 1
To make it explicit, use BRANCH_SET_LINK instead of value 1
when calling create_branch().

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 010a8c7ff4ac..c4a68340a351 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -45,7 +45,7 @@ ftrace_call_replace(unsigned long ip, unsigned long addr, int link)
 	addr = ppc_function_entry((void *)addr);
 
 	/* if (link) set op to 'bl' else 'b' */
-	create_branch(&op, (u32 *)ip, addr, link ? 1 : 0);
+	create_branch(&op, (u32 *)ip, addr, link ? BRANCH_SET_LINK : 0);
 
 	return op;
 }
-- 
2.35.1
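To show why a named flag reads better than a bare 1 here, the following user-space sketch builds a branch instruction the way create_branch() is described in this series: relative displacement unless an absolute flag is given, failure when the target is out of range, and the low two bits carrying the flags. The DEMO_ names and the plain integer types are assumptions of the sketch; the flag values 0x1 and 0x2 correspond to the LK and AA bits of the I-form encoding.

```c
#include <stdint.h>

#define DEMO_BRANCH_SET_LINK 0x1  /* LK bit: 'b' becomes 'bl' */
#define DEMO_BRANCH_ABSOLUTE 0x2  /* AA bit: target is an absolute address */

/*
 * Build an I-form branch from addr to target. Returns 0 on success and 1 when
 * the displacement does not fit the 26-bit, word-aligned LI field.
 */
static int demo_create_branch(uint32_t *instr, uint64_t addr, uint64_t target, int flags)
{
	int64_t offset = (int64_t)target;

	*instr = 0;
	if (!(flags & DEMO_BRANCH_ABSOLUTE))
		offset -= (int64_t)addr;

	if (offset < -0x2000000 || offset > 0x1fffffc || (offset & 0x3))
		return 1;

	/* Mask flags and displacement so they cannot step on each other. */
	*instr = 0x48000000u | (uint32_t)(flags & 0x3) | ((uint32_t)offset & 0x03fffffcu);
	return 0;
}
```

With these definitions, a call site such as `demo_create_branch(&insn, ip, addr, link ? DEMO_BRANCH_SET_LINK : 0)` states its intent without the reader having to know which bit 1 is.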
[PATCH v3 09/25] powerpc: Replace PPC64_ELF_ABI_v{1/2} by CONFIG_PPC64_ELF_ABI_V{1/2}
Replace all uses of PPC64_ELF_ABI_v1 and PPC64_ELF_ABI_v2 by resp CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/code-patching.h | 12 ++-- arch/powerpc/include/asm/ftrace.h| 4 ++-- arch/powerpc/include/asm/linkage.h | 2 +- arch/powerpc/include/asm/ppc_asm.h | 4 ++-- arch/powerpc/include/asm/ptrace.h| 2 +- arch/powerpc/kernel/head_64.S| 2 +- arch/powerpc/kernel/interrupt_64.S | 2 +- arch/powerpc/kernel/kprobes.c| 6 +++--- arch/powerpc/kernel/misc_64.S| 2 +- arch/powerpc/kernel/module.c | 4 ++-- arch/powerpc/kernel/module_64.c | 4 ++-- arch/powerpc/kernel/ptrace/ptrace.c | 2 +- arch/powerpc/kernel/trace/ftrace.c | 4 ++-- arch/powerpc/kvm/book3s_interrupts.S | 2 +- arch/powerpc/kvm/book3s_rmhandlers.S | 2 +- arch/powerpc/net/bpf_jit.h | 2 +- arch/powerpc/net/bpf_jit_comp.c | 2 +- arch/powerpc/net/bpf_jit_comp64.c| 4 ++-- 18 files changed, 31 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 4260e89f62b1..8b1a10868275 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -130,7 +130,7 @@ bool is_conditional_branch(ppc_inst_t instr); static inline unsigned long ppc_function_entry(void *func) { -#ifdef PPC64_ELF_ABI_v2 +#ifdef CONFIG_PPC64_ELF_ABI_V2 u32 *insn = func; /* @@ -155,7 +155,7 @@ static inline unsigned long ppc_function_entry(void *func) return (unsigned long)(insn + 2); else return (unsigned long)func; -#elif defined(PPC64_ELF_ABI_v1) +#elif defined(CONFIG_PPC64_ELF_ABI_V1) /* * On PPC64 ABIv1 the function pointer actually points to the * function's descriptor. 
The first entry in the descriptor is the @@ -169,7 +169,7 @@ static inline unsigned long ppc_function_entry(void *func) static inline unsigned long ppc_global_function_entry(void *func) { -#ifdef PPC64_ELF_ABI_v2 +#ifdef CONFIG_PPC64_ELF_ABI_V2 /* PPC64 ABIv2 the global entry point is at the address */ return (unsigned long)func; #else @@ -186,7 +186,7 @@ static inline unsigned long ppc_global_function_entry(void *func) static inline unsigned long ppc_kallsyms_lookup_name(const char *name) { unsigned long addr; -#ifdef PPC64_ELF_ABI_v1 +#ifdef CONFIG_PPC64_ELF_ABI_V1 /* check for dot variant */ char dot_name[1 + KSYM_NAME_LEN]; bool dot_appended = false; @@ -207,7 +207,7 @@ static inline unsigned long ppc_kallsyms_lookup_name(const char *name) if (!addr && dot_appended) /* Let's try the original non-dot symbol lookup */ addr = kallsyms_lookup_name(name); -#elif defined(PPC64_ELF_ABI_v2) +#elif defined(CONFIG_PPC64_ELF_ABI_V2) addr = kallsyms_lookup_name(name); if (addr) addr = ppc_function_entry((void *)addr); @@ -224,7 +224,7 @@ static inline unsigned long ppc_kallsyms_lookup_name(const char *name) */ /* This must match the definition of STK_GOT in */ -#ifdef PPC64_ELF_ABI_v2 +#ifdef CONFIG_PPC64_ELF_ABI_V2 #define R2_STACK_OFFSET 24 #else #define R2_STACK_OFFSET 40 diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h index d83758acd1c7..b56166b7ea68 100644 --- a/arch/powerpc/include/asm/ftrace.h +++ b/arch/powerpc/include/asm/ftrace.h @@ -64,7 +64,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, * those. 
*/ #define ARCH_HAS_SYSCALL_MATCH_SYM_NAME -#ifdef PPC64_ELF_ABI_v1 +#ifdef CONFIG_PPC64_ELF_ABI_V1 static inline bool arch_syscall_match_sym_name(const char *sym, const char *name) { /* We need to skip past the initial dot, and the __se_sys alias */ @@ -83,7 +83,7 @@ static inline bool arch_syscall_match_sym_name(const char *sym, const char *name (!strncmp(sym, "ppc32_", 6) && !strcmp(sym + 6, name + 4)) || (!strncmp(sym, "ppc64_", 6) && !strcmp(sym + 6, name + 4)); } -#endif /* PPC64_ELF_ABI_v1 */ +#endif /* CONFIG_PPC64_ELF_ABI_V1 */ #endif /* CONFIG_FTRACE_SYSCALLS */ #ifdef CONFIG_PPC64 diff --git a/arch/powerpc/include/asm/linkage.h b/arch/powerpc/include/asm/linkage.h index 1f00d2891d69..b71b9582e754 100644 --- a/arch/powerpc/include/asm/linkage.h +++ b/arch/powerpc/include/asm/linkage.h @@ -4,7 +4,7 @@ #include -#ifdef PPC64_ELF_ABI_v1 +#ifdef CONFIG_PPC64_ELF_ABI_V1 #define cond_syscall(x) \ asm ("\t.weak " #x "\n\t.set " #x ", sys_ni_syscall\n" \ "\t.weak ." #x "\n\t.set ." #x ", .sys_ni_syscall\n") diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index 4dea2d963738..83c02f5a7f2a 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++
[PATCH v3 04/25] powerpc/ftrace: Use is_offset_in_branch_range()
Use is_offset_in_branch_range() instead of create_branch()
to check if a target is within branch range.

This patch together with the previous one improves
ftrace activation time by 7%.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/trace/ftrace.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 3ce3697e8a7c..41c45b9c7f39 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -89,11 +89,9 @@ ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_t new)
  */
 static int test_24bit_addr(unsigned long ip, unsigned long addr)
 {
-	ppc_inst_t op;
 	addr = ppc_function_entry((void *)addr);
 
-	/* use the create_branch to verify that this offset can be branched */
-	return create_branch(&op, (u32 *)ip, addr, 0) == 0;
+	return is_offset_in_branch_range(addr - ip);
 }
 
 static int is_bl_op(ppc_inst_t op)
@@ -261,7 +259,6 @@ __ftrace_make_nop(struct module *mod,
 static unsigned long find_ftrace_tramp(unsigned long ip)
 {
 	int i;
-	ppc_inst_t instr;
 
 	/*
 	 * We have the compiler generated long_branch tramps at the end
@@ -270,8 +267,7 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
 	for (i = NUM_FTRACE_TRAMPS - 1; i >= 0; i--)
 		if (!ftrace_tramps[i])
 			continue;
-		else if (create_branch(&instr, (void *)ip,
-				       ftrace_tramps[i], 0) == 0)
+		else if (is_offset_in_branch_range(ftrace_tramps[i] - ip))
 			return ftrace_tramps[i];
 
 	return 0;
-- 
2.35.1
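The saving in this patch comes from asking only "is this displacement encodable?" instead of building a throwaway instruction to find out. A user-space sketch of that predicate and of the trampoline search that uses it is given below. The DEMO_ and demo_ names are hypothetical, the trampoline table is passed in as a parameter instead of being a file-scope array, and the bounds are those of a 26-bit signed, word-aligned displacement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A relative 'b'/'bl' reaches -32 MiB .. +32 MiB - 4, in steps of 4 bytes. */
static bool demo_is_offset_in_branch_range(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/* Scan from the end; return the first populated trampoline reachable from ip, or 0. */
static uint64_t demo_find_tramp(uint64_t ip, const uint64_t *tramps, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		if (!tramps[i])
			continue;
		if (demo_is_offset_in_branch_range((int64_t)(tramps[i] - ip)))
			return tramps[i];
	}

	return 0;
}
```

The subtraction is done on unsigned values and then reinterpreted as signed, which mirrors passing `addr - ip` on unsigned long operands to a function taking a signed offset.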
Re: [PATCH kernel] powerpc/llvm/lto: Allow LLVM LTO builds
On 5/4/22 07:21, Nick Desaulniers wrote: On Thu, Apr 28, 2022 at 11:46 PM Alexey Kardashevskiy wrote: This enables LTO_CLANG builds on POWER with the upstream version of LLVM. LTO optimizes the output vmlinux binary and this may affect the FTP alternative section if alt branches use "bc" (Branch Conditional) which is limited by 16 bit offsets. This shows up in errors like: ld.lld: error: InputSection too large for range extension thunk vmlinux.o:(__ftr_alt_97+0xF0) This works around the issue by replacing "bc" in FTR_SECTION_ELSE with "b" which allows 26 bit offsets. This catches the problem instructions in vmlinux.o before it LTO'ed: $ objdump -d -M raw -j __ftr_alt_97 vmlinux.o | egrep '\S+\s*\' 30: 00 00 82 40 bc 4,eq,30 <__ftr_alt_97+0x30> f0: 00 00 82 40 bc 4,eq,f0 <__ftr_alt_97+0xf0> This allows LTO builds for ppc64le_defconfig plus LTO options. Note that DYNAMIC_FTRACE/FUNCTION_TRACER is not supported by LTO builds but this is not POWERPC-specific. $ ARCH=powerpc make LLVM=1 -j72 ppc64le_defconfig $ ARCH=powerpc make LLVM=1 -j72 menuconfig $ ARCH=powerpc make LLVM=1 -j72 ... VDSO64L arch/powerpc/kernel/vdso/vdso64.so.dbg /usr/bin/powerpc64le-linux-gnu-ld: /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: error loading plugin: /android0/llvm-project/llvm/build/bin/../lib/LLVMgold.so: cannot open shared object file: No such file or directory clang-15: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [arch/powerpc/kernel/vdso/Makefile:67: arch/powerpc/kernel/vdso/vdso64.so.dbg] Error 1 Looks like LLD isn't being invoked correctly to link the vdso. Probably need to revisit https://lore.kernel.org/lkml/20200901222523.1941988-1-ndesaulni...@google.com/ How were you working around this issue? Perhaps you built clang to default to LLD? (there's a cmake option for that) What option is that? 
I only add -DLLVM_ENABLE_LLD=ON which (I think) tells cmake to use lld to link the LLVM being built but does not seem to tell what the built clang should do. Without -DLLVM_ENABLE_LLD=ON, building just fails: [fstn1-p1 ~/pbuild/llvm/llvm-lto-latest-cleanbuild]$ ninja -j 100 [619/3501] Linking CXX executable bin/not FAILED: bin/not : && /usr/bin/clang++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -flto -O3 -DNDEBUG -flto -Wl,-rpath-link,/home/aik/pbuild/llvm/llvm-lto-latest-cleanbuild/./lib -Wl,--gc-sections utils/not/CMakeFiles/not.dir/not.cpp.o -o bin/not -Wl,-rpath,"\$ORIGIN/../lib" -lpthread lib/libLLVMSupport.a -lrt -ldl -lpthread -lm /usr/lib/powerpc64le-linux-gnu/libz.so /usr/lib/powerpc64le-linux-gnu/libtinfo.so lib/libLLVMDemangle.a && : /usr/bin/ld: lib/libLLVMSupport.a: error adding symbols: archive has no index; run ranlib to add one clang: error: linker command failed with exit code 1 (use -v to see invocation) [701/3501] Building CXX object utils/TableGen/CMakeFiles/llvm-tblgen.dir/GlobalISelEmitter.cpp.o ninja: build stopped: subcommand failed. My head hurts :( The above example is running on PPC. 
Now I am trying x86 box: [2693/3505] Linking CXX shared library lib/libLTO.so.15git FAILED: lib/libLTO.so.15git : && /usr/bin/clang++ -fPIC -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -flto -O3 -DNDEBUG -Wl,-z,defs -Wl,-z,nodelete -fuse-ld=ld -flto -Wl,-rpath-link,/home/aik/llvm-build/./lib -Wl,--gc-sections -Wl,--version-script,"/home/aik/llvm-build/tools/lto/LTO.exports" -shared -Wl,-soname,libLTO.so.15git -o lib/libLTO.so.15git tools/lto/CMakeFiles/LTO.dir/LTODisassembler.cpp.o tools/lto/CMakeFiles/LTO.dir/lto.cpp.o -Wl,-rpath,"\$ORIGIN/../lib" lib/libLLVMPowerPCAsmParser.a lib/libLLVMPowerPCCodeGen.a lib/libLLVMPowerPCDesc.a lib/libLLVMPowerPCDisassembler.a lib/libLLVMPowerPCInfo.a lib/libLLVMBitReader.a lib/libLLVMCore.a lib/libLLVMCodeGen.a lib/libLLVMLTO.a lib/libLLVMMC.a lib/libLLVMMCDisassembler.a lib/libLLVMSupport.a lib/libLLVMTarget.a lib/libLLVMAsmPrinter.a lib/libLLVMGlobalISel.a lib/libLLVMSelectionDAG.a lib/libLLVMCodeGen.a lib/libLLVMExtensions.a lib/libLLVMPasses.a lib/libLLVMTarget.a
Re: [PATCH v4 00/14] kbuild: yet another series of cleanups (modpost, LTO, MODULE_REL_CRCS, export.h)
On Mon, May 9, 2022 at 4:09 AM Masahiro Yamada wrote:
>
> This is the third batch of cleanups in this development cycle.
>
> Major changes in v4:
>  - Move static EXPORT_SYMBOL check to a script
>  - Some refactoring
>
> Major changes in v3:
>
>  - Generate symbol CRCs as C code, and remove CONFIG_MODULE_REL_CRCS.
>
> Major changes in v2:
>
>  - V1 did not work with CONFIG_MODULE_REL_CRCS.
>    I fixed this for v2.
>
>  - Reflect some review comments in v1
>
>  - Refactor the code more
>
>  - Avoid too long argument error

This series is available at
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git lto-cleanup-v4

>
>
> Masahiro Yamada (14):
>   modpost: remove left-over cross_compile declaration
>   modpost: change the license of EXPORT_SYMBOL to bool type
>   modpost: split the section mismatch checks into section-check.c
>   modpost: add sym_find_with_module() helper
>   modpost: extract symbol versions from *.cmd files
>   kbuild: link symbol CRCs at final link, removing
>     CONFIG_MODULE_REL_CRCS
>   kbuild: stop merging *.symversions
>   genksyms: adjust the output format to modpost
>   kbuild: do not create *.prelink.o for Clang LTO or IBT
>   kbuild: check static EXPORT_SYMBOL* by script instead of modpost
>   kbuild: make built-in.a rule robust against too long argument error
>   kbuild: make *.mod rule robust against too long argument error
>   kbuild: add cmd_and_savecmd macro
>   kbuild: rebuild multi-object modules when objtool is updated
>
>  arch/powerpc/Kconfig            |    1 -
>  arch/s390/Kconfig               |    1 -
>  arch/um/Kconfig                 |    1 -
>  include/asm-generic/export.h    |   22 +-
>  include/linux/export-internal.h |   16 +
>  include/linux/export.h          |   30 +-
>  init/Kconfig                    |    4 -
>  kernel/module.c                 |   10 +-
>  scripts/Kbuild.include          |   10 +-
>  scripts/Makefile.build          |  134 +--
>  scripts/Makefile.lib            |    7 -
>  scripts/Makefile.modfinal       |    5 +-
>  scripts/Makefile.modpost        |    9 +-
>  scripts/check-local-export      |   48 +
>  scripts/genksyms/genksyms.c     |   18 +-
>  scripts/link-vmlinux.sh         |   33 +-
>  scripts/mod/Makefile            |    2 +-
>  scripts/mod/modpost.c           | 1499 ---
>  scripts/mod/modpost.h           |   35 +-
>  scripts/mod/section-check.c     | 1222 +
>  20 files changed, 1551 insertions(+), 1556 deletions(-)
>  create mode 100644 include/linux/export-internal.h
>  create mode 100755 scripts/check-local-export
>  create mode 100644 scripts/mod/section-check.c
>
> --
> 2.32.0
>

--
Best Regards
Masahiro Yamada
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On Sun, May 08, 2022 at 09:09:55PM +0800, Baolin Wang wrote:
>
>
> On 5/8/2022 7:09 PM, Muchun Song wrote:
> > On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:
> > > It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
> > > table when unmapping or migrating a hugetlb page, and will change
> > > to use huge_ptep_clear_flush() instead in the following patches.
> > >
> > > So this is a preparation patch, which changes the huge_ptep_clear_flush()
> > > to return the original pte to help to nuke a hugetlb page table.
> > >
> > > Signed-off-by: Baolin Wang
> > > Acked-by: Mike Kravetz
> >
> > Reviewed-by: Muchun Song
>
> Thanks for reviewing.
>
> >
> > But one nit below:
> >
> > [...]
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 8605d7e..61a21af 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> > >  		ClearHPageRestoreReserve(new_page);
> > >  	/* Break COW or unshare */
> > > -	huge_ptep_clear_flush(vma, haddr, ptep);
> > > +	(void)huge_ptep_clear_flush(vma, haddr, ptep);
> >
> > Why add a "(void)" here? Is there any warning if no "(void)"?
> > IIUC, I think we can remove this, right?
>
> I did not meet any warning without the casting, but this is per Mike's
> comment[1] to make the code consistent with other functions casting to void
> type explicitly in hugetlb.c file.
>

Got it. I see hugetlb.c per this rule, while others do not.

> [1]
> https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/
>
Re: request_module DoS
On Sat, May 07, 2022 at 12:14:47PM -0700, Luis Chamberlain wrote:
> On Sat, May 07, 2022 at 01:02:20AM -0700, Luis Chamberlain wrote:
> > You can try to reproduce by using adding a new test type for crypto-aegis256
> > on lib/test_kmod.c. These tests however can try something similar but other
> > modules.
> >
> > /tools/testing/selftests/kmod/kmod.sh -t 0008
> > /tools/testing/selftests/kmod/kmod.sh -t 0009
> >
> > I can't decipher this yet.
>
> Without testing it... but something like this might be an easier
> reproducer:
>
> +	config_set_driver crypto-aegis256

If the module is not present though nothing really happens, and so
is it possible this is another issue? Below a bogus module request.

diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..a747ad549940 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -65,6 +66,7 @@ ALL_TESTS="$ALL_TESTS 0010:1:1"
 ALL_TESTS="$ALL_TESTS 0011:1:1"
 ALL_TESTS="$ALL_TESTS 0012:1:1"
 ALL_TESTS="$ALL_TESTS 0013:1:1"
+ALL_TESTS="$ALL_TESTS 0014:150:1"
 
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
@@ -504,6 +506,17 @@ kmod_test_0013()
 		"cat /sys/module/${DEFAULT_KMOD_DRIVER}/sections/.*text | head -n1"
 }
 
+kmod_test_0014()
+{
+	kmod_defaults_driver
+	MODPROBE_LIMIT=$(config_get_modprobe_limit)
+	let EXTRA=$MODPROBE_LIMIT/6
+	config_set_driver bogus_module_does_not_exist
+	config_num_thread_limit_extra $EXTRA
+	config_trigger ${FUNCNAME[0]}
+	config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+}
+
 list_tests()
 {
 	echo "Test ID list:"
@@ -525,6 +538,7 @@ list_tests()
 	echo "0011 x $(get_test_count 0011) - test completely disabling module autoloading"
 	echo "0012 x $(get_test_count 0012) - test /proc/modules address visibility under CAP_SYSLOG"
 	echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* visibility under CAP_SYSLOG"
+	echo "0014 x $(get_test_count 0014) - multithreaded - push kmod_concurrent over max_modprobes for request_module() for a missing module"
 }
 
 usage()
Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On Sun, May 08, 2022 at 05:36:40PM +0800, Baolin Wang wrote:
> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
>
> When migrating a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it and remap it with
> a migration pte entry. This is correct for PMD or PUD size hugetlb,
> since they always contain only one pmd entry or pud entry in the
> page table.
>
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes. So we will nuke or remap only one pte
> or pmd entry for this CONT-PTE/PMD size hugetlb page, which is
> not expected for hugetlb migration. The problem is we can still
> continue to modify the subpages' data of a hugetlb page during
> migrating a hugetlb page, which can cause a serious data consistent
> issue, since we did not nuke the page table entry and set a
> migration pte for the subpages of a hugetlb page.
>
> To fix this issue, we should change to use huge_ptep_clear_flush()
> to nuke a hugetlb page table, and remap it with set_huge_pte_at()
> and set_huge_swap_pte_at() when migrating a hugetlb page, which
> already considered the CONT-PTE or CONT-PMD size hugetlb.
>
> Signed-off-by: Baolin Wang

This looks fine to me.

Reviewed-by: Muchun Song

Thanks.
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:
> It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
> table when unmapping or migrating a hugetlb page, and will change
> to use huge_ptep_clear_flush() instead in the following patches.
>
> So this is a preparation patch, which changes the huge_ptep_clear_flush()
> to return the original pte to help to nuke a hugetlb page table.
>
> Signed-off-by: Baolin Wang
> Acked-by: Mike Kravetz

Reviewed-by: Muchun Song

But one nit below:

[...]
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8605d7e..61a21af 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  		ClearHPageRestoreReserve(new_page);
>
>  	/* Break COW or unshare */
> -	huge_ptep_clear_flush(vma, haddr, ptep);
> +	(void)huge_ptep_clear_flush(vma, haddr, ptep);

Why add a "(void)" here? Is there any warning if no "(void)"?
IIUC, I think we can remove this, right?

>  	mmu_notifier_invalidate_range(mm, range.start, range.end);
>  	page_remove_rmap(old_page, vma, true);
>  	hugepage_add_new_anon_rmap(new_page, vma, haddr);
> --
> 1.8.3.1
>
>
Re: [PATCH] ASoC: fsl_sai: fix incorrect mclk number in error message
On Sat, May 7, 2022 at 8:31 PM Pieterjan Camerlynck <
pieterjan.camerly...@gmail.com> wrote:

> In commit ("ASoC: fsl_sai: add sai master mode support")
> the loop was changed to start iterating from 1 instead of 0. The error
> message however was not updated, reporting the wrong clock to the user.
>
> Signed-off-by: Pieterjan Camerlynck
>

Acked-by: Shengjiu Wang

Best Regards
Wang Shengjiu

> ---
>  sound/soc/fsl/fsl_sai.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
> index ffc24afb5a7a..f0602077b385 100644
> --- a/sound/soc/fsl/fsl_sai.c
> +++ b/sound/soc/fsl/fsl_sai.c
> @@ -1054,7 +1054,7 @@ static int fsl_sai_probe(struct platform_device *pdev)
>  		sai->mclk_clk[i] = devm_clk_get(&pdev->dev, tmp);
>  		if (IS_ERR(sai->mclk_clk[i])) {
>  			dev_err(&pdev->dev, "failed to get mclk%d clock: %ld\n",
> -				i + 1, PTR_ERR(sai->mclk_clk[i]));
> +				i, PTR_ERR(sai->mclk_clk[i]));
>  			sai->mclk_clk[i] = NULL;
>  		}
>  	}
> --
> 2.25.1
>
>
[powerpc:fixes-test] BUILD SUCCESS 348c71344111d7a48892e3e52264ff11956fc196
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 348c71344111d7a48892e3e52264ff11956fc196  powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE

elapsed time: 739m

configs tested: 153
configs skipped: 100

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm64       defconfig
arm64       allyesconfig
arm         allmodconfig
arm         defconfig
arm         allyesconfig
i386        randconfig-c001
arm         imxrt_defconfig
arm         footbridge_defconfig
m68k        alldefconfig
arm         ezx_defconfig
powerpc64   defconfig
h8300       alldefconfig
powerpc     motionpro_defconfig
arm         shmobile_defconfig
sh          sh7770_generic_defconfig
arm         lubbock_defconfig
mips        vocore2_defconfig
ia64        bigsur_defconfig
mips        decstation_64_defconfig
h8300       h8s-sim_defconfig
sh          kfr2r09_defconfig
sh          se7712_defconfig
powerpc     redwood_defconfig
powerpc     mpc837x_rdb_defconfig
powerpc64   alldefconfig
arc         nsimosci_defconfig
um          x86_64_defconfig
arm         spear6xx_defconfig
powerpc     cm5200_defconfig
arm         iop32x_defconfig
arm         trizeps4_defconfig
sparc       sparc64_defconfig
ia64        alldefconfig
mips        bmips_be_defconfig
powerpc     rainier_defconfig
sparc       defconfig
sh          se7750_defconfig
sh          defconfig
mips        decstation_r4k_defconfig
sh          ecovec24-romimage_defconfig
arm         lpd270_defconfig
sh          se7721_defconfig
alpha       allyesconfig
sh          apsh4ad0a_defconfig
arm         aspeed_g5_defconfig
sh          se7343_defconfig
powerpc     eiger_defconfig
arm         sunxi_defconfig
powerpc     mgcoge_defconfig
sh          se7751_defconfig
mips        xway_defconfig
sh          se7619_defconfig
arm         s3c6400_defconfig
arc         alldefconfig
xtensa      smp_lx200_defconfig
powerpc     pq2fads_defconfig
m68k        allmodconfig
sh          sdk7780_defconfig
powerpc     sam440ep_defconfig
sh          se7722_defconfig
openrisc    defconfig
arm         at91_dt_defconfig
arm         viper_defconfig
x86_64      randconfig-c001
arm         randconfig-c002-20220508
ia64        defconfig
m68k        allyesconfig
m68k        defconfig
nios2       defconfig
arc         allyesconfig
csky        defconfig
nios2       allyesconfig
alpha       defconfig
h8300       allyesconfig
xtensa      allyesconfig
arc         defconfig
sh          allmodconfig
s390        defconfig
s390        allmodconfig
parisc      defconfig
parisc64    defconfig
parisc      allyesconfig
s390        allyesconfig
i386        allyesconfig
sparc       allyesconfig
i386        defconfig
i386        debian-10.3-kselftests
i386        debian-10.3
mips        allyesconfig
mips        allmodconfig
powerpc     allyesconfig
powerpc     allnoconfig
powerpc     allmodconfig
x86_64      randconfig-a006
x86_64      randconfig-a004
x86_64      randconfig-a002
i386        randconfig-a012
i386        randconfig-a014
i386        randconfig-a016
riscv
[PATCH v4 14/14] kbuild: rebuild multi-object modules when objtool is updated
When CONFIG_LTO_CLANG or CONFIG_X86_KERNEL_IBT is enabled, objtool for
multi-object modules is postponed until the objects are linked together.

Make sure to re-run objtool and re-link multi-object modules when
objtool is updated.

Signed-off-by: Masahiro Yamada
Reviewed-by: Kees Cook
Acked-by: Josh Poimboeuf
---

Changes in v4:
  - New
    Resent of my previous submission
    https://lore.kernel.org/linux-kbuild/20210831074004.3195284-11-masahi...@kernel.org/

 scripts/Makefile.build | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index f546b5f1f33f..4e6902e099e8 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -404,13 +404,18 @@ $(obj)/modules.order: $(obj-m) FORCE
 $(obj)/lib.a: $(lib-y) FORCE
 	$(call if_changed,ar)
 
-quiet_cmd_link_multi-m = LD [M] $@
-      cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ @$(patsubst %.o,%.mod,$@) $(cmd_objtool)
+quiet_cmd_ld_multi_m = LD [M] $@
+      cmd_ld_multi_m = $(LD) $(ld_flags) -r -o $@ @$(patsubst %.o,%.mod,$@) $(cmd_objtool)
+
+define rule_ld_multi_m
+	$(call cmd_and_savecmd,ld_multi_m)
+	$(call cmd,gen_objtooldep)
+endef
 
 $(multi-obj-m): objtool-enabled := $(delay-objtool)
 $(multi-obj-m): part-of-module := y
 $(multi-obj-m): %.o: %.mod FORCE
-	$(call if_changed,link_multi-m)
+	$(call if_changed_rule,ld_multi_m)
 	$(call multi_depend, $(multi-obj-m), .o, -objs -y -m)
 
 targets := $(filter-out $(PHONY), $(targets))
-- 
2.32.0
[PATCH v4 13/14] kbuild: add cmd_and_savecmd macro
Separate out the command execution part of if_changed, as we did
for if_changed_dep.

This allows us to reuse it in if_changed_rule.

  define rule_foo
          $(call cmd_and_savecmd,foo)
          $(call cmd,bar)
  endef

Signed-off-by: Masahiro Yamada
Reviewed-by: Kees Cook
---

Changes in v4:
  - New.
    Resent of my previous submission.
    https://lore.kernel.org/all/20210831074004.3195284-10-masahi...@kernel.org/

 scripts/Kbuild.include | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 455a0a6ce12d..ece44b735061 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -142,9 +142,11 @@ check-FORCE = $(if $(filter FORCE, $^),,$(warning FORCE prerequisite is missing)
 if-changed-cond = $(newer-prereqs)$(cmd-check)$(check-FORCE)
 
 # Execute command if command has changed or prerequisite(s) are updated.
-if_changed = $(if $(if-changed-cond),\
+if_changed = $(if $(if-changed-cond),$(cmd_and_savecmd),@:)
+
+cmd_and_savecmd = \
 	$(cmd); \
-	printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:)
+	printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd
 
 # Execute the command and also postprocess generated .d dependencies file.
 if_changed_dep = $(if $(if-changed-cond),$(cmd_and_fixdep),@:)
-- 
2.32.0
[PATCH v4 07/14] kbuild: stop merging *.symversions
Now modpost reads symbol versions from .*.cmd files. The merged *.symversions are no longer needed. Signed-off-by: Masahiro Yamada Reviewed-by: Nicolas Schier Tested-by: Nathan Chancellor --- (no changes since v1) scripts/Makefile.build | 21 ++--- scripts/link-vmlinux.sh | 15 --- 2 files changed, 2 insertions(+), 34 deletions(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index ddd9080fc028..dff9220135c4 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -390,17 +390,6 @@ $(obj)/%.asn1.c $(obj)/%.asn1.h: $(src)/%.asn1 $(objtree)/scripts/asn1_compiler $(subdir-builtin): $(obj)/%/built-in.a: $(obj)/% ; $(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ; -# combine symversions for later processing -ifeq ($(CONFIG_LTO_CLANG) $(CONFIG_MODVERSIONS),y y) - cmd_update_lto_symversions = \ - rm -f $@.symversions\ - $(foreach n, $(filter-out FORCE,$^),\ - $(if $(shell test -s $(n).symversions && echo y), \ - ; cat $(n).symversions >> $@.symversions)) -else - cmd_update_lto_symversions = echo >/dev/null -endif - # # Rule to compile a set of .o files into one .a file (without symbol table) # @@ -408,11 +397,8 @@ endif quiet_cmd_ar_builtin = AR $@ cmd_ar_builtin = rm -f $@; $(AR) cDPrST $@ $(real-prereqs) -quiet_cmd_ar_and_symver = AR $@ - cmd_ar_and_symver = $(cmd_update_lto_symversions); $(cmd_ar_builtin) - $(obj)/built-in.a: $(real-obj-y) FORCE - $(call if_changed,ar_and_symver) + $(call if_changed,ar_builtin) # # Rule to create modules.order file @@ -432,16 +418,13 @@ $(obj)/modules.order: $(obj-m) FORCE # # Rule to compile a set of .o files into one .a file (with symbol table) # -quiet_cmd_ar_lib = AR $@ - cmd_ar_lib = $(cmd_update_lto_symversions); $(cmd_ar) $(obj)/lib.a: $(lib-y) FORCE - $(call if_changed,ar_lib) + $(call if_changed,ar) ifneq ($(CONFIG_LTO_CLANG)$(CONFIG_X86_KERNEL_IBT),) quiet_cmd_link_multi-m = AR [M] $@ cmd_link_multi-m = \ - $(cmd_update_lto_symversions); \ rm -f $@; \ $(AR) cDPrsT $@ @$(patsubst %.o,%.mod,$@) else 
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 6aee2401f3ad..bc94252e920c 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -56,20 +56,6 @@ gen_initcalls() > .tmp_initcalls.lds } -# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into -# .tmp_symversions.lds -gen_symversions() -{ - info GEN .tmp_symversions.lds - rm -f .tmp_symversions.lds - - for o in ${KBUILD_VMLINUX_OBJS} ${KBUILD_VMLINUX_LIBS}; do - if [ -f ${o}.symversions ]; then - cat ${o}.symversions >> .tmp_symversions.lds - fi - done -} - # Link of vmlinux.o used for section mismatch analysis # ${1} output file modpost_link() @@ -303,7 +289,6 @@ cleanup() rm -f .btf.* rm -f .tmp_System.map rm -f .tmp_initcalls.lds - rm -f .tmp_symversions.lds rm -f .tmp_vmlinux* rm -f System.map rm -f vmlinux -- 2.32.0
[PATCH v4 03/14] modpost: split the section mismatch checks into section-check.c
modpost.c is too big, and the half of the code is for section checks. Split it. I fixed some style issues in the moved code. Signed-off-by: Masahiro Yamada --- Changes in v4: - New patch scripts/mod/Makefile|2 +- scripts/mod/modpost.c | 1202 +- scripts/mod/modpost.h | 34 +- scripts/mod/section-check.c | 1222 +++ 4 files changed, 1240 insertions(+), 1220 deletions(-) create mode 100644 scripts/mod/section-check.c diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile index c9e38ad937fd..ca739c6c68a1 100644 --- a/scripts/mod/Makefile +++ b/scripts/mod/Makefile @@ -5,7 +5,7 @@ CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO) hostprogs-always-y += modpost mk_elfconfig always-y += empty.o -modpost-objs := modpost.o file2alias.o sumversion.o +modpost-objs := modpost.o section-check.o file2alias.o sumversion.o devicetable-offsets-file := devicetable-offsets.h diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index a78b75f0eeb0..e7e2c70a98f5 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -31,7 +31,7 @@ static bool external_module; /* Only warn about unresolved symbols */ static bool warn_unresolved; -static int sec_mismatch_count; +int sec_mismatch_count; static bool sec_mismatch_warn_only = true; /* ignore missing files */ static bool ignore_missing_files; @@ -310,8 +310,8 @@ static void add_namespace(struct list_head *head, const char *namespace) } } -static void *sym_get_data_by_offset(const struct elf_info *info, - unsigned int secindex, unsigned long offset) +void *sym_get_data_by_offset(const struct elf_info *info, +unsigned int secindex, unsigned long offset) { Elf_Shdr *sechdr = >sechdrs[secindex]; @@ -327,19 +327,17 @@ static void *sym_get_data(const struct elf_info *info, const Elf_Sym *sym) sym->st_value); } -static const char *sech_name(const struct elf_info *info, Elf_Shdr *sechdr) +const char *sech_name(const struct elf_info *info, Elf_Shdr *sechdr) { return sym_get_data_by_offset(info, info->secindex_strings, sechdr->sh_name); } -static 
const char *sec_name(const struct elf_info *info, int secindex) +const char *sec_name(const struct elf_info *info, int secindex) { return sech_name(info, >sechdrs[secindex]); } -#define strstarts(str, prefix) (strncmp(str, prefix, strlen(prefix)) == 0) - static void sym_update_namespace(const char *symname, const char *namespace) { struct symbol *s = find_symbol(symname); @@ -741,1196 +739,6 @@ static char *get_modinfo(struct elf_info *info, const char *tag) return get_next_modinfo(info, tag, NULL); } -/** - * Test if string s ends in string sub - * return 0 if match - **/ -static int strrcmp(const char *s, const char *sub) -{ - int slen, sublen; - - if (!s || !sub) - return 1; - - slen = strlen(s); - sublen = strlen(sub); - - if ((slen == 0) || (sublen == 0)) - return 1; - - if (sublen > slen) - return 1; - - return memcmp(s + slen - sublen, sub, sublen); -} - -static const char *sym_name(struct elf_info *elf, Elf_Sym *sym) -{ - if (sym) - return elf->strtab + sym->st_name; - else - return "(unknown)"; -} - -/* The pattern is an array of simple patterns. 
- * "foo" will match an exact string equal to "foo" - * "*foo" will match a string that ends with "foo" - * "foo*" will match a string that begins with "foo" - * "*foo*" will match a string that contains "foo" - */ -static int match(const char *sym, const char * const pat[]) -{ - const char *p; - while (*pat) { - const char *endp; - - p = *pat++; - endp = p + strlen(p) - 1; - - /* "*foo*" */ - if (*p == '*' && *endp == '*') { - char *bare = NOFAIL(strndup(p + 1, strlen(p) - 2)); - char *here = strstr(sym, bare); - - free(bare); - if (here != NULL) - return 1; - } - /* "*foo" */ - else if (*p == '*') { - if (strrcmp(sym, p + 1) == 0) - return 1; - } - /* "foo*" */ - else if (*endp == '*') { - if (strncmp(sym, p, strlen(p) - 1) == 0) - return 1; - } - /* no wildcards */ - else { - if (strcmp(p, sym) == 0) - return 1; - } - } - /* no match */ - return 0; -} - -/* sections that we do not want to do full section mismatch check
[PATCH v4 11/14] kbuild: make built-in.a rule robust against too long argument error
Kbuild runs at the top of objtree instead of changing the working
directory to subdirectories. I think this design is nice overall but
some commands have a scalability issue.

The build command of built-in.a is one of them whose length scales with:

    O(D * N)

Here, D is the length of the directory path (i.e. $(obj)/ prefix),
N is the number of objects in the Makefile, O() is the big O notation.

The deeper directory the Makefile directory is located, the more easily
it will hit the too long argument error.

We can make it better. Trim the $(obj)/ by Make's builtin function, and
restore it by a shell command (sed).

With this, the command length scales with:

    O(D + N)

In-tree modules still have some room to the limit (ARG_MAX=2097152),
but this is more future-proof for big modules in a deep directory.

For example, you can build i915 as builtin (CONFIG_DRM_I915=y) and
compare drivers/gpu/drm/i915/.built-in.a.cmd with/without this commit.

Signed-off-by: Masahiro Yamada
Reviewed-by: Nicolas Schier
Tested-by: Nathan Chancellor
---

(no changes since v2)

Changes in v2:
  - New patch

 scripts/Makefile.build | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index c2a173b3fd60..8f1a355df7aa 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -374,7 +374,10 @@ $(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ;
 #
 
 quiet_cmd_ar_builtin = AR $@
-      cmd_ar_builtin = rm -f $@; $(AR) cDPrST $@ $(real-prereqs)
+      cmd_ar_builtin = rm -f $@; \
+	echo $(patsubst $(obj)/%,%,$(real-prereqs)) | \
+	sed -E 's:([^ ]+):$(obj)/\1:g' | \
+	xargs $(AR) cDPrST $@
 
 $(obj)/built-in.a: $(real-obj-y) FORCE
 	$(call if_changed,ar_builtin)
-- 
2.32.0
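
[Archive note] As a rough illustration of the trim-and-restore idea in the patch above, the sketch below reproduces the sed step in plain shell. The helper name restore_prefix and the drivers/example path are made up for this example; only the sed expression is taken from the diff, and this is not how Kbuild itself is structured.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kbuild): read space-separated object
# names on stdin and put "$1/" back in front of each one, using the same
# sed expression as the patch.
restore_prefix() {
	sed -E "s:([^ ]+):$1/\1:g"
}

# Make would hand over the short names; the shell restores the long ones.
echo "a.o b.o sub/c.o" | restore_prefix drivers/example
# prints: drivers/example/a.o drivers/example/b.o drivers/example/sub/c.o
```

With this arrangement the directory prefix shows up roughly once in the saved command line rather than once per object, which appears to be where the O(D + N) estimate comes from.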
[PATCH v4 10/14] kbuild: check static EXPORT_SYMBOL* by script instead of modpost
The 'static' specifier and EXPORT_SYMBOL() are an odd combination. Commit 15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions") tried to detect it, but this check has false negatives. Here is the sample code. Makefile: obj-y += foo1.o foo2.o foo1.c: #include static void foo(void) {} EXPORT_SYMBOL(foo); foo2.c: void foo(void) {} foo1.c exports the static symbol 'foo', but modpost cannot catch it because it is fooled by foo2.c, which has a global symbol with the same name. s->is_static is cleared if a global symbol with the same name is found somewhere, but EXPORT_SYMBOL() and the global symbol do not necessarily belong to the same compilation unit. This check should be done per compilation unit, but I do not know how to do it in modpost. modpost runs against vmlinux.o or modules, which merges multiple objects, then forgets their origin. It is true modpost gets access to the lists of all the member objects (.vmlinux.objs and *.mod), but it is impossible to parse individual objects in modpost; they might be LLVM IR instead of ELF when CONFIG_LTO_CLANG=y. Add a simple bash script to parse the output from ${NM}. This works for CONFIG_LTO_CLANG=y because llvm-nm can dump symbols of LLVM bitcode. Revert 15bfc2348d54. 

Signed-off-by: Masahiro Yamada
---

Changes in v4:
  - New patch

 scripts/Makefile.build     |  4
 scripts/check-local-export | 48 ++
 scripts/mod/modpost.c      | 28 +-
 3 files changed, 53 insertions(+), 27 deletions(-)
 create mode 100755 scripts/check-local-export

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 838ea5e83174..c2a173b3fd60 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -244,9 +244,12 @@ cmd_gen_ksymdeps = \
 	$(CONFIG_SHELL) $(srctree)/scripts/gen_ksymdeps.sh $@ >> $(dot-target).cmd
 endif
 
+cmd_check_local_export = $(srctree)/scripts/check-local-export $@
+
 define rule_cc_o_c
 	$(call cmd_and_fixdep,cc_o_c)
 	$(call cmd,gen_ksymdeps)
+	$(call cmd,check_local_export)
 	$(call cmd,checksrc)
 	$(call cmd,checkdoc)
 	$(call cmd,gen_objtooldep)
@@ -257,6 +260,7 @@ endef
 define rule_as_o_S
 	$(call cmd_and_fixdep,as_o_S)
 	$(call cmd,gen_ksymdeps)
+	$(call cmd,check_local_export)
 	$(call cmd,gen_objtooldep)
 	$(call cmd,gen_symversions_S)
 endef
diff --git a/scripts/check-local-export b/scripts/check-local-export
new file mode 100755
index ..d1721fa63057
--- /dev/null
+++ b/scripts/check-local-export
@@ -0,0 +1,48 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (C) 2022 Masahiro Yamada
+
+set -e
+set -o pipefail
+
+declare -A symbol_types
+declare -a export_symbols
+
+exit_code=0
+
+while read value type name
+do
+	# to avoid error for clang LTO; $name may be empty
+	if [[ $value = -* && -z $name ]]; then
+		continue
+	fi
+
+	# The first field (value) may be empty. If so, fix it up.
+	if [[ -z $name ]]; then
+		name=${type}
+		type=${value}
+	fi
+
+	# save (name, type) in the associative array
+	symbol_types[$name]=$type
+
+	# append the exported symbol to the array
+	if [[ $name == __ksymtab_* ]]; then
+		export_symbols+=(${name#__ksymtab_})
+	fi
+done < <(${NM} ${1} 2>/dev/null)
+
+# Catch error in the process substitution
+wait $!
+
+for name in "${export_symbols[@]}"
+do
+	# nm(3) says "If lowercase, the symbol is usually local"
+	if [[ ${symbol_types[$name]} =~ [a-z] ]]; then
+		echo "$@: error: local symbol '${name}' was exported" >&2
+		exit_code=1
+	fi
+done
+
+exit ${exit_code}
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 018527d96680..fa73ddb6a6cf 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -212,7 +212,6 @@ struct symbol {
 	unsigned int crc;
 	bool crc_valid;
 	bool weak;
-	bool is_static;		/* true if symbol is not global */
 	bool is_gpl_only;	/* exported by EXPORT_SYMBOL_GPL */
 	char name[];
 };
@@ -242,7 +241,7 @@ static struct symbol *alloc_symbol(const char *name)
 
 	memset(s, 0, sizeof(*s));
 	strcpy(s->name, name);
-	s->is_static = true;
+
 	return s;
 }
 
@@ -875,20 +874,6 @@ static void read_symbols(const char *modname)
 				     sym_get_data(&info, sym));
 	}
 
-	// check for static EXPORT_SYMBOL_* functions && global vars
-	for (sym = info.symtab_start; sym < info.symtab_stop; sym++) {
-		unsigned char bind = ELF_ST_BIND(sym->st_info);
-
-		if (bind == STB_GLOBAL || bind == STB_WEAK) {
-			struct symbol *s =
-				find_symbol(remove_dot(info.strtab +
-
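
For illustration only (this is not part of the patch): the per-object check
that scripts/check-local-export performs on the ${NM} output could be
sketched in C roughly as below. The helper names parse_nm_line() and
count_local_exports() are hypothetical, and the sketch assumes nm lines of
the form "value type name" where the value may be missing, as the script
does. It skips the script's special case for clang LTO lines.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split one line of nm output into the type letter
 * and the symbol name. The value column may be empty (e.g. for undefined
 * symbols), in which case only two fields are present.
 */
static bool parse_nm_line(const char *line, char *type,
			  char *name, size_t name_size)
{
	char f1[256], f2[256], f3[256];
	int n;

	assert(line && type && name);

	n = sscanf(line, "%255s %255s %255s", f1, f2, f3);
	if (n == 3) {		/* "value type name" */
		*type = f2[0];
		snprintf(name, name_size, "%s", f3);
		return true;
	}
	if (n == 2) {		/* "type name" */
		*type = f1[0];
		snprintf(name, name_size, "%s", f2);
		return true;
	}
	return false;
}

/*
 * Count the symbols that have a __ksymtab_<name> entry although nm reports
 * <name> with a lowercase (usually local) type letter in the same object.
 */
static int count_local_exports(const char *const *lines, size_t n)
{
	static const char prefix[] = "__ksymtab_";
	char type, name[256], other[256];
	int errors = 0;

	for (size_t i = 0; i < n; i++) {
		if (!parse_nm_line(lines[i], &type, name, sizeof(name)))
			continue;
		if (strncmp(name, prefix, strlen(prefix)) != 0)
			continue;

		/* look up the type of the symbol this entry exports */
		for (size_t j = 0; j < n; j++) {
			if (parse_nm_line(lines[j], &type, other, sizeof(other)) &&
			    strcmp(other, name + strlen(prefix)) == 0 &&
			    islower((unsigned char)type)) {
				errors++;
				break;
			}
		}
	}
	return errors;
}
```

Because the input is the symbol list of a single object, a global 'foo' in
another object cannot hide a static 'foo' here, which is the point of doing
the check per compilation unit.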
[PATCH v4 12/14] kbuild: make *.mod rule robust against too long argument error
Like built-in.a, the command length of the *.mod rule scales with
the depth of the directory times the number of objects in the Makefile.

Add $(obj)/ by the shell command (awk) instead of by Make's builtin
function.

In-tree modules still have some room to the limit (ARG_MAX=2097152),
but this is more future-proof for big modules in a deep directory.

For example, you can build i915 as a module (CONFIG_DRM_I915=m) and
compare drivers/gpu/drm/i915/.i915.mod.cmd with/without this commit.

The issue is more critical for external modules because the M= path
can be very long as Jeff Johnson reported before [1].

[1] https://lore.kernel.org/linux-kbuild/4c02050c4e95e4cb8cc04282695f8...@codeaurora.org/

Signed-off-by: Masahiro Yamada
Reviewed-by: Nicolas Schier
Tested-by: Nathan Chancellor
---

(no changes since v2)

Changes in v2:
  - New patch

 scripts/Makefile.build | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 8f1a355df7aa..f546b5f1f33f 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -270,8 +270,8 @@ $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
 	$(call if_changed_rule,cc_o_c)
 	$(call cmd,force_checksrc)
 
-cmd_mod = echo $(addprefix $(obj)/, $(call real-search, $*.o, .o, -objs -y -m)) | \
-	$(AWK) -v RS='( |\n)' '!x[$$0]++' > $@
+cmd_mod = echo $(call real-search, $*.o, .o, -objs -y -m) | \
+	$(AWK) -v RS='( |\n)' '!x[$$0]++ { print("$(obj)/"$$0) }' > $@
 
 $(obj)/%.mod: FORCE
 	$(call if_changed,mod)
-- 
2.32.0
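
As a rough illustration of what the new awk one-liner does (print each
distinct object name once, in first-seen order, with the directory prefix
added on output rather than on the command line), here is a small C sketch.
The function name prefix_unique_words() is made up for this example, and
unlike awk it simply ignores empty fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write every distinct word of a space/newline separated list to 'out' as
 * "<prefix><word>\n", keeping the order of first appearance.
 * Returns the number of words written, or -1 if 'out' is too small.
 */
static int prefix_unique_words(const char *list, const char *prefix,
			       char *out, size_t size)
{
	const char *p = list;
	size_t used = 0;
	int count = 0;

	assert(list && prefix && out);
	if (size == 0)
		return -1;
	out[0] = '\0';

	while (*p) {
		const char *q;
		size_t len;
		int seen = 0;

		p += strspn(p, " \n");		/* skip separators */
		len = strcspn(p, " \n");	/* length of this word */
		if (len == 0)
			break;

		/* did the same word already appear earlier in the list? */
		for (q = list; q < p; ) {
			size_t qlen;

			q += strspn(q, " \n");
			qlen = strcspn(q, " \n");
			if (q < p && qlen == len && strncmp(q, p, len) == 0) {
				seen = 1;
				break;
			}
			q += qlen;
		}

		if (!seen) {
			int n = snprintf(out + used, size - used, "%s%.*s\n",
					 prefix, (int)len, p);

			if (n < 0 || (size_t)n >= size - used)
				return -1;
			used += (size_t)n;
			count++;
		}
		p += len;
	}
	return count;
}
```

The design point is the same as in the patch: the (possibly long) prefix is
attached once per output line, so it never multiplies the length of the
argument list handed to the shell.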
[PATCH v4 06/14] kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS
include/{linux,asm-generic}/export.h defines a weak symbol, __crc_*
as a placeholder.

Genksyms writes the version CRCs into the linker script, which will be
used for filling the __crc_* symbols. The linker script format depends
on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset
to the reference of CRC.

It is time to get rid of this complexity.

Now that modpost parses text files (.*.cmd) to collect all the CRCs,
it can generate C code that will be linked to the vmlinux or modules.

Generate a new C file, .vmlinux.export.c, which contains the CRCs of
symbols exported by vmlinux. It is compiled and linked to vmlinux in
scripts/link-vmlinux.sh.

Put the CRCs of symbols exported by modules into the existing *.mod.c
files. No additional build step is needed for modules. As before,
*.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal.

No linker magic is used here. The new C implementation works in the
same way, whether CONFIG_RELOCATABLE is enabled or not.
CONFIG_MODULE_REL_CRCS is no longer needed.

Previously, Kbuild invoked additional $(LD) to update the CRCs in
objects, but this step is unneeded too.

Signed-off-by: Masahiro Yamada
Tested-by: Nathan Chancellor
---

Changes in v4:
  - Rename .vmlinux-symver.c to .vmlinux.export.c
    because I notice this approach is useful for further cleanups,
    not only for modversioning but also for overall EXPORT_SYMBOL.

Changes in v3:
  - New patch

 arch/powerpc/Kconfig            |  1 -
 arch/s390/Kconfig               |  1 -
 arch/um/Kconfig                 |  1 -
 include/asm-generic/export.h    | 22 --
 include/linux/export-internal.h | 16
 include/linux/export.h          | 30 --
 init/Kconfig                    |  4
 kernel/module.c                 | 10 +-
 scripts/Makefile.build          | 27 ---
 scripts/genksyms/genksyms.c     | 17 -
 scripts/link-vmlinux.sh         | 18 +-
 scripts/mod/modpost.c           | 28
 12 files changed, 78 insertions(+), 97 deletions(-)
 create mode 100644 include/linux/export-internal.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 174edabb74fa..a4e8dd889e29 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -566,7 +566,6 @@ config RELOCATABLE
 	bool "Build a relocatable kernel"
 	depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
 	select NONSTATIC_KERNEL
-	select MODULE_REL_CRCS if MODVERSIONS
 	help
 	  This builds a kernel image that is capable of running at the
 	  location the kernel is loaded at. For ppc32, there is no any
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 77b5a03de13a..aa5848004c76 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -567,7 +567,6 @@ endchoice
 
 config RELOCATABLE
 	bool "Build a relocatable kernel"
-	select MODULE_REL_CRCS if MODVERSIONS
 	default y
 	help
 	  This builds a kernel image that retains relocation information
diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 4d398b80aea8..e8983d098e73 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -106,7 +106,6 @@ config LD_SCRIPT_DYN
 	bool
 	default y
 	depends on !LD_SCRIPT_STATIC
-	select MODULE_REL_CRCS if MODVERSIONS
 
 config LD_SCRIPT_DYN_RPATH
 	bool "set rpath in the binary" if EXPERT
diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h
index 07a36a874dca..51ce72ce80fa 100644
--- a/include/asm-generic/export.h
+++ b/include/asm-generic/export.h
@@ -2,6 +2,14 @@
 #ifndef __ASM_GENERIC_EXPORT_H
 #define __ASM_GENERIC_EXPORT_H
 
+/*
+ * This comment block is used by fixdep. Please do not remove.
+ *
+ * When CONFIG_MODVERSIONS is changed from n to y, all source files having
+ * EXPORT_SYMBOL variants must be re-compiled because genksyms is run as a
+ * side effect of the .o build rule.
+ */
+
 #ifndef KSYM_FUNC
 #define KSYM_FUNC(x) x
 #endif
@@ -12,9 +20,6 @@
 #else
 #define KSYM_ALIGN 4
 #endif
-#ifndef KCRC_ALIGN
-#define KCRC_ALIGN 4
-#endif
 
 .macro __put, val, name
 #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
@@ -43,17 +48,6 @@ __ksymtab_\name:
 __kstrtab_\name:
 	.asciz "\name"
 	.previous
-#ifdef CONFIG_MODVERSIONS
-	.section ___kcrctab\sec+\name,"a"
-	.balign KCRC_ALIGN
-#if defined(CONFIG_MODULE_REL_CRCS)
-	.long __crc_\name - .
-#else
-	.long __crc_\name
-#endif
-	.weak __crc_\name
-	.previous
-#endif
 #endif
 .endm
 
diff --git a/include/linux/export-internal.h b/include/linux/export-internal.h
new file mode 100644
index ..77175d561058
--- /dev/null
+++ b/include/linux/export-internal.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Please do not include this explicitly.
+ * This is used by C files generated by modpost.
+ */
+
+#ifndef __LINUX_EXPORT_INTERNAL_H__
[PATCH v4 08/14] genksyms: adjust the output format to modpost
Make genksyms output symbol versions in the format modpost expects,
so the 'sed' is unneeded.

This commit makes *.symversions completely unneeded.

I will keep *.symversions in .gitignore and 'make clean' for a while.
Otherwise, 'git status' might be surprising.

Signed-off-by: Masahiro Yamada
Reviewed-by: Nicolas Schier
Tested-by: Nathan Chancellor
---

(no changes since v2)

Changes in v2:
  - New patch

 scripts/Makefile.build      | 6 --
 scripts/genksyms/genksyms.c | 3 +--
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index dff9220135c4..461998a2ad2b 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -165,16 +165,10 @@ ifdef CONFIG_MODVERSIONS
 # o modpost will extract versions from that file and create *.c files that will
 #   be compiled and linked to the kernel and/or modules.
 
-genksyms_format := __crc_\(.*\) = \(.*\);
-
 gen_symversions =								\
 	if $(NM) $@ 2>/dev/null | grep -q __ksymtab; then			\
 		$(call cmd_gensymtypes_$(1),$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
-			> $@.symversions;					\
-		sed -n 's/$(genksyms_format)/$(pound)SYMVER \1 \2/p' $@.symversions \
 			>> $(dot-target).cmd;					\
-	else									\
-		rm -f $@.symversions;						\
 	fi
 
 cmd_gen_symversions_c =	$(call gen_symversions,c)
diff --git a/scripts/genksyms/genksyms.c b/scripts/genksyms/genksyms.c
index 6e6933ae7911..f5dfdb9d80e9 100644
--- a/scripts/genksyms/genksyms.c
+++ b/scripts/genksyms/genksyms.c
@@ -680,8 +680,7 @@ void export_symbol(const char *name)
 		if (flag_dump_defs)
 			fputs(">\n", debugfile);
 
-		/* Used as a linker script. */
-		printf("__crc_%s = 0x%08lx;\n", name, crc);
+		printf("#SYMVER %s 0x%08lx\n", name, crc);
 	}
 }
 
-- 
2.32.0
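
To make the new line format concrete, here is a minimal sketch of writing
and reading back a "#SYMVER" record. The printf format is the one from the
hunk above; the function names format_symver_line() and parse_symver_line()
are hypothetical and the parser is only one plausible way to read such a
line, not the code modpost actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Emit one record in the format genksyms prints after this patch. */
static int format_symver_line(char *out, size_t size,
			      const char *name, unsigned long crc)
{
	assert(out && name);
	return snprintf(out, size, "#SYMVER %s 0x%08lx\n", name, crc);
}

/*
 * Parse "#SYMVER <name> 0x<crc>". Returns false for any other line, for
 * example the old linker-script form "__crc_<name> = 0x<crc>;".
 */
static bool parse_symver_line(const char *line, char *name,
			      size_t name_size, unsigned int *crc)
{
	char buf[256];
	unsigned int value;

	assert(line && name && crc);

	if (strncmp(line, "#SYMVER ", 8) != 0)
		return false;
	if (sscanf(line, "#SYMVER %255s 0x%x", buf, &value) != 2)
		return false;

	snprintf(name, name_size, "%s", buf);
	*crc = value;
	return true;
}
```

Since the record starts with '#', it is a comment as far as Make is
concerned, which is what allows it to live inside a .*.cmd file.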
[PATCH v4 09/14] kbuild: do not create *.prelink.o for Clang LTO or IBT
When CONFIG_LTO_CLANG=y, additional intermediate *.prelink.o is created
for each module. Also, objtool is postponed until LLVM bitcode is
converted to ELF.

CONFIG_X86_KERNEL_IBT works in a similar way to postpone objtool until
objects are merged together.

This commit stops generating *.prelink.o, so the build flow will look
the same with/without LTO.

The following figures show how the LTO build currently works, and
how this commit is changing it.

Current build flow
==================

 [1] single-object module

                                      $(LD)
           $(CC)                     +objtool              $(LD)
    foo.c --------------------> foo.o -----> foo.prelink.o -----> foo.ko
                           (LLVM bitcode)        (ELF)       |
                                                             |
                                                 foo.mod.o --/

 [2] multi-object module
                                      $(LD)
           $(CC)         $(AR)       +objtool               $(LD)
    foo1.c -----> foo1.o -----> foo.o -----> foo.prelink.o -----> foo.ko
                           |  (archive)          (ELF)       |
    foo2.c -----> foo2.o --/                                 |
                (LLVM bitcode)                   foo.mod.o --/

  One confusion is foo.o in multi-object module is an archive despite of
  its suffix.

New build flow
==============

 [1] single-object module

  Since there is only one object, we do not need to have the LLVM
  bitcode stage. Use $(CC)+$(LD) to generate an ELF object in one
  build rule. When LTO is disabled, $(LD) is unneeded because $(CC)
  produces an ELF object.

               $(CC)+$(LD)+objtool              $(LD)
    foo.c ----------------------------> foo.o ---------> foo.ko
                                        (ELF)     |
                                                  |
                                      foo.mod.o --/

 [2] multi-object module

  Previously, $(AR) was used to combine LLVM bitcode into an archive,
  but there was no technical reason to do so.
  This commit just uses $(LD) to combine and convert them into a single
  ELF object.

                            $(LD)
            $(CC)          +objtool        $(LD)
    foo1.c -------> foo1.o -------> foo.o -------> foo.ko
                              |     (ELF)     |
    foo2.c -------> foo2.o ---/               |
                (LLVM bitcode)    foo.mod.o --/

Signed-off-by: Masahiro Yamada
Reviewed-by: Nicolas Schier
Tested-by: Nathan Chancellor
---

(no changes since v2)

Changes in v2:
  - replace the chain of $(if ...) with $(and )

 scripts/Kbuild.include    |  4 +++
 scripts/Makefile.build    | 58 ---
 scripts/Makefile.lib      |  7 -
 scripts/Makefile.modfinal |  5 ++--
 scripts/Makefile.modpost  |  9 ++
 scripts/mod/modpost.c     |  7 -
 6 files changed, 25 insertions(+), 65 deletions(-)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 3514c2149e9d..455a0a6ce12d 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -15,6 +15,10 @@ pound := \#
 # Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o
 dot-target = $(dir $@).$(notdir $@)
 
+###
+# Name of target with a '.tmp_' as filename prefix. foo/bar.o => foo/.tmp_bar.o
+tmp-target = $(dir $@).tmp_$(notdir $@)
+
 ###
 # The temporary file to save gcc -MMD generated dependencies must not
 # contain a comma
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 461998a2ad2b..838ea5e83174 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -88,10 +88,6 @@ endif
 targets-for-modules := $(foreach x, o mod $(if $(CONFIG_TRIM_UNUSED_KSYMS), usyms), \
 				$(patsubst %.o, %.$x, $(filter %.o, $(obj-m))))
 
-ifneq ($(CONFIG_LTO_CLANG)$(CONFIG_X86_KERNEL_IBT),)
-targets-for-modules += $(patsubst %.o, %.prelink.o, $(filter %.o, $(obj-m)))
-endif
-
 ifdef need-modorder
 targets-for-modules += $(obj)/modules.order
 endif
@@ -152,8 +148,16 @@ $(obj)/%.ll: $(src)/%.c FORCE
 # The C file is compiled and updated dependency information is generated.
 # (See cmd_cc_o_c + relevant part of rule_cc_o_c)
 
+is-single-obj-m = $(and $(part-of-module),$(filter $@, $(obj-m)),y)
+
+ifdef CONFIG_LTO_CLANG
+cmd_ld_single_m = $(if $(is-single-obj-m), ; $(LD) $(ld_flags) -r -o $(tmp-target) $@; mv $(tmp-target) $@)
+endif
+
 quiet_cmd_cc_o_c = CC $(quiet_modtag)  $@
-      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< $(cmd_objtool)
+      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< \
+		$(cmd_ld_single_m) \
+		$(cmd_objtool)
 
 ifdef CONFIG_MODVERSIONS
 # When module versioning is enabled the following steps are executed:
@@ -224,21 +228,16 @@ cmd_gen_objtooldep = $(if $(objtool-enabled), { echo ; echo '$@: $$(wildcard $(o
 
 endif # CONFIG_STACK_VALIDATION
 
-ifneq ($(CONFIG_LTO_CLANG)$(CONFIG_X86_KERNEL_IBT),)
-
-# Skip objtool for LLVM bitcode
-$(obj)/%.o: objtool-enabled :=
-
[PATCH v4 02/14] modpost: change the license of EXPORT_SYMBOL to bool type
There were more EXPORT_SYMBOL types in the past. The following commits
removed unused ones.

 - f1c3d73e973c ("module: remove EXPORT_SYMBOL_GPL_FUTURE")
 - 367948220fce ("module: remove EXPORT_UNUSED_SYMBOL*")

There are 3 remaining in enum export, but export_unknown does not make
any sense because we never expect such a situation like "we do not know
how it was exported".

If the symbol name starts with "__ksymtab_", but the section name
does not start with "___ksymtab+" or "___ksymtab_gpl+", it is not an
exported symbol.

It occurs when a variable starting with "__ksymtab_" is directly defined:

   int __ksymtab_foo;

Presumably, there is no practical issue for using such a weird variable
name (but there is no good reason for doing so, either).

Anyway, that is not an exported symbol. Setting export_unknown is not
the right thing to do. Do not call sym_add_exported() in this case.

With pointless export_unknown removed, the export type finally becomes
boolean (either EXPORT_SYMBOL or EXPORT_SYMBOL_GPL).

I renamed the field name to is_gpl_only. EXPORT_SYMBOL_GPL sets it true.
Only GPL-compatible modules can use it.

I removed the orphan comment, "How a symbol is exported", which is
unrelated to sec_mismatch_count. It is about enum export.
See commit bd5cbcedf446 ("kbuild: export-type enhancement to modpost.c")

Signed-off-by: Masahiro Yamada
Reviewed-by: Nicolas Schier
Tested-by: Nathan Chancellor
---

Changes in v4:
  - Rebase again because I dropped
    https://patchwork.kernel.org/project/linux-kbuild/patch/20220501084032.1025918-11-masahi...@kernel.org/
  - Remove warning message because I plan to change this hunk again
    in a later commit
  - Remove orphan comment

Changes in v3:
  - New patch

 scripts/mod/modpost.c | 108 --
 1 file changed, 30 insertions(+), 78 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index d9efbd5b31a6..a78b75f0eeb0 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -30,7 +30,7 @@ static bool all_versions;
 static bool external_module;
 /* Only warn about unresolved symbols */
 static bool warn_unresolved;
-/* How a symbol is exported */
+
 static int sec_mismatch_count;
 static bool sec_mismatch_warn_only = true;
 /* ignore missing files */
@@ -47,12 +47,6 @@ static bool error_occurred;
 #define MAX_UNRESOLVED_REPORTS	10
 static unsigned int nr_unresolved;
 
-enum export {
-	export_plain,
-	export_gpl,
-	export_unknown
-};
-
 /* In kernel, this size is defined in linux/module.h;
  * here we use Elf_Addr instead of long for covering cross-compile
  */
@@ -219,7 +213,7 @@ struct symbol {
 	bool crc_valid;
 	bool weak;
 	bool is_static;		/* true if symbol is not global */
-	enum export export;	/* Type of export */
+	bool is_gpl_only;	/* exported by EXPORT_SYMBOL_GPL */
 	char name[];
 };
 
@@ -316,34 +310,6 @@ static void add_namespace(struct list_head *head, const char *namespace)
 	}
 }
 
-static const struct {
-	const char *str;
-	enum export export;
-} export_list[] = {
-	{ .str = "EXPORT_SYMBOL",            .export = export_plain },
-	{ .str = "EXPORT_SYMBOL_GPL",        .export = export_gpl },
-	{ .str = "(unknown)",                .export = export_unknown },
-};
-
-
-static const char *export_str(enum export ex)
-{
-	return export_list[ex].str;
-}
-
-static enum export export_no(const char *s)
-{
-	int i;
-
-	if (!s)
-		return export_unknown;
-	for (i = 0; export_list[i].export != export_unknown; i++) {
-		if (strcmp(export_list[i].str, s) == 0)
-			return export_list[i].export;
-	}
-	return export_unknown;
-}
-
 static void *sym_get_data_by_offset(const struct elf_info *info,
 				    unsigned int secindex, unsigned long offset)
 {
@@ -374,18 +340,6 @@ static const char *sec_name(const struct elf_info *info, int secindex)
 
 #define strstarts(str, prefix) (strncmp(str, prefix, strlen(prefix)) == 0)
 
-static enum export export_from_secname(struct elf_info *elf, unsigned int sec)
-{
-	const char *secname = sec_name(elf, sec);
-
-	if (strstarts(secname, "___ksymtab+"))
-		return export_plain;
-	else if (strstarts(secname, "___ksymtab_gpl+"))
-		return export_gpl;
-	else
-		return export_unknown;
-}
-
 static void sym_update_namespace(const char *symname, const char *namespace)
 {
 	struct symbol *s = find_symbol(symname);
@@ -405,7 +359,7 @@ static void sym_update_namespace(const char *symname, const char *namespace)
 }
 
 static struct symbol *sym_add_exported(const char *name, struct module *mod,
-				       enum export export)
+				       bool gpl_only)
 {
 	struct symbol *s = find_symbol(name);
 
@@ -417,7 +371,7 @@ static struct symbol
[PATCH v4 05/14] modpost: extract symbol versions from *.cmd files
Currently, CONFIG_MODVERSIONS needs extra link to embed the symbol
versions into ELF objects. Then, modpost extracts the version CRCs
from them.

The following figures show how it currently works, and how I am trying
to change it.

Current implementation
======================
                                                           |----------|
                 embed CRC      -------------------------->| final    |
      $(CC)        $(LD)       /  |---------|              | link for |
      -----> *.o -------> *.o --->| modpost |              | vmlinux  |
     /              /             |         |-- *.mod.c -->| or       |
    / genksyms     /              |---------|              | module   |
  *.c ------> *.symversions                                |----------|

Genksyms outputs the calculated CRCs in the form of linker script
(*.symversions), which is used by $(LD) to update the object.

If CONFIG_LTO_CLANG=y, the build process is much more complex. Embedding
the CRCs is postponed until the LLVM bitcode is converted into ELF,
creating another intermediate *.prelink.o.

However, this complexity is unneeded. There is no reason why we must
embed version CRCs in objects so early.

There is final link stage for vmlinux (scripts/link-vmlinux.sh) and
modules (scripts/Makefile.modfinal). We can link CRCs at the very last
moment.

New implementation
==================
                                                           |----------|
                 ----------------------------------------->| final    |
      $(CC)     /    |---------|                           | link for |
      -----> *.o --->|         |                           | vmlinux  |
     /               | modpost |--- .vmlinux.export.c ---->| or       |
    / genksyms       |         |--- *.mod.c -------------->| module   |
  *.c -----> *.cmd ->|---------|                           |----------|

Pass the symbol versions to modpost as separate text data, which are
available in *.cmd files.

This commit changes modpost to extract CRCs from *.cmd files instead of
from ELF objects.

Signed-off-by: Masahiro Yamada
Reviewed-by: Nicolas Schier
Tested-by: Nathan Chancellor
---

(no changes since v2)

Changes in v2:
  - Simplify the implementation (parse .cmd files after ELF)

 scripts/mod/modpost.c | 177 ++
 1 file changed, 129 insertions(+), 48 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index fc5db1f73cf1..54f957952723 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -381,19 +381,10 @@ static struct symbol *sym_add_exported(const char *name, struct module *mod,
 	return s;
 }
 
-static void sym_set_crc(const char *name, unsigned int crc)
+static void sym_set_crc(struct symbol *sym, unsigned int crc)
 {
-	struct symbol *s = find_symbol(name);
-
-	/*
-	 * Ignore stand-alone __crc_*, which might be auto-generated symbols
-	 * such as __*_veneer in ARM ELF.
-	 */
-	if (!s)
-		return;
-
-	s->crc = crc;
-	s->crc_valid = true;
+	sym->crc = crc;
+	sym->crc_valid = true;
 }
 
 static void *grab_file(const char *filename, size_t *size)
@@ -616,33 +607,6 @@ static int ignore_undef_symbol(struct elf_info *info, const char *symname)
 	return 0;
 }
 
-static void handle_modversion(const struct module *mod,
-			      const struct elf_info *info,
-			      const Elf_Sym *sym, const char *symname)
-{
-	unsigned int crc;
-
-	if (sym->st_shndx == SHN_UNDEF) {
-		warn("EXPORT symbol \"%s\" [%s%s] version generation failed, symbol will not be versioned.\n"
-		     "Is \"%s\" prototyped in <asm/asm-prototypes.h>?\n",
-		     symname, mod->name, mod->is_vmlinux ? "" : ".ko",
-		     symname);
-
-		return;
-	}
-
-	if (sym->st_shndx == SHN_ABS) {
-		crc = sym->st_value;
-	} else {
-		unsigned int *crcp;
-
-		/* symbol points to the CRC in the ELF object */
-		crcp = sym_get_data(info, sym);
-		crc = TO_NATIVE(*crcp);
-	}
-	sym_set_crc(symname, crc);
-}
-
 static void handle_symbol(struct module *mod, struct elf_info *info,
 			  const Elf_Sym *sym, const char *symname)
 {
@@ -760,6 +724,102 @@ static char *remove_dot(char *s)
 	return s;
 }
 
+/*
+ * The CRCs are recorded in .*.cmd files in the form of:
+ * #SYMVER <name> <crc>
+ */
+static void extract_crcs_for_object(const char *object, struct module *mod)
+{
+	char cmd_file[PATH_MAX];
+	char *buf, *p;
+	const char *base;
+	int dirlen, ret;
+
+	base = strrchr(object, '/');
+	if (base) {
+		base++;
+		dirlen = base - object;
+	} else {
+		dirlen = 0;
+		base = object;
+	}
+
+	ret = snprintf(cmd_file, sizeof(cmd_file), "%.*s.%s.cmd",
+
[PATCH v4 00/14] kbuild: yet another series of cleanups (modpost, LTO, MODULE_REL_CRCS, export.h)
This is the third batch of cleanups in this development cycle.

Major changes in v4:

 - Move static EXPORT_SYMBOL check to a script

 - Some refactoring

Major changes in v3:

 - Generate symbol CRCs as C code, and remove CONFIG_MODULE_REL_CRCS.

Major changes in v2:

 - V1 did not work with CONFIG_MODULE_REL_CRCS.
   I fixed this for v2.

 - Reflect some review comments in v1

 - Refactor the code more

 - Avoid too long argument error

Masahiro Yamada (14):
  modpost: remove left-over cross_compile declaration
  modpost: change the license of EXPORT_SYMBOL to bool type
  modpost: split the section mismatch checks into section-check.c
  modpost: add sym_find_with_module() helper
  modpost: extract symbol versions from *.cmd files
  kbuild: link symbol CRCs at final link, removing
    CONFIG_MODULE_REL_CRCS
  kbuild: stop merging *.symversions
  genksyms: adjust the output format to modpost
  kbuild: do not create *.prelink.o for Clang LTO or IBT
  kbuild: check static EXPORT_SYMBOL* by script instead of modpost
  kbuild: make built-in.a rule robust against too long argument error
  kbuild: make *.mod rule robust against too long argument error
  kbuild: add cmd_and_savecmd macro
  kbuild: rebuild multi-object modules when objtool is updated

 arch/powerpc/Kconfig            |    1 -
 arch/s390/Kconfig               |    1 -
 arch/um/Kconfig                 |    1 -
 include/asm-generic/export.h    |   22 +-
 include/linux/export-internal.h |   16 +
 include/linux/export.h          |   30 +-
 init/Kconfig                    |    4 -
 kernel/module.c                 |   10 +-
 scripts/Kbuild.include          |   10 +-
 scripts/Makefile.build          |  134 +--
 scripts/Makefile.lib            |    7 -
 scripts/Makefile.modfinal       |    5 +-
 scripts/Makefile.modpost        |    9 +-
 scripts/check-local-export      |   48 +
 scripts/genksyms/genksyms.c     |   18 +-
 scripts/link-vmlinux.sh         |   33 +-
 scripts/mod/Makefile            |    2 +-
 scripts/mod/modpost.c           | 1499 ---
 scripts/mod/modpost.h           |   35 +-
 scripts/mod/section-check.c     | 1222 +
 20 files changed, 1551 insertions(+), 1556 deletions(-)
 create mode 100644 include/linux/export-internal.h
 create mode 100755 scripts/check-local-export
 create mode 100644 scripts/mod/section-check.c

-- 
2.32.0
[PATCH v4 01/14] modpost: remove left-over cross_compile declaration
This is a remnant of commit 6543becf26ff ("mod/file2alias: make
modalias generation safe for cross compiling").

Signed-off-by: Masahiro Yamada
---

Changes in v4:
  - New patch

 scripts/mod/modpost.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index cfa127d2bb8f..d9daeff07b83 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -174,7 +174,6 @@ static inline unsigned int get_secindex(const struct elf_info *info,
 }
 
 /* file2alias.c */
-extern unsigned int cross_build;
 void handle_moddevtable(struct module *mod, struct elf_info *info,
 			Elf_Sym *sym, const char *symname);
 void add_moddevtable(struct buffer *buf, struct module *mod);
-- 
2.32.0
[PATCH v4 04/14] modpost: add sym_find_with_module() helper
find_symbol() returns the first symbol found in the hash table. This
table is global, so it may return a symbol from an unexpected module.

There is a case where we want to search for a symbol with a given name
in a specified module.

Add sym_find_with_module(), which receives the module pointer as the
second argument. It is equivalent to find_symbol() if NULL is passed
as the module pointer.

Signed-off-by: Masahiro Yamada
Reviewed-by: Nicolas Schier
Tested-by: Nathan Chancellor
---

Changes in v4:
  - Only takes the new helper from
    https://patchwork.kernel.org/project/linux-kbuild/patch/20220505072244.1155033-2-masahi...@kernel.org/

Changes in v2:
  - Rename the new func to sym_find_with_module()

 scripts/mod/modpost.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index e7e2c70a98f5..fc5db1f73cf1 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -266,7 +266,7 @@ static void sym_add_unresolved(const char *name, struct module *mod, bool weak)
 	list_add_tail(&sym->list, &mod->unresolved_symbols);
 }
 
-static struct symbol *find_symbol(const char *name)
+static struct symbol *sym_find_with_module(const char *name, struct module *mod)
 {
 	struct symbol *s;
 
@@ -275,12 +275,17 @@ static struct symbol *find_symbol(const char *name)
 		name++;
 
 	for (s = symbolhash[tdb_hash(name) % SYMBOL_HASH_SIZE]; s; s = s->next) {
-		if (strcmp(s->name, name) == 0)
+		if (strcmp(s->name, name) == 0 && (!mod || s->module == mod))
 			return s;
 	}
 	return NULL;
 }
 
+static struct symbol *find_symbol(const char *name)
+{
+	return sym_find_with_module(name, NULL);
+}
+
 struct namespace_list {
 	struct list_head list;
 	char namespace[];
-- 
2.32.0
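
The difference between the two lookups can be seen in a small stand-alone
sketch. Only the match condition is taken from the patch; the struct
layouts, the tiny table size, toy_hash() (a stand-in for modpost's
tdb_hash()) and sym_add() are simplified assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for modpost's structures (assumption). */
struct module {
	const char *name;
};

struct symbol {
	struct symbol *next;
	struct module *module;
	const char *name;
};

#define SYMBOL_HASH_SIZE 16

static struct symbol *symbolhash[SYMBOL_HASH_SIZE];

/* Toy hash function; modpost itself uses tdb_hash(). */
static unsigned int toy_hash(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h;
}

/* Insert at the head of the bucket, so later additions are found first. */
static void sym_add(struct symbol *s)
{
	unsigned int i = toy_hash(s->name) % SYMBOL_HASH_SIZE;

	s->next = symbolhash[i];
	symbolhash[i] = s;
}

/* Same match condition as in the patch: mod == NULL matches any module. */
static struct symbol *sym_find_with_module(const char *name, struct module *mod)
{
	struct symbol *s;

	assert(name);

	for (s = symbolhash[toy_hash(name) % SYMBOL_HASH_SIZE]; s; s = s->next) {
		if (strcmp(s->name, name) == 0 && (!mod || s->module == mod))
			return s;
	}
	return NULL;
}

static struct symbol *find_symbol(const char *name)
{
	return sym_find_with_module(name, NULL);
}
```

With two modules that both define "foo", find_symbol("foo") returns
whichever entry happens to come first in the bucket, while
sym_find_with_module("foo", mod) returns only the entry that belongs to
mod, or NULL.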
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-4 tag
The pull request you sent on Sun, 08 May 2022 22:13:14 +1000:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.18-4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e3de3a1cda5fdc3ac42cb0d45321fb254500595f

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Re: [PATCH v3 00/15] kbuild: yet another series of cleanups (modpost, LTO, MODULE_REL_CRCS)
On Thu, May 5, 2022 at 4:24 PM Masahiro Yamada wrote:
>
> This is the third batch of cleanups in this development cycle.
>
> Major changes in v3:
>
>  - Generate symbol CRCs as C code, and remove CONFIG_MODULE_REL_CRCS.
>
> Major changes in v2:
>
>  - V1 did not work with CONFIG_MODULE_REL_CRCS.
>    I fixed this for v2.
>
>  - Reflect some review comments in v1
>
>  - Refactor the code more
>
>  - Avoid too long argument error
>
> Masahiro Yamada (15):
>   modpost: mitigate false-negatives for static EXPORT_SYMBOL checks
>   modpost: change the license of EXPORT_SYMBOL to bool type
>   modpost: merge add_{intree_flag,retpoline,staging_flag} to add_header
>   modpost: move *.mod.c generation to write_mod_c_files()
>   kbuild: generate a list of objects in vmlinux
>   kbuild: record symbol versions in *.cmd files
>   modpost: extract symbol versions from *.cmd files
>   kbuild: link symbol CRCs at final link, removing
>     CONFIG_MODULE_REL_CRCS
>   kbuild: stop merging *.symversions
>   genksyms: adjust the output format to modpost
>   kbuild: do not create *.prelink.o for Clang LTO or IBT
>   modpost: simplify the ->is_static initialization
>   modpost: use hlist for hash table implementation
>   kbuild: make built-in.a rule robust against too long argument error
>   kbuild: make *.mod rule robust against too long argument error

Only 03-06 were applied.

I will send v4 for the rest.
(I rewrote the static EXPORT checks).

>
>  arch/powerpc/Kconfig         |   1 -
>  arch/s390/Kconfig            |   1 -
>  arch/um/Kconfig              |   1 -
>  include/asm-generic/export.h |  22 +-
>  include/linux/export.h       |  30 +--
>  include/linux/symversion.h   |  13 +
>  init/Kconfig                 |   4 -
>  kernel/module.c              |  10 +-
>  scripts/Kbuild.include       |   4 +
>  scripts/Makefile.build       | 118 +++--
>  scripts/Makefile.lib         |   7 -
>  scripts/Makefile.modfinal    |   5 +-
>  scripts/Makefile.modpost     |   9 +-
>  scripts/genksyms/genksyms.c  |  18 +-
>  scripts/link-vmlinux.sh      |  46 ++--
>  scripts/mod/file2alias.c     |   2 -
>  scripts/mod/list.h           |  52
>  scripts/mod/modpost.c        | 449 ---
>  scripts/mod/modpost.h        |   2 +
>  19 files changed, 402 insertions(+), 392 deletions(-)
>  create mode 100644 include/linux/symversion.h
>
> --
> 2.32.0

-- 
Best Regards
Masahiro Yamada
Re: [PATCH v6 00/23] Rust support
On Sat, May 07, 2022 at 01:06:18AM -0700, Kees Cook wrote:
> On Sat, May 07, 2022 at 07:23:58AM +0200, Miguel Ojeda wrote:
> > ## Patch series status
> >
> > The Rust support is still to be considered experimental. However,
> > support is good enough that kernel developers can start working on the
> > Rust abstractions for subsystems and write drivers and other modules.
>
> I'd really like to see this landed for a few reasons:
>
> - It's under active development, and I'd rather review the changes
>   "normally", incrementally, etc. Right now it can be hard to re-review
>   some of the "mostly the same each version" patches in the series.
>
> - I'd like to break the catch-22 of "ask for a new driver to be
>   written in rust but the rust support isn't landed" vs "the rust
>   support isn't landed because there aren't enough drivers". It
>   really feels like "release early, release often" is needed here;
>   it's hard to develop against -next. :)
>
> Should we give it a try for this coming merge window?

I'm broadly in favour of that. It's just code, we can always drop it again
or fix it. There's sufficient development community around it that it's
hardly going to become abandonware.
Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
Hi,

On 5/8/2022 8:01 PM, kernel test robot wrote:
> Hi Baolin,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on akpm-mm/mm-everything]
> [also build test ERROR on next-20220506]
> [cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220508/202205081910.mstoc5rj-...@intel.com/config)
> compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
> reproduce (this is a W=1 build):
>         # https://github.com/intel-lab-lkp/linux/commit/907981b27213707fdb2f8a24c107d6752a09a773
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036
>         git checkout 907981b27213707fdb2f8a24c107d6752a09a773
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot
>
> All errors (new ones prefixed by >>):
>
>    mm/rmap.c: In function 'try_to_migrate_one':
>    mm/rmap.c:1931:34: error: implicit declaration of function 'huge_ptep_clear_flush'; did you mean 'ptep_clear_flush'? [-Werror=implicit-function-declaration]
>     1931 |    pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>          |             ^
>          |             ptep_clear_flush
>    mm/rmap.c:1931:34: error: incompatible types when assigning to type 'pte_t' from type 'int'
>    mm/rmap.c:2023:41: error: implicit declaration of function 'set_huge_pte_at'; did you mean 'set_huge_swap_pte_at'? [-Werror=implicit-function-declaration]
>     2023 |    set_huge_pte_at(mm, address, pvmw.pte, pteval);
>          |    ^~~
>          |    set_huge_swap_pte_at
>    cc1: some warnings being treated as errors

Thanks for reporting. I think I should add some dummy functions to
hugetlb.h for the case where CONFIG_HUGETLB_PAGE is not selected. The
build passes with the changes below and your config file.

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 306d6ef..9f71043 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+					  unsigned long addr, pte_t *ptep)
+{
+	return ptep_get(ptep);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+				   pte_t *ptep, pte_t pte)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE */
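The dummy-function approach described above follows a common header pattern: when a config option is off, the header supplies trivial `static inline` stubs so that callers build unchanged. Below is a minimal stand-alone sketch of that pattern. It is not kernel code, and every name in it (`CONFIG_DEMO_FEATURE`, `demo_entry_t`, `demo_clear_flush`, `demo_caller`) is hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for pte_t; not the kernel type. */
typedef struct {
	unsigned long val;
} demo_entry_t;

#ifdef CONFIG_DEMO_FEATURE
/* "Real" variant: hand back the old entry and clear the slot. */
static inline demo_entry_t demo_clear_flush(demo_entry_t *slot)
{
	demo_entry_t old = *slot;

	slot->val = 0;
	return old;
}
#else
/*
 * Stub variant used when the feature is compiled out: the caller
 * still builds and gets the current entry back; the slot is left
 * untouched.
 */
static inline demo_entry_t demo_clear_flush(demo_entry_t *slot)
{
	return *slot;
}
#endif

/* The caller is written once and compiles in both configurations. */
static unsigned long demo_caller(unsigned long initial)
{
	demo_entry_t entry = { initial };
	demo_entry_t old = demo_clear_flush(&entry);

	return old.val;
}
```

In both configurations `demo_caller()` returns the value the entry held before the call, which is why the caller needs no `#ifdef` of its own.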
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/8/2022 7:09 PM, Muchun Song wrote:
> On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:
>> It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
>> table when unmapping or migrating a hugetlb page, and will change
>> to use huge_ptep_clear_flush() instead in the following patches.
>>
>> So this is a preparation patch, which changes the huge_ptep_clear_flush()
>> to return the original pte to help to nuke a hugetlb page table.
>>
>> Signed-off-by: Baolin Wang
>> Acked-by: Mike Kravetz
>
> Reviewed-by: Muchun Song

Thanks for reviewing.

> But one nit below:
>
> [...]
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 8605d7e..61a21af 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>>  		ClearHPageRestoreReserve(new_page);
>>
>>  		/* Break COW or unshare */
>> -		huge_ptep_clear_flush(vma, haddr, ptep);
>> +		(void)huge_ptep_clear_flush(vma, haddr, ptep);
>
> Why add a "(void)" here? Is there any warning if no "(void)"?
> IIUC, I think we can remove this, right?

I did not see any warning without the cast, but this follows Mike's
comment[1] to keep the code consistent with the other functions in
hugetlb.c that cast to void explicitly.

[1] https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/
[GIT PULL] Please pull powerpc/linux.git powerpc-5.18-4 tag
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi Linus, Please pull some more powerpc fixes for 5.18: The following changes since commit bb82c574691daf8f7fa9a160264d15c5804cb769: powerpc/perf: Fix 32bit compile (2022-04-21 23:26:47 +1000) are available in the git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.18-4 for you to fetch changes up to 348c71344111d7a48892e3e52264ff11956fc196: powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE (2022-05-06 12:44:03 +1000) - -- powerpc fixes for 5.18 #4 - Fix the DWARF CFI in our VDSO time functions, allowing gdb to backtrace through them correctly. - Fix a buffer overflow in the papr_scm driver, only triggerable by hypervisor input. - A fix in the recently added QoS handling for VAS (used for communicating with coprocessors). Thanks to: Alan Modra, Haren Myneni, Kajol Jain, Segher Boessenkool. - -- Haren Myneni (1): powerpc/pseries/vas: Use QoS credits from the userspace Kajol Jain (1): powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE Michael Ellerman (1): powerpc/vdso: Fix incorrect CFI in gettimeofday.S arch/powerpc/kernel/vdso/gettimeofday.S| 9 ++-- arch/powerpc/platforms/pseries/papr_scm.c | 7 ++ arch/powerpc/platforms/pseries/vas-sysfs.c | 19 +++- arch/powerpc/platforms/pseries/vas.c | 23 ++-- arch/powerpc/platforms/pseries/vas.h | 2 +- 5 files changed, 36 insertions(+), 24 deletions(-) -BEGIN PGP SIGNATURE- iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmJ3s60ACgkQUevqPMjh pYCE+xAAk+ButiF8vXxyO0/sWvW8F2qkGDvUlGn8Dwo8q8AaA70nCvzztcnBMScE KrUjJFOEAiQUKXCVsczWAcxQwPAkD6myTaoseUBNTc+fdeLiWzpAGRY9FTMR54M6 UtPtiSCUnz2UJnU4gIfAEYGGsnF2PMKnBnEV4ROFNqqIAihmQjW7oU7iLq4kNSX6 YOE5UPUpPSuyJgI1/KlseUuEsH/Hz0Fc3AvSEel+/pfTdPaIxed7Oxr116HsOHqJ Lda88F+4Tdk0OSC9Q9gzbyqQsvpIe2OTt9FQEuBbSAEV+eUbWuwBI44UVkpDDg/C HlcmxAGAoulLXTKrnt3RkjonLZuVwGCTgCJe9zTzWG00n1XzO6mvEuphyixlPsow 7Ej5QLSWkGMZhZO+wTcJpgcCcZ4TEYtpf3T5iBR2DlcftIgmlJtmSS99mwgMZ7ct 
LaHYJDOlSRCtxQipAeHBtybe/ngsxYIdCjNlumbEbYY6tUg5+6jY8DMkJ6KFHAfk 82h241dByF0YDW1HpG5D+RGpEvxTpQrFYhE9XPdOqQ07mwOzIg9DMmCLXrIofETV Ywb5+jY3DlpCZz0nxOHA+5SO1fealq8ZC4ZDKO3FErgqsUUCjuZJUbSLtFHGGRsF HIg+xDoXRpiGWwpIqrgozu2xxYE4AbDhe+sOVvF4APHTXIuP0+U= =Lyfd -END PGP SIGNATURE-
Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
Hi Baolin, I love your patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on next-20220506] [cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc5] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20220508/202205081950.ipkfnyip-...@intel.com/config) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project a385645b470e2d3a1534aae618ea56b31177639f) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/907981b27213707fdb2f8a24c107d6752a09a773 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036 git checkout 907981b27213707fdb2f8a24c107d6752a09a773 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> mm/rmap.c:1931:13: error: call to undeclared function >> 'huge_ptep_clear_flush'; ISO C99 and later do not support implicit function >> declarations [-Wimplicit-function-declaration] pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); ^ mm/rmap.c:1931:13: note: did you mean 'ptep_clear_flush'? 
include/linux/pgtable.h:431:14: note: 'ptep_clear_flush' declared here extern pte_t ptep_clear_flush(struct vm_area_struct *vma, ^ >> mm/rmap.c:1931:11: error: assigning to 'pte_t' from incompatible type 'int' pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); ^ ~ >> mm/rmap.c:2023:6: error: call to undeclared function 'set_huge_pte_at'; ISO >> C99 and later do not support implicit function declarations >> [-Wimplicit-function-declaration] set_huge_pte_at(mm, address, pvmw.pte, pteval); ^ mm/rmap.c:2035:6: error: call to undeclared function 'set_huge_pte_at'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] set_huge_pte_at(mm, address, pvmw.pte, pteval); ^ 4 errors generated. vim +/huge_ptep_clear_flush +1931 mm/rmap.c 1883 1884 /* Unexpected PMD-mapped THP? */ 1885 VM_BUG_ON_FOLIO(!pvmw.pte, folio); 1886 1887 subpage = folio_page(folio, 1888 pte_pfn(*pvmw.pte) - folio_pfn(folio)); 1889 address = pvmw.address; 1890 anon_exclusive = folio_test_anon(folio) && 1891 PageAnonExclusive(subpage); 1892 1893 if (folio_test_hugetlb(folio)) { 1894 /* 1895 * huge_pmd_unshare may unmap an entire PMD page. 1896 * There is no way of knowing exactly which PMDs may 1897 * be cached for this mm, so we must flush them all. 1898 * start/end were already adjusted above to cover this 1899 * range. 1900 */ 1901 flush_cache_range(vma, range.start, range.end); 1902 1903 if (!folio_test_anon(folio)) { 1904 /* 1905 * To call huge_pmd_unshare, i_mmap_rwsem must be 1906 * held in write mode. Caller needs to explicitly 1907 * do this outside rmap routines. 1908 */ 1909 VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); 1910 1911 if (huge_pmd_unshare(mm, vma, , pvmw.pte)) { 1912
Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
Hi Baolin, I love your patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on next-20220506] [cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc5] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220508/202205081910.mstoc5rj-...@intel.com/config) compiler: gcc-11 (Debian 11.2.0-20) 11.2.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/907981b27213707fdb2f8a24c107d6752a09a773 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036 git checkout 907981b27213707fdb2f8a24c107d6752a09a773 # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): mm/rmap.c: In function 'try_to_migrate_one': >> mm/rmap.c:1931:34: error: implicit declaration of function >> 'huge_ptep_clear_flush'; did you mean 'ptep_clear_flush'? >> [-Werror=implicit-function-declaration] 1931 | pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); | ^ | ptep_clear_flush >> mm/rmap.c:1931:34: error: incompatible types when assigning to type 'pte_t' >> from type 'int' >> mm/rmap.c:2023:41: error: implicit declaration of function >> 'set_huge_pte_at'; did you mean 'set_huge_swap_pte_at'? 
>> [-Werror=implicit-function-declaration] 2023 | set_huge_pte_at(mm, address, pvmw.pte, pteval); | ^~~ | set_huge_swap_pte_at cc1: some warnings being treated as errors vim +1931 mm/rmap.c 1883 1884 /* Unexpected PMD-mapped THP? */ 1885 VM_BUG_ON_FOLIO(!pvmw.pte, folio); 1886 1887 subpage = folio_page(folio, 1888 pte_pfn(*pvmw.pte) - folio_pfn(folio)); 1889 address = pvmw.address; 1890 anon_exclusive = folio_test_anon(folio) && 1891 PageAnonExclusive(subpage); 1892 1893 if (folio_test_hugetlb(folio)) { 1894 /* 1895 * huge_pmd_unshare may unmap an entire PMD page. 1896 * There is no way of knowing exactly which PMDs may 1897 * be cached for this mm, so we must flush them all. 1898 * start/end were already adjusted above to cover this 1899 * range. 1900 */ 1901 flush_cache_range(vma, range.start, range.end); 1902 1903 if (!folio_test_anon(folio)) { 1904 /* 1905 * To call huge_pmd_unshare, i_mmap_rwsem must be 1906 * held in write mode. Caller needs to explicitly 1907 * do this outside rmap routines. 1908 */ 1909 VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); 1910 1911 if (huge_pmd_unshare(mm, vma, , pvmw.pte)) { 1912 flush_tlb_range(vma, range.start, range.end); 1913 mmu_notifier_invalidate_range(mm, range.start, 1914 range.end); 1915 1916 /* 1917 * The ref count of the PMD page was dropped 1918 * which is part of the way map counting 1919 * is done for shared PMDs. Return 'true' 1920 * here. When there is no other sharing, 1921
Re: [PATCH] powerpc/pseries/vas: Use QoS credits from the userspace
On Sat, 19 Mar 2022 02:28:09 -0700, Haren Myneni wrote:
> The user can change the QoS credits dynamically with the
> management console interface which notifies OS with sysfs. After
> returning from the OS interface successfully, the management
> console updates the hypervisor. Since the VAS capabilities in
> the hypervisor are not updated when the OS gets the update,
> the kernel is using the old total credits value from the
> hypervisor. Fix this issue by using the new QoS credits
> from the userspace instead of depending on VAS capabilities
> from the hypervisor.
>
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/pseries/vas: Use QoS credits from the userspace
      https://git.kernel.org/powerpc/c/57831bfb5e78777dc399e351ed68ef77c3aee385

cheers
Re: [PATCH] powerpc/vdso: Fix incorrect CFI in gettimeofday.S
On Mon, 2 May 2022 22:50:10 +1000, Michael Ellerman wrote:
> As reported by Alan, the CFI (Call Frame Information) in the VDSO time
> routines is incorrect since commit ce7d8056e38b ("powerpc/vdso: Prepare
> for switching VDSO to generic C implementation.").
>
> In particular the changes to the frame address register (r1) are not
> properly described, which prevents gdb from being able to generate a
> backtrace from inside VDSO functions, eg:
>
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/vdso: Fix incorrect CFI in gettimeofday.S
      https://git.kernel.org/powerpc/c/6d65028eb67dbb7627651adfc460d64196d38bd8

cheers
Re: [PATCH] powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
On Thu, 5 May 2022 21:04:51 +0530, Kajol Jain wrote:
> With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
> dynamic checks for string size which can panic the kernel,
> for example when an overflow is detected.
>
> In papr_scm, papr_scm_pmu_check_events function uses stat->stat_id
> with string operations, to populate the nvdimm_events_map array.
> Since stat_id variable is not NULL terminated, the kernel panics
> with CONFIG_FORTIFY_SOURCE enabled at boot time.
>
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
      https://git.kernel.org/powerpc/c/348c71344111d7a48892e3e52264ff11956fc196

cheers
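The quoted description turns on a fixed-width identifier field that is not NUL terminated being passed to string functions. The user-space sketch below shows one bounded way to turn such a field into a proper C string. It is not the applied kernel fix; the 8-byte width (`DEMO_ID_LEN`), `struct demo_stat`, and the `demo_*` helpers are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ID_LEN 8	/* assumed width of the fixed-size id field */

/* Hypothetical record: the id may fill all bytes, leaving no NUL. */
struct demo_stat {
	char stat_id[DEMO_ID_LEN];
};

/*
 * Copy the id into dst as a NUL-terminated string without ever
 * reading past the field. dst must hold DEMO_ID_LEN + 1 bytes.
 * Returns the length of the resulting string.
 */
static size_t demo_copy_id(char *dst, const struct demo_stat *stat)
{
	size_t len = 0;

	/* Bounded scan: stop at the field end or at an embedded NUL. */
	while (len < DEMO_ID_LEN && stat->stat_id[len] != '\0')
		len++;

	memcpy(dst, stat->stat_id, len);
	dst[len] = '\0';
	return len;
}

/* Exercise the worst case: all DEMO_ID_LEN bytes used, no terminator. */
static int demo_full_width_ok(void)
{
	struct demo_stat stat;
	char out[DEMO_ID_LEN + 1];

	memcpy(stat.stat_id, "ABCDEFGH", DEMO_ID_LEN);
	return demo_copy_id(out, &stat) == DEMO_ID_LEN &&
	       strcmp(out, "ABCDEFGH") == 0;
}
```

An unbounded `strlen()` or `"%s"` on `stat.stat_id` in the full-width case would read past the field, which is the kind of access the fortified string checks are meant to catch.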
[PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
size is specified.

When migrating a hugetlb page, we will get the relevant page table
entry by huge_pte_offset() only once to nuke it and remap it with
a migration pte entry. This is correct for PMD or PUD size hugetlb,
since they always contain only one pmd entry or pud entry in the
page table.

However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since they can contain several contiguous pte or pmd entries with
the same page table attributes. So we will nuke or remap only one
pte or pmd entry for this CONT-PTE/PMD size hugetlb page, which is
not expected for hugetlb migration. The problem is that we can still
modify the subpages' data of a hugetlb page while it is being
migrated, which can cause a serious data consistency issue, since we
did not nuke the page table entries and set a migration pte for the
subpages of the hugetlb page.

To fix this issue, we should change to use huge_ptep_clear_flush()
to nuke a hugetlb page table, and remap it with set_huge_pte_at()
and set_huge_swap_pte_at() when migrating a hugetlb page, which
already consider the CONT-PTE or CONT-PMD size hugetlb.

Signed-off-by: Baolin Wang
---
 mm/rmap.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 6fdd198..7cf2408 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 					break;
 				}
 			}
+
+			/* Nuke the hugetlb page table entry */
+			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
 		} else {
 			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+			/* Nuke the page table entry. */
+			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
-		/* Nuke the page table entry. */
-		pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
 		/* Set the dirty flag on the folio now the pte is gone. */
 		if (pte_dirty(pteval))
 			folio_mark_dirty(folio);
@@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			pte_t swp_pte;
 
 			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				if (folio_test_hugetlb(folio))
+					set_huge_pte_at(mm, address, pvmw.pte, pteval);
+				else
+					set_pte_at(mm, address, pvmw.pte, pteval);
 				ret = false;
 				page_vma_mapped_walk_done(&pvmw);
 				break;
@@ -2024,7 +2029,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				       !anon_exclusive, subpage);
 			if (anon_exclusive &&
 			    page_try_share_anon_rmap(subpage)) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				if (folio_test_hugetlb(folio))
+					set_huge_pte_at(mm, address, pvmw.pte, pteval);
+				else
+					set_pte_at(mm, address, pvmw.pte, pteval);
 				ret = false;
 				page_vma_mapped_walk_done(&pvmw);
 				break;
@@ -2050,7 +2058,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);
 			if (pte_uffd_wp(pteval))
 				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
+			if (folio_test_hugetlb(folio))
+				set_huge_swap_pte_at(mm, address, pvmw.pte,
+						     swp_pte, vma_mmu_pagesize(vma));
+			else
+				set_pte_at(mm, address, pvmw.pte, swp_pte);
 			trace_set_migration_pte(address, pte_val(swp_pte),
 						compound_order(&folio->page));
 			/*
-- 
1.8.3.1
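To illustrate the problem this commit message describes, here is a small user-space model; it is not kernel code. It assumes a flat array of entries in which 16 contiguous entries stand in for one 64K CONT-PTE hugetlb page on a 4K base page size (64K / 4K = 16). All `demo_*` names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PRESENT	0x1u
#define DEMO_NCONTIG	16	/* 64K hugetlb page / 4K base page */

/* Mark every entry of the modelled hugetlb page as mapped. */
static void demo_map_all(unsigned int *tbl, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		tbl[i] = DEMO_PRESENT;
}

/* Models a per-entry clear: only the entry pointed to is nuked. */
static void demo_clear_one(unsigned int *entry)
{
	*entry = 0;
}

/* Models a hugetlb-aware clear: every contiguous entry is nuked. */
static void demo_clear_contig(unsigned int *first, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		first[i] = 0;
}

static size_t demo_count_present(const unsigned int *tbl, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (tbl[i] & DEMO_PRESENT)
			count++;
	return count;
}

/* Entries still mapped after clearing only the first one. */
static size_t demo_left_after_single_clear(void)
{
	unsigned int tbl[DEMO_NCONTIG];

	demo_map_all(tbl, DEMO_NCONTIG);
	demo_clear_one(&tbl[0]);
	return demo_count_present(tbl, DEMO_NCONTIG);
}

/* Entries still mapped after clearing the whole contiguous range. */
static size_t demo_left_after_contig_clear(void)
{
	unsigned int tbl[DEMO_NCONTIG];

	demo_map_all(tbl, DEMO_NCONTIG);
	demo_clear_contig(tbl, DEMO_NCONTIG);
	return demo_count_present(tbl, DEMO_NCONTIG);
}
```

In this model the single-entry clear leaves 15 of the 16 entries mapped, which corresponds to the subpages that can still be modified during migration; the range-aware clear leaves none.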
[PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
size is specified.

When unmapping a hugetlb page, we will get the relevant page table
entry by huge_pte_offset() only once to nuke it. This is correct
for PMD or PUD size hugetlb, since they always contain only one
pmd entry or pud entry in the page table.

However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since they can contain several contiguous pte or pmd entries with
the same page table attributes, so we will nuke only one pte or pmd
entry for this CONT-PTE/PMD size hugetlb page.

And now try_to_unmap() is only passed a hugetlb page in the case
where the hugetlb page is poisoned. This means we currently unmap
only one pte entry for a CONT-PTE or CONT-PMD size poisoned hugetlb
page, and we can still access the other subpages of that poisoned
hugetlb page, which may cause serious issues.

So we should change to use huge_ptep_clear_flush() to nuke the
hugetlb page table to fix this issue, which already considers
CONT-PTE and CONT-PMD size hugetlb.

We've already used set_huge_swap_pte_at() to set a poisoned
swap entry for a poisoned hugetlb page. Meanwhile, add a VM_BUG_ON()
to make sure the passed hugetlb page is poisoned in try_to_unmap().

Signed-off-by: Baolin Wang
---
 mm/rmap.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7cf2408..37c8fd2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1530,6 +1530,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		if (folio_test_hugetlb(folio)) {
 			/*
+			 * The try_to_unmap() is only passed a hugetlb page
+			 * in the case where the hugetlb page is poisoned.
+			 */
+			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+			/*
 			 * huge_pmd_unshare may unmap an entire PMD page.
 			 * There is no way of knowing exactly which PMDs may
 			 * be cached for this mm, so we must flush them all.
@@ -1564,28 +1569,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 					break;
 				}
 			}
+			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
 		} else {
 			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-		}
-
-		/*
-		 * Nuke the page table entry.  When having to clear
-		 * PageAnonExclusive(), we always have to flush.
-		 */
-		if (should_defer_flush(mm, flags) && !anon_exclusive) {
 			/*
-			 * We clear the PTE but do not flush so potentially
-			 * a remote CPU could still be writing to the folio.
-			 * If the entry was previously clean then the
-			 * architecture must guarantee that a clear->dirty
-			 * transition on a cached TLB entry is written through
-			 * and traps if the PTE is unmapped.
+			 * Nuke the page table entry.  When having to clear
+			 * PageAnonExclusive(), we always have to flush.
 			 */
-			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+			if (should_defer_flush(mm, flags) && !anon_exclusive) {
+				/*
+				 * We clear the PTE but do not flush so potentially
+				 * a remote CPU could still be writing to the folio.
+				 * If the entry was previously clean then the
+				 * architecture must guarantee that a clear->dirty
+				 * transition on a cached TLB entry is written through
+				 * and traps if the PTE is unmapped.
+				 */
+				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-		} else {
-			pteval = ptep_clear_flush(vma, address, pvmw.pte);
+				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+			} else {
+				pteval = ptep_clear_flush(vma, address, pvmw.pte);
+			}
 		}
 
 		/*
-- 
1.8.3.1
[PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c| 12 +--- arch/ia64/include/asm/hugetlb.h| 4 ++-- arch/mips/include/asm/hugetlb.h| 9 ++--- arch/parisc/include/asm/hugetlb.h | 4 ++-- arch/powerpc/include/asm/hugetlb.h | 9 ++--- arch/s390/include/asm/hugetlb.h| 6 +++--- arch/sh/include/asm/hugetlb.h | 4 ++-- arch/sparc/include/asm/hugetlb.h | 4 ++-- include/asm-generic/hugetlb.h | 4 ++-- mm/hugetlb.c | 2 +- 11 files changed, 33 insertions(+), 29 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h index 1242f71..616b2ca 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, extern void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -extern void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep); +extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTE_CLEAR extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index cbace1c..ca8e65c 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot)); } -void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep) +pte_t 
huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) { - ptep_clear_flush(vma, addr, ptep); - return; - } + if (!pte_cont(READ_ONCE(*ptep))) + return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(vma->vm_mm, addr, ptep, ); - clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); + return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); } static int __init hugetlbpage_init(void) diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h index 7e46ebd..65d3811 100644 --- a/arch/ia64/include/asm/hugetlb.h +++ b/arch/ia64/include/asm/hugetlb.h @@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm, #define is_hugepage_only_range is_hugepage_only_range #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { } diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h index c214440..fd69c88 100644 --- a/arch/mips/include/asm/hugetlb.h +++ b/arch/mips/include/asm/hugetlb.h @@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { + pte_t pte; + /* * clear the huge pte entry firstly, so that the other smp threads will * not get old pte entry after finishing flush_tlb_page and before * setting new huge pte entry */ - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); flush_tlb_page(vma, addr); + return pte; } #define __HAVE_ARCH_HUGE_PTE_NONE diff --git 
a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h index a69cf9e..25bc560 100644 --- a/arch/parisc/include/asm/hugetlb.h +++ b/arch/parisc/include/asm/hugetlb.h @@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void
[PATCH v2 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
Hi,

Currently, when migrating a hugetlb page or unmapping a poisoned hugetlb
page, we use ptep_clear_flush() and set_pte_at() to nuke the page table
entry and remap it. This is incorrect for CONT-PTE or CONT-PMD size
hugetlb pages and can cause data consistency issues. This patch set
changes to use hugetlb related APIs to fix this issue; please find
details in each patch. Thanks.

Note: Mike pointed out that huge_ptep_get() only returns the one specific
value, and does not take into account the dirty or young bits of
CONT-PTE/PMDs the way huge_ptep_get_and_clear() does [1]. This
inconsistency is not introduced by this patch set and will be addressed
in another thread [2]. Meanwhile, the uffd for hugetlb case [3] pointed
out by Gerald also needs another patch to address.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/

Changes from v1:
 - Add acked tag from Mike.
 - Update some commit messages.
 - Add VM_BUG_ON in try_to_unmap() for hugetlb case.
 - Add an explicit void cast for huge_ptep_clear_flush() in hugetlb.c.

Baolin Wang (3):
  mm: change huge_ptep_clear_flush() to return the original pte
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

 arch/arm64/include/asm/hugetlb.h   |  4 +--
 arch/arm64/mm/hugetlbpage.c        | 12 +++-
 arch/ia64/include/asm/hugetlb.h    |  4 +--
 arch/mips/include/asm/hugetlb.h    |  9 --
 arch/parisc/include/asm/hugetlb.h  |  4 +--
 arch/powerpc/include/asm/hugetlb.h |  9 --
 arch/s390/include/asm/hugetlb.h    |  6 ++--
 arch/sh/include/asm/hugetlb.h      |  4 +--
 arch/sparc/include/asm/hugetlb.h   |  4 +--
 include/asm-generic/hugetlb.h      |  4 +--
 mm/hugetlb.c                       |  2 +-
 mm/rmap.c                          | 63 --
 12 files changed, 73 insertions(+), 52 deletions(-)

-- 
1.8.3.1
Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On 5/7/2022 10:33 AM, Baolin Wang wrote:
> On 5/7/2022 1:56 AM, Mike Kravetz wrote:
>> On 5/5/22 20:39, Baolin Wang wrote:
>>> On 5/6/2022 7:53 AM, Mike Kravetz wrote:
>>>> On 4/29/22 01:14, Baolin Wang wrote:
>>>>> On some architectures (like ARM64), it can support CONT-PTE/PMD size
>>>>> hugetlb, which means it can support not only PMD/PUD size hugetlb:
>>>>> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
>>>>> size specified.
>>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 6fdd198..7cf2408 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>  					break;
>>>>>  				}
>>>>>  			}
>>>>> +
>>>>> +			/* Nuke the hugetlb page table entry */
>>>>> +			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>>>>>  		} else {
>>>>>  			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>>>> +			/* Nuke the page table entry. */
>>>>> +			pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>>>  		}
>>>>
>>>> On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set
>>>> if ANY of the PTE/PMDs had dirty or young set.
>>>
>>> Right.
>>>
>>>>> -		/* Nuke the page table entry. */
>>>>> -		pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>>> -
>>>>>  		/* Set the dirty flag on the folio now the pte is gone. */
>>>>>  		if (pte_dirty(pteval))
>>>>>  			folio_mark_dirty(folio);
>>>>> @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>  			pte_t swp_pte;
>>>>>
>>>>>  			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
>>>>> -				set_pte_at(mm, address, pvmw.pte, pteval);
>>>>> +				if (folio_test_hugetlb(folio))
>>>>> +					set_huge_pte_at(mm, address, pvmw.pte, pteval);
>>>>
>>>> And, we will use that pteval for ALL the PTE/PMDs here. So, we would set
>>>> the dirty or young bit in ALL PTE/PMDs.
>>>>
>>>> Could that cause any issues? May be more of a question for the arm64 people.
>>>
>>> I don't think this will cause any issues. Since the hugetlb can not be
>>> split, and we should not lose the dirty or young state if any
>>> subpages were set. Meanwhile we already did like this in hugetlb.c:
>>>
>>> pte = huge_ptep_get_and_clear(mm, address, ptep);
>>> tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
>>> if (huge_pte_dirty(pte))
>>>         set_page_dirty(page);
>>
>> Agree that it 'should not' cause issues. It just seems inconsistent.
>> This is not a problem specifically with your patch, just the handling of
>> CONT-PTE/PMD entries.
>>
>> There does not appear to be an arm64 specific version of huge_ptep_get()
>> that takes CONT-PTE/PMD into account. So, huge_ptep_get() would only
>> return the one specific value. It would not take into account the dirty
>> or young bits of CONT-PTE/PMDs like your new version of
>> huge_ptep_get_and_clear. Is that correct? Or, am I missing something.
>
> Yes, you are right.
>
>> If I am correct, then code like the following may not work:
>>
>> static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
>>                 unsigned long addr, unsigned long end, struct mm_walk *walk)
>> {
>>         pte_t huge_pte = huge_ptep_get(pte);
>>         struct numa_maps *md;
>>         struct page *page;
>>
>>         if (!pte_present(huge_pte))
>>                 return 0;
>>
>>         page = pte_page(huge_pte);
>>
>>         md = walk->private;
>>         gather_stats(page, md, pte_dirty(huge_pte), 1);
>>         return 0;
>> }
>
> Right, this is inconsistent with current huge_ptep_get() interface like
> you said. So I think we can define an ARCH-specific huge_ptep_get()
> interface for arm64, and some sample code like below. What do you think?

After some investigation, I sent out an RFC patch set[1] to address this
issue. We can talk about this issue in that thread. Thanks.

[1] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
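The inconsistency discussed in this thread can be shown with a small user-space model; it is not the arm64 implementation, nor the RFC's code. It assumes a flat array of entries and hypothetical flag bits (`DEMO_PRESENT`, `DEMO_DIRTY`, `DEMO_YOUNG`), and compares a read of one entry with a read that folds dirty/young state from every entry of a contiguous range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the model; not real PTE bit positions. */
#define DEMO_PRESENT	0x1u
#define DEMO_DIRTY	0x2u
#define DEMO_YOUNG	0x4u

/* Models a read that looks at one specific entry only. */
static unsigned int demo_get_one(const unsigned int *entry)
{
	return *entry;
}

/*
 * Models a range-aware read: start from the first entry and fold in
 * the dirty and young bits of all n contiguous entries.
 */
static unsigned int demo_get_contig(const unsigned int *first, size_t n)
{
	unsigned int val = first[0];
	size_t i;

	for (i = 0; i < n; i++)
		val |= first[i] & (DEMO_DIRTY | DEMO_YOUNG);
	return val;
}

/*
 * Only entry 5 of 16 is dirty. Returns true when the single-entry
 * read reports "clean" while the range-aware read reports "dirty".
 */
static bool demo_single_read_misses_dirty(void)
{
	unsigned int tbl[16];
	size_t i;

	for (i = 0; i < 16; i++)
		tbl[i] = DEMO_PRESENT;
	tbl[5] |= DEMO_DIRTY;

	return !(demo_get_one(&tbl[0]) & DEMO_DIRTY) &&
	       (demo_get_contig(tbl, 16) & DEMO_DIRTY);
}
```

A statistics walker built on the single-entry read would count this page as clean, which is the situation the `gather_hugetlb_stats()` example above points at.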