Re: [PATCH V2 1/2] powerpc/mm/slice: Move slice_mask struct definition to slice.c
On Tuesday 14 February 2017 11:55 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" writes:
>> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
>> index b3f45e413a60..08ac27eae408 100644
>> --- a/arch/powerpc/mm/slice.c
>> +++ b/arch/powerpc/mm/slice.c
>> @@ -37,7 +37,16 @@
>>  #include
>>
>>  static DEFINE_SPINLOCK(slice_convert_lock);
>> -
>> +/*
>> + * One bit per slice. We have lower slices which cover 256MB segments
>> + * upto 4G range. That gets us 16 low slices. For the rest we track slices
>> + * in 1TB size.
>
> Can we tighten this comment up a bit. What about:
>
>  * One bit per slice. The low slices cover the range 0 - 4GB, each
>  * slice being 256MB in size, for 16 low slices. The high slices
>  * cover the rest of the address space at 1TB granularity, with the
>  * exception of high slice 0 which covers the range 4GB - 1TB.
>
> OK?

good.

>> + * 64 below is actually SLICE_NUM_HIGH to fixup complie errros
>
> That line is bogus AFAICS, it refers to the old hardcoded value (prior
> to 512), I'll drop it.

Thanks
-aneesh
Re: [PATCH V2 1/2] powerpc/mm/slice: Move slice_mask struct definition to slice.c
"Aneesh Kumar K.V" writes:

> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index b3f45e413a60..08ac27eae408 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -37,7 +37,16 @@
>  #include
>
>  static DEFINE_SPINLOCK(slice_convert_lock);
> -
> +/*
> + * One bit per slice. We have lower slices which cover 256MB segments
> + * upto 4G range. That gets us 16 low slices. For the rest we track slices
> + * in 1TB size.

Can we tighten this comment up a bit. What about:

 * One bit per slice. The low slices cover the range 0 - 4GB, each
 * slice being 256MB in size, for 16 low slices. The high slices
 * cover the rest of the address space at 1TB granularity, with the
 * exception of high slice 0 which covers the range 4GB - 1TB.

OK?

> + * 64 below is actually SLICE_NUM_HIGH to fixup complie errros

That line is bogus AFAICS, it refers to the old hardcoded value (prior
to 512), I'll drop it.

> + */
> +struct slice_mask {
> +	u64 low_slices;
> +	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
> +};

cheers
Re: [PATCH] KVM: PPC: Book3S: Ratelimit copy data failure error messages
Forwarded same patch to k...@vger.kernel.org and kvm-...@vger.kernel.org too.

On Tuesday 14 February 2017 12:26 AM, Vipin K Parashar wrote:
> kvm_ppc_mmu_book3s_32/64 xlate() logs a "KVM can't copy data" error
> upon failing to copy user data to kernel space. This floods the kernel
> log when such failures occur in a short time period. Ratelimit this
> error to avoid flooding kernel logs upon copy data failures.
>
> Signed-off-by: Vipin K Parashar
> ---
>  arch/powerpc/kvm/book3s_32_mmu.c | 3 ++-
>  arch/powerpc/kvm/book3s_64_mmu.c | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
> index a2eb6d3..ca8f960 100644
> --- a/arch/powerpc/kvm/book3s_32_mmu.c
> +++ b/arch/powerpc/kvm/book3s_32_mmu.c
> @@ -224,7 +224,8 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
>  	ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);
>
>  	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
> -		printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
> +		if (printk_ratelimit())
> +			printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
>  		goto no_page_found;
>  	}
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
> index b9131aa..b420aca 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu.c
> @@ -265,7 +265,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>  		goto no_page_found;
>
>  	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
> -		printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
> +		if (printk_ratelimit())
> +			printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
>  		goto no_page_found;
>  	}
linux-next: build failure after merge of the akpm-current tree
Hi Andrew,

After merging the akpm-current tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

arch/powerpc/lib/code-patching.c:61:16: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'is_conditional_branch'
 bool __kprobes is_conditional_branch(unsigned int instr)
                ^

Caused by commit 916c821aaf13 ("kprobes: move kprobe declarations to
asm-generic/kprobes.h") interacting with commit 51c9c0843993
("powerpc/kprobes: Implement Optprobes") from the powerpc tree.

I have applied this merge fix patch for today:

From: Stephen Rothwell
Date: Tue, 14 Feb 2017 16:56:11 +1100
Subject: [PATCH] powerpc/kprobes: fixup for kprobes declarations moving

Signed-off-by: Stephen Rothwell
---
 arch/powerpc/lib/code-patching.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 0899315e1434..0d3002b7e2b4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include

 int patch_instruction(unsigned int *addr, unsigned int instr)
--
2.10.2

--
Cheers,
Stephen Rothwell
Re: [PATCH V2 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.
On Tuesday 14 February 2017 11:19 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" writes:
>
>> Autonuma preserves the write permission across numa fault to avoid taking
>> a writefault after a numa fault (Commit: b191f9b106ea "mm: numa: preserve
>> PTE write permissions across a NUMA hinting fault"). Architectures can
>> implement protnone in different ways and some may choose to implement it
>> by clearing the Read/Write/Exec bits of the pte. Setting the write bit on
>> such a pte can result in wrong behaviour. Fix this up by allowing the
>> arch to override how to save the write bit on a protnone pte.
>
> This is pretty obviously a nop on arches that don't implement the new
> hooks, but it'd still be good to get an ack from someone in mm land
> before I merge it.

To get it to apply cleanly you may need

http://ozlabs.org/~akpm/mmots/broken-out/mm-autonuma-dont-use-set_pte_at-when-updating-protnone-ptes.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-autonuma-dont-use-set_pte_at-when-updating-protnone-ptes-fix.patch

They are strictly not needed after the saved write patch. But I didn't
request to drop them, because the patch helps us get closer to the goal
of no set_pte_at() call on present ptes.

-aneesh
Re: [PATCH V2 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.
"Aneesh Kumar K.V" writes:

> Autonuma preserves the write permission across numa fault to avoid taking
> a writefault after a numa fault (Commit: b191f9b106ea "mm: numa: preserve
> PTE write permissions across a NUMA hinting fault"). Architectures can
> implement protnone in different ways and some may choose to implement it
> by clearing the Read/Write/Exec bits of the pte. Setting the write bit on
> such a pte can result in wrong behaviour. Fix this up by allowing the
> arch to override how to save the write bit on a protnone pte.

This is pretty obviously a nop on arches that don't implement the new
hooks, but it'd still be good to get an ack from someone in mm land
before I merge it.

cheers

> Acked-By: Michael Neuling
> Signed-off-by: Aneesh Kumar K.V
> ---
>  include/asm-generic/pgtable.h | 16 ++++++++++++++++
>  mm/huge_memory.c              |  4 ++--
>  mm/memory.c                   |  2 +-
>  mm/mprotect.c                 |  4 ++--
>  4 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 18af2bcefe6a..b6f3a8a4b738 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -192,6 +192,22 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
>  }
>  #endif
>
> +#ifndef pte_savedwrite
> +#define pte_savedwrite pte_write
> +#endif
> +
> +#ifndef pte_mk_savedwrite
> +#define pte_mk_savedwrite pte_mkwrite
> +#endif
> +
> +#ifndef pmd_savedwrite
> +#define pmd_savedwrite pmd_write
> +#endif
> +
> +#ifndef pmd_mk_savedwrite
> +#define pmd_mk_savedwrite pmd_mkwrite
> +#endif
> +
>  #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline void pmdp_set_wrprotect(struct mm_struct *mm,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9a6bd6c8d55a..2f0f855ec911 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1300,7 +1300,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
>  		goto out;
>  clear_pmdnuma:
>  	BUG_ON(!PageLocked(page));
> -	was_writable = pmd_write(pmd);
> +	was_writable = pmd_savedwrite(pmd);
>  	pmd = pmd_modify(pmd, vma->vm_page_prot);
>  	pmd = pmd_mkyoung(pmd);
>  	if (was_writable)
> @@ -1555,7 +1555,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  		entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
>  		entry = pmd_modify(entry, newprot);
>  		if (preserve_write)
> -			entry = pmd_mkwrite(entry);
> +			entry = pmd_mk_savedwrite(entry);
>  		ret = HPAGE_PMD_NR;
>  		set_pmd_at(mm, addr, pmd, entry);
>  		BUG_ON(vma_is_anonymous(vma) && !preserve_write &&
> diff --git a/mm/memory.c b/mm/memory.c
> index e78bf72f30dd..88c24f89d6d3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3388,7 +3388,7 @@ static int do_numa_page(struct vm_fault *vmf)
>  	int target_nid;
>  	bool migrated = false;
>  	pte_t pte;
> -	bool was_writable = pte_write(vmf->orig_pte);
> +	bool was_writable = pte_savedwrite(vmf->orig_pte);
>  	int flags = 0;
>
>  	/*
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index f9c07f54dd62..15f5c174a7c1 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -113,13 +113,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  			ptent = ptep_modify_prot_start(mm, addr, pte);
>  			ptent = pte_modify(ptent, newprot);
>  			if (preserve_write)
> -				ptent = pte_mkwrite(ptent);
> +				ptent = pte_mk_savedwrite(ptent);
>
>  			/* Avoid taking write faults for known dirty pages */
>  			if (dirty_accountable && pte_dirty(ptent) &&
>  				(pte_soft_dirty(ptent) ||
>  				 !(vma->vm_flags & VM_SOFTDIRTY))) {
> -				ptent = pte_mkwrite(ptent);
> +				ptent = pte_mk_savedwrite(ptent);
>  			}
>  			ptep_modify_prot_commit(mm, addr, pte, ptent);
>  			pages++;
> --
> 2.7.4
[PATCH V2 2/2] powerpc/mm/autonuma: Switch ppc64 to its own implementation of saved write
With this our protnone becomes a present pte with the READ/WRITE/EXEC
bits cleared. By default we also set _PAGE_PRIVILEGED on such a pte.
This is now used to help us identify a protnone pte that has the saved
write bit. For such a pte, we will clear the _PAGE_PRIVILEGED bit. The
pte still remains non-accessible from both user and kernel.

Acked-By: Michael Neuling
Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  3 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 32 +--
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 0735d5a8049f..8720a406bbbe 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -16,6 +16,9 @@
 #include
 #include
+#ifndef __ASSEMBLY__
+#include
+#endif
 /*
  * This is necessary to get the definition of PGTABLE_RANGE which we
  * need for various slices related matters. Note that this isn't the
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fef738229a68..c684ef6cbd10 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -441,8 +441,8 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
  */
 static inline int pte_protnone(pte_t pte)
 {
-	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)) ==
-		cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED);
+	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX)) ==
+		cpu_to_be64(_PAGE_PRESENT);
 }
 #endif /* CONFIG_NUMA_BALANCING */
@@ -512,6 +512,32 @@ static inline pte_t pte_mkhuge(pte_t pte)
 	return pte;
 }
+#define pte_mk_savedwrite pte_mk_savedwrite
+static inline pte_t pte_mk_savedwrite(pte_t pte)
+{
+	/*
+	 * Used by Autonuma subsystem to preserve the write bit
+	 * while marking the pte PROT_NONE. Only allow this
+	 * on a PROT_NONE pte.
+	 */
+	VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | _PAGE_PRIVILEGED)) !=
+		  cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED));
+	return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED);
+}
+
+#define pte_savedwrite pte_savedwrite
+static inline bool pte_savedwrite(pte_t pte)
+{
+	/*
+	 * Saved write ptes are prot none ptes that don't have the
+	 * privileged bit set. We mark prot none as one which has
+	 * present and privileged bits set and RWX cleared. To mark
+	 * protnone which used to have _PAGE_WRITE set we clear
+	 * the privileged bit.
+	 */
+	return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED));
+}
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	/* FIXME!! check whether this need to be a conditional */
@@ -873,6 +899,7 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mk_savedwrite(pmd)	pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #define pmd_soft_dirty(pmd)	pte_soft_dirty(pmd_pte(pmd))
@@ -889,6 +916,7 @@ static inline int pmd_protnone(pmd_t pmd)
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
+#define pmd_savedwrite(pmd)	pte_savedwrite(pmd_pte(pmd))
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
--
2.7.4
[PATCH V2 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.
Autonuma preserves the write permission across numa fault to avoid taking
a writefault after a numa fault (Commit: b191f9b106ea "mm: numa: preserve
PTE write permissions across a NUMA hinting fault"). Architectures can
implement protnone in different ways and some may choose to implement it
by clearing the Read/Write/Exec bits of the pte. Setting the write bit on
such a pte can result in wrong behaviour. Fix this up by allowing the
arch to override how to save the write bit on a protnone pte.

Acked-By: Michael Neuling
Signed-off-by: Aneesh Kumar K.V
---
 include/asm-generic/pgtable.h | 16 ++++++++++++++++
 mm/huge_memory.c              |  4 ++--
 mm/memory.c                   |  2 +-
 mm/mprotect.c                 |  4 ++--
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 18af2bcefe6a..b6f3a8a4b738 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -192,6 +192,22 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
 }
 #endif

+#ifndef pte_savedwrite
+#define pte_savedwrite pte_write
+#endif
+
+#ifndef pte_mk_savedwrite
+#define pte_mk_savedwrite pte_mkwrite
+#endif
+
+#ifndef pmd_savedwrite
+#define pmd_savedwrite pmd_write
+#endif
+
+#ifndef pmd_mk_savedwrite
+#define pmd_mk_savedwrite pmd_mkwrite
+#endif
+
 #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9a6bd6c8d55a..2f0f855ec911 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1300,7 +1300,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 		goto out;
 clear_pmdnuma:
 	BUG_ON(!PageLocked(page));
-	was_writable = pmd_write(pmd);
+	was_writable = pmd_savedwrite(pmd);
 	pmd = pmd_modify(pmd, vma->vm_page_prot);
 	pmd = pmd_mkyoung(pmd);
 	if (was_writable)
@@ -1555,7 +1555,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
 		entry = pmd_modify(entry, newprot);
 		if (preserve_write)
-			entry = pmd_mkwrite(entry);
+			entry = pmd_mk_savedwrite(entry);
 		ret = HPAGE_PMD_NR;
 		set_pmd_at(mm, addr, pmd, entry);
 		BUG_ON(vma_is_anonymous(vma) && !preserve_write &&
diff --git a/mm/memory.c b/mm/memory.c
index e78bf72f30dd..88c24f89d6d3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3388,7 +3388,7 @@ static int do_numa_page(struct vm_fault *vmf)
 	int target_nid;
 	bool migrated = false;
 	pte_t pte;
-	bool was_writable = pte_write(vmf->orig_pte);
+	bool was_writable = pte_savedwrite(vmf->orig_pte);
 	int flags = 0;

 	/*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index f9c07f54dd62..15f5c174a7c1 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -113,13 +113,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			ptent = ptep_modify_prot_start(mm, addr, pte);
 			ptent = pte_modify(ptent, newprot);
 			if (preserve_write)
-				ptent = pte_mkwrite(ptent);
+				ptent = pte_mk_savedwrite(ptent);

 			/* Avoid taking write faults for known dirty pages */
 			if (dirty_accountable && pte_dirty(ptent) &&
 				(pte_soft_dirty(ptent) ||
 				 !(vma->vm_flags & VM_SOFTDIRTY))) {
-				ptent = pte_mkwrite(ptent);
+				ptent = pte_mk_savedwrite(ptent);
 			}
 			ptep_modify_prot_commit(mm, addr, pte, ptent);
 			pages++;
--
2.7.4
[PATCH V2 0/2] Numabalancing preserve write fix
This patch series addresses an issue w.r.t. THP migration and the
autonuma preserve write feature.

migrate_misplaced_transhuge_page() cannot deal with concurrent
modification of the page. It does a page copy without following the
migration pte sequence. IIUC, this was done to keep the migration
simpler, and at the time of implementation we didn't have THP page
cache, which would have required a more elaborate migration scheme.
That means thp autonuma migration expects the protnone with saved write
to be done such that both kernel and user cannot update the page
content. This patch series enables archs like ppc64 to do that. We are
good with the hash translation mode with the current code, because we
never create a hardware page table entry for a protnone pte.

Changes from V1:
* Update the patch so that it applies cleanly to upstream.
* Add acked-by from Michael Neuling

Aneesh Kumar K.V (2):
  mm/autonuma: Let architecture override how the write bit should be
    stashed in a protnone pte.
  powerpc/mm/autonuma: Switch ppc64 to its own implementation of saved
    write

 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  3 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 32 +--
 include/asm-generic/pgtable.h                 | 16 ++
 mm/huge_memory.c                              |  4 ++--
 mm/memory.c                                   |  2 +-
 mm/mprotect.c                                 |  4 ++--
 6 files changed, 54 insertions(+), 7 deletions(-)

--
2.7.4
Re: [PATCH 3/5] selftests: Fix the .S and .S -> .o rules
Tested-by: Bamvor Jian Zhang

On 9 February 2017 at 16:56, Michael Ellerman wrote:
> Both these rules incorrectly use $< (first prerequisite) rather than
> $^ (all prerequisites), meaning they don't work if we're using more than
> one .S file as input. Switch them to using $^.
>
> They also don't include $(CPPFLAGS) and other variables used in the
> default rules, which breaks targets that require those. Fix that by
> using the builtin $(COMPILE.S) and $(LINK.S) rules.
>
> Fixes: a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT")
> Signed-off-by: Michael Ellerman
> ---
>  tools/testing/selftests/lib.mk | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 98841c54763a..ce96d80ad64f 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -54,9 +54,9 @@ $(OUTPUT)/%:%.c
>  	$(LINK.c) $^ $(LDLIBS) -o $@
>
>  $(OUTPUT)/%.o:%.S
> -	$(CC) $(ASFLAGS) -c $< -o $@
> +	$(COMPILE.S) $^ -o $@
>
>  $(OUTPUT)/%:%.S
> -	$(CC) $(ASFLAGS) $< -o $@
> +	$(LINK.S) $^ $(LDLIBS) -o $@
>
>  .PHONY: run_tests all clean install emit_tests
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] selftests: Fix the .c linking rule
Tested-by: Bamvor Jian Zhang

On 9 February 2017 at 16:56, Michael Ellerman wrote:
> Currently we can't build some tests, for example:
>
> $ make -C tools/testing/selftests/ TARGETS=vm
> ...
> gcc -Wall -I ../../../../usr/include -lrt -lpthread ../../../../usr/include/linux/kernel.h userfaultfd.c -o tools/testing/selftests/vm/userfaultfd
> /tmp/ccmOkQSM.o: In function `stress':
> userfaultfd.c:(.text+0xc60): undefined reference to `pthread_create'
> userfaultfd.c:(.text+0xca5): undefined reference to `pthread_create'
> userfaultfd.c:(.text+0xcee): undefined reference to `pthread_create'
> userfaultfd.c:(.text+0xd30): undefined reference to `pthread_create'
> userfaultfd.c:(.text+0xd77): undefined reference to `pthread_join'
> userfaultfd.c:(.text+0xe7d): undefined reference to `pthread_join'
> userfaultfd.c:(.text+0xe9f): undefined reference to `pthread_cancel'
> userfaultfd.c:(.text+0xec6): undefined reference to `pthread_join'
> userfaultfd.c:(.text+0xf14): undefined reference to `pthread_join'
> /tmp/ccmOkQSM.o: In function `userfaultfd_stress':
> userfaultfd.c:(.text+0x13e2): undefined reference to `pthread_attr_setstacksize'
> collect2: error: ld returned 1 exit status
>
> This is because the rule for linking .c files to binaries is incorrect.
>
> The first bug is that it uses $< (first prerequisite) instead of $^ (all
> prerequisites); fix it by using $^.
>
> Secondly the ordering of the prerequisites vs $(LDLIBS) is wrong,
> meaning on toolchains that use --as-needed we fail to link (as above).
> Fix that by placing $(LDLIBS) *after* $^.
>
> Finally switch to using the default rule $(LINK.c), so that we get
> $(CPPFLAGS) etc. included.
>
> Fixes: a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT")
> Signed-off-by: Michael Ellerman
> ---
>  tools/testing/selftests/lib.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 17ed4bbe3963..98841c54763a 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -51,7 +51,7 @@ clean:
>  	$(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(EXTRA_CLEAN)
>
>  $(OUTPUT)/%:%.c
> -	$(CC) $(CFLAGS) $(LDFLAGS) $(LDLIBS) $< -o $@
> +	$(LINK.c) $^ $(LDLIBS) -o $@
>
>  $(OUTPUT)/%.o:%.S
>  	$(CC) $(ASFLAGS) -c $< -o $@
> --
> 2.7.4
Re: [PATCH 1/5] selftests: Fix selftests build to just build, not run tests
Tested-by: Bamvor Jian Zhang

On 9 February 2017 at 16:56, Michael Ellerman wrote:
> In commit 88baa78d1f31 ("selftests: remove duplicated all and clean
> target"), the "all" target was removed from individual Makefiles and
> added to lib.mk.
>
> However the "all" target was added to lib.mk *after* the existing
> "runtests" target. This means "runtests" becomes the first (default)
> target for most of our Makefiles.
>
> This has the effect of causing a plain "make" to build *and run* the
> tests. Which is at best rude, but depending on which tests are run could
> oops someone's build machine.
>
> $ make -C tools/testing/selftests/
> ...
> make[1]: Entering directory 'tools/testing/selftests/bpf'
> gcc -Wall -O2 -I../../../../usr/include test_verifier.c -o tools/testing/selftests/bpf/test_verifier
> gcc -Wall -O2 -I../../../../usr/include test_maps.c -o tools/testing/selftests/bpf/test_maps
> gcc -Wall -O2 -I../../../../usr/include test_lru_map.c -o tools/testing/selftests/bpf/test_lru_map
> #0 add+sub+mul FAIL
> Failed to load prog 'Function not implemented'!
> #1 unreachable FAIL
> Unexpected error message!
> #2 unreachable2 FAIL
> ...
>
> Fix it by moving the "all" target to the start of lib.mk, making it the
> default target.
>
> Fixes: 88baa78d1f31 ("selftests: remove duplicated all and clean target")
> Signed-off-by: Michael Ellerman
> ---
>  tools/testing/selftests/lib.mk | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 01bb7782a35e..17ed4bbe3963 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -2,6 +2,11 @@
>  # Makefile can operate with or without the kbuild infrastructure.
>  CC := $(CROSS_COMPILE)gcc
>
> +TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
> +TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
> +
> +all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
> +
>  define RUN_TESTS
>  	@for TEST in $(TEST_GEN_PROGS) $(TEST_PROGS); do \
>  		BASENAME_TEST=`basename $$TEST`;\
> @@ -42,11 +47,6 @@ endef
>  emit_tests:
>  	$(EMIT_TESTS)
>
> -TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
> -TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
> -
> -all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
> -
>  clean:
>  	$(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(EXTRA_CLEAN)
> --
> 2.7.4
Re: [PATCH 3/3] powerpc/mm/radix: Skip ptesync in pte update helpers
On Thu, 2017-02-09 at 08:28 +0530, Aneesh Kumar K.V wrote:
> We do them at the start of tlb flush, and we are sure a pte update will be
> followed by a tlbflush. Hence we can skip the ptesync in pte update helpers.
>
> Signed-off-by: Aneesh Kumar K.V

Tested-by: Michael Neuling

> ---
>  arch/powerpc/include/asm/book3s/64/radix.h | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index fcf822d6c204..77e590c77299 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -144,13 +144,11 @@ static inline unsigned long radix__pte_update(struct mm_struct *mm,
>  		 * new value of pte
>  		 */
>  		new_pte = (old_pte | set) & ~clr;
> -		asm volatile("ptesync" : : : "memory");
>  		radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
>  		if (new_pte)
>  			__radix_pte_update(ptep, 0, new_pte);
>  	} else
>  		old_pte = __radix_pte_update(ptep, clr, set);
> -	asm volatile("ptesync" : : : "memory");
>  	if (!huge)
>  		assert_pte_locked(mm, addr);
>
> @@ -195,7 +193,6 @@ static inline void radix__ptep_set_access_flags(struct mm_struct *mm,
>  	unsigned long old_pte, new_pte;
>
>  	old_pte = __radix_pte_update(ptep, ~0, 0);
> -	asm volatile("ptesync" : : : "memory");
>  	/*
>  	 * new value of pte
>  	 */
Re: [PATCH 2/3] powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
On Thu, 2017-02-09 at 08:28 +0530, Aneesh Kumar K.V wrote:
> This helps us to do some optimization for the application exit case, where
> we can skip the DD1 style pte update sequence.
>
> Signed-off-by: Aneesh Kumar K.V

Tested-by: Michael Neuling

> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +
>  arch/powerpc/include/asm/book3s/64/radix.h   | 23 ++-
>  2 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 6f15bde94da2..e91ada786d48 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -373,6 +373,23 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>  	return __pte(old);
>  }
>
> +#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
> +static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
> +					    unsigned long addr,
> +					    pte_t *ptep, int full)
> +{
> +	if (full && radix_enabled()) {
> +		/*
> +		 * Let's skip the DD1 style pte update here. We know that
> +		 * this is a full mm pte clear and hence can be sure there is
> +		 * no parallel set_pte.
> +		 */
> +		return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
> +	}
> +	return ptep_get_and_clear(mm, addr, ptep);
> +}
> +
> +
>  static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
>  			     pte_t * ptep)
>  {
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 70a3cdcdbe47..fcf822d6c204 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -139,7 +139,7 @@ static inline unsigned long radix__pte_update(struct mm_struct *mm,
>
>  	unsigned long new_pte;
>
> -	old_pte = __radix_pte_update(ptep, ~0, 0);
> +	old_pte = __radix_pte_update(ptep, ~0ul, 0);
>  	/*
>  	 * new value of pte
>  	 */
> @@ -157,6 +157,27 @@ static inline unsigned long radix__pte_update(struct mm_struct *mm,
>  	return old_pte;
>  }
>
> +static inline pte_t radix__ptep_get_and_clear_full(struct mm_struct *mm,
> +						   unsigned long addr,
> +						   pte_t *ptep, int full)
> +{
> +	unsigned long old_pte;
> +
> +	if (full) {
> +		/*
> +		 * If we are trying to clear the pte, we can skip
> +		 * the DD1 pte update sequence and batch the tlb flush. The
> +		 * tlb flush batching is done by mmu gather code. We
> +		 * still keep the cmp_xchg update to make sure we get
> +		 * correct R/C bit which might be updated via Nest MMU.
> +		 */
> +		old_pte = __radix_pte_update(ptep, ~0ul, 0);
> +	} else
> +		old_pte = radix__pte_update(mm, addr, ptep, ~0ul, 0, 0);
> +
> +	return __pte(old_pte);
> +}
> +
>  /*
>   * Set the dirty and/or accessed bits atomically in a linux PTE, this
>   * function doesn't need to invalidate tlb.
Re: [PATCH 1/3] powerpc/mm/radix: Update pte update sequence for pte clear case
On Thu, 2017-02-09 at 08:28 +0530, Aneesh Kumar K.V wrote:
> In the kernel we follow the below sequence in different code paths:
>
> pte = ptep_get_clear(ptep)
> ...
> set_pte_at(ptep, pte)
>
> We do that for mremap, autonuma protection update and softdirty clearing.
> This implies our optimization to skip a tlb flush when clearing a pte
> update is not valid, because for a DD1 system that followup set_pte_at
> will be done without doing the required tlbflush. Fix that by always
> doing the dd1 style pte update irrespective of the new_pte value. In a
> later patch we will optimize the application exit case.
>
> Signed-off-by: Benjamin Herrenschmidt
> Signed-off-by: Aneesh Kumar K.V

Tested-by: Michael Neuling

> ---
>  arch/powerpc/include/asm/book3s/64/radix.h | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index b4d1302387a3..70a3cdcdbe47 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -144,16 +144,10 @@ static inline unsigned long radix__pte_update(struct mm_struct *mm,
>  		 * new value of pte
>  		 */
>  		new_pte = (old_pte | set) & ~clr;
> -		/*
> -		 * If we are trying to clear the pte, we can skip
> -		 * the below sequence and batch the tlb flush. The
> -		 * tlb flush batching is done by mmu gather code
> -		 */
> -		if (new_pte) {
> -			asm volatile("ptesync" : : : "memory");
> -			radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
> +		asm volatile("ptesync" : : : "memory");
> +		radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
> +		if (new_pte)
>  			__radix_pte_update(ptep, 0, new_pte);
> -		}
>  	} else
>  		old_pte = __radix_pte_update(ptep, clr, set);
>  	asm volatile("ptesync" : : : "memory");
Re: [PATCH] powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
Michael Ellerman writes:

> If we enable RADIX but disable HUGETLBFS, the build breaks with:
>
>   arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge'
>   arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge'
>
> Fix it by stubbing those functions when HUGETLBFS=n.

Reviewed-by: Aneesh Kumar K.V

> Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()")
> Signed-off-by: Michael Ellerman
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable-4k.h  | 5 +
>  arch/powerpc/include/asm/book3s/64/pgtable-64k.h | 3 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> index 9db83b4e017d..8708a0239a56 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> @@ -47,7 +47,12 @@ static inline int hugepd_ok(hugepd_t hpd)
>  	return hash__hugepd_ok(hpd);
>  }
>  #define is_hugepd(hpd)		(hugepd_ok(hpd))
> +
> +#else /* !CONFIG_HUGETLB_PAGE */
> +static inline int pmd_huge(pmd_t pmd) { return 0; }
> +static inline int pud_huge(pud_t pud) { return 0; }
>  #endif /* CONFIG_HUGETLB_PAGE */
> +
>  #endif /* __ASSEMBLY__ */
>
>  #endif /*_ASM_POWERPC_BOOK3S_64_PGTABLE_4K_H */
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> index 198aff33c380..2ce4209399ed 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> @@ -46,6 +46,9 @@ static inline int hugepd_ok(hugepd_t hpd)
>  }
>  #define is_hugepd(pdep)		0
>
> +#else /* !CONFIG_HUGETLB_PAGE */
> +static inline int pmd_huge(pmd_t pmd) { return 0; }
> +static inline int pud_huge(pud_t pud) { return 0; }
>  #endif /* CONFIG_HUGETLB_PAGE */
>
>  static inline int remap_4k_pfn(struct vm_area_struct *vma, unsigned long addr,
> --
> 2.7.4
Re: [PATCH] powerpc/xmon: add debugfs entry for xmon
On 2017/2/14 10:35, Nicholas Piggin wrote: On Mon, 13 Feb 2017 19:00:42 -0200 "Guilherme G. Piccoli" wrote: Currently the xmon debugger is set only via kernel boot command-line. It's disabled by default, and can be enabled with "xmon=on" on the command-line. Also, xmon may be accessed via sysrq mechanism, but once we enter xmon via sysrq, it's kept enabled until system is rebooted, even if we exit the debugger. A kernel crash will then lead to xmon instance, instead of triggering a kdump procedure (if configured), for example. This patch introduces a debugfs entry for xmon, allowing user to query its current state and change it if desired. Basically, the "xmon" file to read from/write to is under the debugfs mount point, on powerpc directory. Reading this file will provide the current state of the debugger, one of the following: "on", "off", "early" or "nobt". Writing one of these states to the file will take immediate effect on the debugger. Signed-off-by: Guilherme G. Piccoli --- * I had this patch partially done for some time, and after a discussion at the kernel slack channel last week, I decided to rebase and fix some remaining bugs. I'd change 'x' option to always disable the debugger, since with this patch we can always re-enable xmon, but today I noticed Pan's patch on the mailing list, so perhaps his approach of adding a flag to 'x' option is preferable. I can change this in a V2, if requested. Thanks in advance! xmon state changing after the first sysrq+x violates principle of least astonishment, so I think that should be fixed. hi, Nick yes, as long as xmon is disabled during boot, it should still be disabled after exiting xmon. My patch does not fix that, as it needs people to add one more char 'z' following 'x'. I will provide a new patch to fix that. Then the question is, is it worth making it runtime configurable with xmon command or debugfs tunables? They are options for people to turn xmon features on or off. Maybe people don't need this. 
However I am not a fan of debugfs this time, as I am used to using xmon cmds. :) Hi, Guilherme So in the end, my thought is that: 1) cmd x|X will exit xmon and keep xmon in the original state (indicated by var xmon_off). 2) Then add options to turn some features on/off. And debugfs may not be a fit for this. But I am also wondering, at the same time, whether people need this. thanks xinhui Thanks, Nick
Re: [PATCH 2/2] powerpc/mm/autonuma: Switch ppc64 to its own implementeation of saved write
On Thu, 2017-02-09 at 08:30 +0530, Aneesh Kumar K.V wrote: > With this our protnone becomes a present pte with READ/WRITE/EXEC bit cleared. > By default we also set _PAGE_PRIVILEGED on such pte. This is now used to help > us identify a protnone pte that has saved write bit. For such pte, we will > clear > the _PAGE_PRIVILEGED bit. The pte still remain non-accessible from both user > and kernel. > > Signed-off-by: Aneesh Kumar K.V FWIW I've tested this, so: Acked-By: Michael Neuling > --- > arch/powerpc/include/asm/book3s/64/mmu-hash.h | 3 +++ > arch/powerpc/include/asm/book3s/64/pgtable.h | 32 +- > - > 2 files changed, 33 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > index 0735d5a8049f..8720a406bbbe 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > @@ -16,6 +16,9 @@ > #include > #include > > +#ifndef __ASSEMBLY__ > +#include > +#endif > /* > * This is necessary to get the definition of PGTABLE_RANGE which we > * need for various slices related matters. 
Note that this isn't the > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h > b/arch/powerpc/include/asm/book3s/64/pgtable.h > index e91ada786d48..efff910a84b1 100644 > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > @@ -443,8 +443,8 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte) > */ > static inline int pte_protnone(pte_t pte) > { > - return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)) > == > - cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED); > + return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX)) == > + cpu_to_be64(_PAGE_PRESENT); > } > #endif /* CONFIG_NUMA_BALANCING */ > > @@ -514,6 +514,32 @@ static inline pte_t pte_mkhuge(pte_t pte) > return pte; > } > > +#define pte_mk_savedwrite pte_mk_savedwrite > +static inline pte_t pte_mk_savedwrite(pte_t pte) > +{ > + /* > + * Used by Autonuma subsystem to preserve the write bit > + * while marking the pte PROT_NONE. Only allow this > + * on PROT_NONE pte > + */ > + VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | > _PAGE_PRIVILEGED)) != > + cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)); > + return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED); > +} > + > +#define pte_savedwrite pte_savedwrite > +static inline bool pte_savedwrite(pte_t pte) > +{ > + /* > + * Saved write ptes are prot none ptes that doesn't have > + * privileged bit sit. We mark prot none as one which has > + * present and pviliged bit set and RWX cleared. To mark > + * protnone which used to have _PAGE_WRITE set we clear > + * the privileged bit. 
> + */ > + return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED)); > +} > + > static inline pte_t pte_mkdevmap(pte_t pte) > { > return __pte(pte_val(pte) | _PAGE_SPECIAL|_PAGE_DEVMAP); > @@ -885,6 +911,7 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd) > #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) > #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) > #define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) > +#define pmd_mk_savedwrite(pmd) pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)) > ) > > #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY > #define pmd_soft_dirty(pmd)pte_soft_dirty(pmd_pte(pmd)) > @@ -901,6 +928,7 @@ static inline int pmd_protnone(pmd_t pmd) > > #define __HAVE_ARCH_PMD_WRITE > #define pmd_write(pmd) pte_write(pmd_pte(pmd)) > +#define pmd_savedwrite(pmd) pte_savedwrite(pmd_pte(pmd)) > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
Re: [PATCH 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.
On Thu, 2017-02-09 at 08:30 +0530, Aneesh Kumar K.V wrote: > Autonuma preserves the write permission across numa fault to avoid taking > a writefault after a numa fault (Commit: b191f9b106ea " mm: numa: preserve PTE > write permissions across a NUMA hinting fault"). Architecture can implement > protnone in different ways and some may choose to implement that by clearing > Read/ > Write/Exec bit of pte. Setting the write bit on such pte can result in wrong > behaviour. Fix this up by allowing arch to override how to save the write bit > on a protnone pte. > > Signed-off-by: Aneesh Kumar K.V FWIW this is pretty simple and helps us in powerpc... Acked-By: Michael Neuling > --- > include/asm-generic/pgtable.h | 16 > mm/huge_memory.c | 4 ++-- > mm/memory.c | 2 +- > mm/mprotect.c | 4 ++-- > 4 files changed, 21 insertions(+), 5 deletions(-) > > diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h > index 18af2bcefe6a..b6f3a8a4b738 100644 > --- a/include/asm-generic/pgtable.h > +++ b/include/asm-generic/pgtable.h > @@ -192,6 +192,22 @@ static inline void ptep_set_wrprotect(struct mm_struct > *mm, unsigned long addres > } > #endif > > +#ifndef pte_savedwrite > +#define pte_savedwrite pte_write > +#endif > + > +#ifndef pte_mk_savedwrite > +#define pte_mk_savedwrite pte_mkwrite > +#endif > + > +#ifndef pmd_savedwrite > +#define pmd_savedwrite pmd_write > +#endif > + > +#ifndef pmd_mk_savedwrite > +#define pmd_mk_savedwrite pmd_mkwrite > +#endif > + > #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > static inline void pmdp_set_wrprotect(struct mm_struct *mm, > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 9a6bd6c8d55a..2f0f855ec911 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -1300,7 +1300,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t > pmd) > goto out; > clear_pmdnuma: > BUG_ON(!PageLocked(page)); > - was_writable = pmd_write(pmd); > + was_writable = pmd_savedwrite(pmd); > pmd = 
pmd_modify(pmd, vma->vm_page_prot); > pmd = pmd_mkyoung(pmd); > if (was_writable) > @@ -1555,7 +1555,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t > *pmd, > entry = pmdp_huge_get_and_clear_notify(mm, addr, > pmd); > entry = pmd_modify(entry, newprot); > if (preserve_write) > - entry = pmd_mkwrite(entry); > + entry = pmd_mk_savedwrite(entry); > ret = HPAGE_PMD_NR; > set_pmd_at(mm, addr, pmd, entry); > BUG_ON(vma_is_anonymous(vma) && !preserve_write && > diff --git a/mm/memory.c b/mm/memory.c > index e78bf72f30dd..88c24f89d6d3 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3388,7 +3388,7 @@ static int do_numa_page(struct vm_fault *vmf) > int target_nid; > bool migrated = false; > pte_t pte; > - bool was_writable = pte_write(vmf->orig_pte); > + bool was_writable = pte_savedwrite(vmf->orig_pte); > int flags = 0; > > /* > diff --git a/mm/mprotect.c b/mm/mprotect.c > index f9c07f54dd62..15f5c174a7c1 100644 > --- a/mm/mprotect.c > +++ b/mm/mprotect.c > @@ -113,13 +113,13 @@ static unsigned long change_pte_range(struct > vm_area_struct *vma, pmd_t *pmd, > ptent = ptep_modify_prot_start(mm, addr, pte); > ptent = pte_modify(ptent, newprot); > if (preserve_write) > - ptent = pte_mkwrite(ptent); > + ptent = pte_mk_savedwrite(ptent); > > /* Avoid taking write faults for known dirty pages */ > if (dirty_accountable && pte_dirty(ptent) && > (pte_soft_dirty(ptent) || > !(vma->vm_flags & VM_SOFTDIRTY))) { > - ptent = pte_mkwrite(ptent); > + ptent = pte_mk_savedwrite(ptent); > } > ptep_modify_prot_commit(mm, addr, pte, ptent); > pages++;
linux-next: manual merge of the kvm tree with the powerpc tree
Hi all, Today's linux-next merge of the kvm tree got a conflict in: arch/powerpc/platforms/pseries/lpar.c between commit: dbcf929c0062 ("powerpc/pseries: Add support for hash table resizing") from the powerpc tree and commit: cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9") from the kvm tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc arch/powerpc/platforms/pseries/lpar.c index c2e13a51f369,0587655aea69.. --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@@ -611,112 -609,29 +611,135 @@@ static int __init disable_bulk_remove(c __setup("bulk_remove=", disable_bulk_remove); +#define HPT_RESIZE_TIMEOUT1 /* ms */ + +struct hpt_resize_state { + unsigned long shift; + int commit_rc; +}; + +static int pseries_lpar_resize_hpt_commit(void *data) +{ + struct hpt_resize_state *state = data; + + state->commit_rc = plpar_resize_hpt_commit(0, state->shift); + if (state->commit_rc != H_SUCCESS) + return -EIO; + + /* Hypervisor has transitioned the HTAB, update our globals */ + ppc64_pft_size = state->shift; + htab_size_bytes = 1UL << ppc64_pft_size; + htab_hash_mask = (htab_size_bytes >> 7) - 1; + + return 0; +} + +/* Must be called in user context */ +static int pseries_lpar_resize_hpt(unsigned long shift) +{ + struct hpt_resize_state state = { + .shift = shift, + .commit_rc = H_FUNCTION, + }; + unsigned int delay, total_delay = 0; + int rc; + ktime_t t0, t1, t2; + + might_sleep(); + + if (!firmware_has_feature(FW_FEATURE_HPT_RESIZE)) + return -ENODEV; + + printk(KERN_INFO "lpar: Attempting to resize HPT to shift %lu\n", + shift); + + t0 = ktime_get(); + 
+ rc = plpar_resize_hpt_prepare(0, shift); + while (H_IS_LONG_BUSY(rc)) { + delay = get_longbusy_msecs(rc); + total_delay += delay; + if (total_delay > HPT_RESIZE_TIMEOUT) { + /* prepare with shift==0 cancels an in-progress resize */ + rc = plpar_resize_hpt_prepare(0, 0); + if (rc != H_SUCCESS) + printk(KERN_WARNING + "lpar: Unexpected error %d cancelling timed out HPT resize\n", + rc); + return -ETIMEDOUT; + } + msleep(delay); + rc = plpar_resize_hpt_prepare(0, shift); + }; + + switch (rc) { + case H_SUCCESS: + /* Continue on */ + break; + + case H_PARAMETER: + return -EINVAL; + case H_RESOURCE: + return -EPERM; + default: + printk(KERN_WARNING + "lpar: Unexpected error %d from H_RESIZE_HPT_PREPARE\n", + rc); + return -EIO; + } + + t1 = ktime_get(); + + rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL); + + t2 = ktime_get(); + + if (rc != 0) { + switch (state.commit_rc) { + case H_PTEG_FULL: + printk(KERN_WARNING + "lpar: Hash collision while resizing HPT\n"); + return -ENOSPC; + + default: + printk(KERN_WARNING + "lpar: Unexpected error %d from H_RESIZE_HPT_COMMIT\n", + state.commit_rc); + return -EIO; + }; + } + + printk(KERN_INFO + "lpar: HPT resize to shift %lu complete (%lld ms / %lld ms)\n", + shift, (long long) ktime_ms_delta(t1, t0), + (long long) ktime_ms_delta(t2, t1)); + + return 0; +} + + /* Actually only used for radix, so far */ + static int pseries_lpar_register_process_table(unsigned long base, + unsigned long page_size, unsigned long table_size) + { + long rc; + unsigned long flags = PROC_TABLE_NEW; + + if (radix_enabled()) + flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE; + for (;;) { + rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base, + page_size, table_size); + if (!H_IS_LONG_BUSY(rc)) + break; + mdelay(get_longbusy_msecs(rc)); + } +
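The resize-prepare loop in the merge resolution above follows the usual PAPR "long busy" retry pattern: repeat the hcall while the hypervisor asks for more time, sleep the suggested interval between attempts, and give up once a time budget is exhausted. A standalone sketch of that pattern (the hcall, return codes, delays, and timeout here are stand-ins, not the real PAPR interface):

```c
#include <assert.h>

enum { H_SUCCESS = 0, H_LONG_BUSY_10MSEC = -9005 };

static int busy_left;        /* how many "long busy" replies remain */

/* Stand-in for plpar_resize_hpt_prepare(): busy a few times, then OK. */
static int fake_hcall(void)
{
    return busy_left-- > 0 ? H_LONG_BUSY_10MSEC : H_SUCCESS;
}

/*
 * Retry while the hypervisor reports "long busy", accumulating the
 * advised sleep time and bailing out once it exceeds the budget
 * (the -ETIMEDOUT path in the code above).
 */
static int retry_hcall(int timeout_ms)
{
    int total_delay = 0;
    int rc = fake_hcall();

    while (rc == H_LONG_BUSY_10MSEC) {
        total_delay += 10;        /* get_longbusy_msecs(rc) stand-in */
        if (total_delay > timeout_ms)
            return -1;            /* timed out; caller would cancel */
        /* a real caller would msleep(10) here */
        rc = fake_hcall();
    }
    return rc;
}

int main(void)
{
    busy_left = 3;
    assert(retry_hcall(100) == H_SUCCESS);   /* succeeds after retries */

    busy_left = 50;
    assert(retry_hcall(100) == -1);          /* budget exhausted */
    return 0;
}
```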
linux-next: manual merge of the kvm tree with the powerpc tree
Hi all, Today's linux-next merge of the kvm tree got a conflict in: arch/powerpc/include/asm/prom.h between commit: 0de0fb09bbce ("powerpc/pseries: Advertise HPT resizing support via CAS") from the powerpc tree and commit: 3f4ab2f83b4e ("powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options") from the kvm tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc arch/powerpc/include/asm/prom.h index 00fcfcbdd053,8af2546ea593.. --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@@ -151,11 -153,17 +153,18 @@@ struct of_drconf_cell #define OV5_XCMO 0x0440 /* Page Coalescing */ #define OV5_TYPE1_AFFINITY0x0580 /* Type 1 NUMA affinity */ #define OV5_PRRN 0x0540 /* Platform Resource Reassignment */ +#define OV5_RESIZE_HPT0x0601 /* Hash Page Table resizing */ - #define OV5_PFO_HW_RNG0x0E80 /* PFO Random Number Generator */ - #define OV5_PFO_HW_8420x0E40 /* PFO Compression Accelerator */ - #define OV5_PFO_HW_ENCR 0x0E20 /* PFO Encryption Accelerator */ - #define OV5_SUB_PROCESSORS0x0F01 /* 1,2,or 4 Sub-Processors supported */ + #define OV5_PFO_HW_RNG0x1180 /* PFO Random Number Generator */ + #define OV5_PFO_HW_8420x1140 /* PFO Compression Accelerator */ + #define OV5_PFO_HW_ENCR 0x1120 /* PFO Encryption Accelerator */ + #define OV5_SUB_PROCESSORS0x1501 /* 1,2,or 4 Sub-Processors supported */ + #define OV5_XIVE_EXPLOIT 0x1701 /* XIVE exploitation supported */ + #define OV5_MMU_RADIX_300 0x1880 /* ISA v3.00 radix MMU supported */ + #define OV5_MMU_HASH_300 0x1840 /* ISA v3.00 hash MMU supported */ + #define OV5_MMU_SEGM_RADIX0x1820 /* radix mode (no segmentation) */ + #define OV5_MMU_PROC_TBL 
0x1810 /* hcall selects SLB or proc table */ + #define OV5_MMU_SLB 0x1800 /* always use SLB */ + #define OV5_MMU_GTSE 0x1808 /* Guest translation shootdown */ /* Option Vector 6: IBM PAPR hints */ #define OV6_LINUX 0x02/* Linux is our OS */
Re: [PATCH 1/2] net: fs_enet: Fix an error handling path
From: Christophe JAILLET Date: Fri, 10 Feb 2017 21:17:06 +0100 > 'of_node_put(fpi->phy_node)' should also be called if we branch to > 'out_deregister_fixed_link' error handling path. > > Signed-off-by: Christophe JAILLET Applied.
Re: [PATCH 2/2] net: fs_enet: Simplify code
From: Christophe JAILLET Date: Fri, 10 Feb 2017 21:17:19 +0100 > There is no need to use an intermediate variable to handle an error code > in this case. > > Signed-off-by: Christophe JAILLET Applied.
linux-next: manual merge of the kvm tree with the powerpc tree
Hi all, Today's linux-next merge of the kvm tree got a conflict in: arch/powerpc/include/asm/hvcall.h between commit: 64b40ffbc830 ("powerpc/pseries: Add hypercall wrappers for hash page table resizing") from the powerpc tree and commit: cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9") from the kvm tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc arch/powerpc/include/asm/hvcall.h index 490c4b9f4e3a,54d11b3a6bf7.. --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@@ -276,8 -276,7 +276,9 @@@ #define H_GET_MPP_X 0x314 #define H_SET_MODE0x31C #define H_CLEAR_HPT 0x358 +#define H_RESIZE_HPT_PREPARE 0x36C +#define H_RESIZE_HPT_COMMIT 0x370 + #define H_REGISTER_PROC_TBL 0x37C #define H_SIGNAL_SYS_RESET0x380 #define MAX_HCALL_OPCODE H_SIGNAL_SYS_RESET
Re: [RFC][PATCH] powerpc/64s: optimise syscall entry with relon hypercalls
On Mon, 13 Feb 2017 11:04:06 + David Laight wrote: > From: Nicholas Piggin > > Sent: 10 February 2017 18:23 > > After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from > > guest to host"), a getppid() system call goes from 307 cycles to 358 > > cycles (+17%). This is due significantly to the scratch SPR used by the > > hypercall. > > > > It turns out there are some volatile registers common to both system > > call and hypercall (in particular, r12, cr0, ctr), which can be used to > > avoid the SPR and some other overheads for the system call case. This > > brings getppid to 320 cycles (+4%). > ... > > + * syscall register convention is in > > Documentation/powerpc/syscall64-abi.txt > > + * > > + * For hypercalls, the register convention is as follows: > > + * r0 volatile > > + * r1-2 nonvolatile > > + * r3 volatile parameter and return value for status > > + * r4-r10 volatile input and output value > > + * r11 volatile hypercall number and output value > > + * r12 volatile > > + * r13-r31 nonvolatile > > + * LR nonvolatile > > + * CTR volatile > > + * XER volatile > > + * CR0-1 CR5-7 volatile > > + * CR2-4 nonvolatile > > + * Other registers nonvolatile > > + * > > + * The intersection of volatile registers that don't contain possible > > + * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs > > + * upon entry without saving. > > Except that they must surely be set to some known value on exit in order > to avoid leaking information to the guest. True. I don't see why that's a problem for the entry code though. Thanks, Nick
Re: [PATCH] powerpc/xmon: add debugfs entry for xmon
On Mon, 13 Feb 2017 19:00:42 -0200 "Guilherme G. Piccoli" wrote: > Currently the xmon debugger is set only via kernel boot command-line. > It's disabled by default, and can be enabled with "xmon=on" on the > command-line. Also, xmon may be accessed via sysrq mechanism, but once > we enter xmon via sysrq, it's kept enabled until system is rebooted, > even if we exit the debugger. A kernel crash will then lead to xmon > instance, instead of triggering a kdump procedure (if configured), for > example. > > This patch introduces a debugfs entry for xmon, allowing user to query > its current state and change it if desired. Basically, the "xmon" file > to read from/write to is under the debugfs mount point, on powerpc > directory. Reading this file will provide the current state of the > debugger, one of the following: "on", "off", "early" or "nobt". Writing > one of these states to the file will take immediate effect on the debugger. > > Signed-off-by: Guilherme G. Piccoli > --- > * I had this patch partially done for some time, and after a discussion > at the kernel slack channel last week, I decided to rebase and fix > some remaining bugs. I'd change 'x' option to always disable the debugger, > since with this patch we can always re-enable xmon, but today I noticed > Pan's patch on the mailing list, so perhaps his approach of adding a flag > to 'x' option is preferable. I can change this in a V2, if requested. > Thanks in advance! xmon state changing after the first sysrq+x violates principle of least astonishment, so I think that should be fixed. Then the question is, is it worth making it runtime configurable with xmon command or debugfs tunables? Thanks, Nick
Re: [PATCH v2] powerpc: Blacklist GCC 5.4 6.1 and 6.2
On Tue, Feb 14, 2017 at 11:25:43AM +1100, Cyril Bur wrote: > > > At the time of writing GCC 5.4 is the most recent and is affected. GCC > > > 6.3 contains the backported fix, has been tested and appears safe to > > > use. > > > > 6.3 is (of course) the newer release; 5.4 is a maintenance release of > > a compiler that is a year older. > > Yes. I think the point I was trying to make is that since they > backported the fix to 5.x and 6.x then I expect that 5.5 will have the > fix but since it doesn't exist yet, I can't be sure. I'll add something > to that effect. The patch has been backported to the GCC 5 branch; it will be part of any future GCC 5 release (5.5 and later, if any later will happen; 5.5 will). Don't be so unsure about these things, we aren't *that* incompetent ;-) > > Please mention the GCC PR # somewhere in the code, too? > > Sure. Thanks! Segher
[PATCH] powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
If we enable RADIX but disable HUGETLBFS, the build breaks with: arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge' arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge' Fix it by stubbing those functions when HUGETLBFS=n. Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()") Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/book3s/64/pgtable-4k.h | 5 + arch/powerpc/include/asm/book3s/64/pgtable-64k.h | 3 +++ 2 files changed, 8 insertions(+) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h index 9db83b4e017d..8708a0239a56 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h @@ -47,7 +47,12 @@ static inline int hugepd_ok(hugepd_t hpd) return hash__hugepd_ok(hpd); } #define is_hugepd(hpd) (hugepd_ok(hpd)) + +#else /* !CONFIG_HUGETLB_PAGE */ +static inline int pmd_huge(pmd_t pmd) { return 0; } +static inline int pud_huge(pud_t pud) { return 0; } #endif /* CONFIG_HUGETLB_PAGE */ + #endif /* __ASSEMBLY__ */ #endif /*_ASM_POWERPC_BOOK3S_64_PGTABLE_4K_H */ diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h index 198aff33c380..2ce4209399ed 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h @@ -46,6 +46,9 @@ static inline int hugepd_ok(hugepd_t hpd) } #define is_hugepd(pdep) 0 +#else /* !CONFIG_HUGETLB_PAGE */ +static inline int pmd_huge(pmd_t pmd) { return 0; } +static inline int pud_huge(pud_t pud) { return 0; } #endif /* CONFIG_HUGETLB_PAGE */ static inline int remap_4k_pfn(struct vm_area_struct *vma, unsigned long addr, -- 2.7.4
Re: [PATCH 1/5] selftests: Fix selftests build to just build, not run tests
Michael Ellerman writes: > In commit 88baa78d1f31 ("selftests: remove duplicated all and clean > target"), the "all" target was removed from individual Makefiles and > added to lib.mk. > > However the "all" target was added to lib.mk *after* the existing > "runtests" target. This means "runtests" becomes the first (default) > target for most of our Makefiles. ... > > Fix it by moving the "all" target to the start of lib.mk, making it the > default target. > > Fixes: 88baa78d1f31 ("selftests: remove duplicated all and clean target") > Signed-off-by: Michael Ellerman Hi Shuah, Can you please merge this series into linux-next? The selftests are badly broken otherwise. cheers
[PATCH v5 15/15] livepatch: allow removal of a disabled patch
From: Miroslav Benes Currently we do not allow a patch module to unload since there is no method to determine if a task is still running in the patched code. The consistency model gives us a way, because when the unpatching finishes we know that all tasks were marked as safe to call an original function. Thus every new call to the function calls the original code and at the same time no task can be somewhere in the patched code, because it had to leave that code to be marked as safe. We can safely let the patch module go after that. Completion is used for synchronization between module removal and sysfs infrastructure in a similar way to commit 942e443127e9 ("module: Fix mod->mkobj.kobj potentially freed too early"). Note that we still do not allow the removal for the immediate model, that is, no consistency model. The module refcount may increase in this case if somebody disables and enables the patch several times. This should not cause any harm. With this change a call to try_module_get() is moved to __klp_enable_patch from klp_register_patch to make module reference counting symmetric (module_put() is in a patch disable path) and to allow taking a new reference to a disabled module when being enabled. Finally, we need to be very careful about possible races between klp_unregister_patch(), kobject_put() functions and operations on the related sysfs files. kobject_put(&patch->kobj) must be called without klp_mutex. Otherwise, it might be blocked by enabled_store() that needs the mutex as well. In addition, enabled_store() must check if the patch was not unregistered in the meantime. There is no need to do the same for other kobject_put() callsites at the moment. Their sysfs operations neither take the lock nor access any data that might be freed in the meantime. There was an attempt to use kobjects the right way and prevent these races by design. But it made the patch definition more complicated and opened another can of worms. 
See https://lkml.kernel.org/r/1464018848-4303-1-git-send-email-pmla...@suse.com [Thanks to Petr Mladek for improving the commit message.] Signed-off-by: Miroslav Benes Signed-off-by: Josh Poimboeuf Reviewed-by: Petr Mladek --- Documentation/livepatch/livepatch.txt | 28 include/linux/livepatch.h | 3 ++ kernel/livepatch/core.c | 80 ++- kernel/livepatch/transition.c | 12 +- samples/livepatch/livepatch-sample.c | 1 - 5 files changed, 72 insertions(+), 52 deletions(-) diff --git a/Documentation/livepatch/livepatch.txt b/Documentation/livepatch/livepatch.txt index 4f2aec8..ecdb181 100644 --- a/Documentation/livepatch/livepatch.txt +++ b/Documentation/livepatch/livepatch.txt @@ -316,8 +316,15 @@ section "Livepatch life-cycle" below for more details about these two operations. Module removal is only safe when there are no users of the underlying -functions. The immediate consistency model is not able to detect this; -therefore livepatch modules cannot be removed. See "Limitations" below. +functions. The immediate consistency model is not able to detect this. The +code just redirects the functions at the very beginning and it does not +check if the functions are in use. In other words, it knows when the +functions get called but it does not know when the functions return. +Therefore it cannot be decided when the livepatch module can be safely +removed. This is solved by a hybrid consistency model. When the system is +transitioned to a new patch state (patched/unpatched) it is guaranteed that +no task sleeps or runs in the old code. + 5. Livepatch life-cycle === @@ -469,23 +476,6 @@ The current Livepatch implementation has several limitations: by "notrace". - + Livepatch modules can not be removed. - -The current implementation just redirects the functions at the very -beginning. It does not check if the functions are in use. In other -words, it knows when the functions get called but it does not -know when the functions return. 
Therefore it can not decide when -the livepatch module can be safely removed. - -This will get most likely solved once a more complex consistency model -is supported. The idea is that a safe state for patching should also -mean a safe state for removing the patch. - -Note that the patch itself might get disabled by writing zero -to /sys/kernel/livepatch//enabled. It causes that the new -code will not longer get called. But it does not guarantee -that anyone is not sleeping anywhere in the new code. - + Livepatch works reliably only when the dynamic ftrace is located at the very beginning of the function. diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index ed90ad1..194991e 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -23,6 +23,7 @@ #include
[PATCH v5 14/15] livepatch: add /proc//patch_state
Expose the per-task patch state value so users can determine which tasks are holding up completion of a patching operation. Signed-off-by: Josh Poimboeuf Reviewed-by: Petr Mladek Reviewed-by: Miroslav Benes --- Documentation/filesystems/proc.txt | 18 ++ fs/proc/base.c | 15 +++ 2 files changed, 33 insertions(+) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index c94b467..9036dbf 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -44,6 +44,7 @@ Table of Contents 3.8 /proc//fdinfo/ - Information about opened file 3.9 /proc//map_files - Information about memory mapped files 3.10 /proc//timerslack_ns - Task timerslack value + 3.11 /proc//patch_state - Livepatch patch operation state 4 Configuring procfs 4.1 Mount options @@ -1887,6 +1888,23 @@ Valid values are from 0 - ULLONG_MAX An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level permissions on the task specified to change its timerslack_ns value. +3.11 /proc//patch_state - Livepatch patch operation state +- +When CONFIG_LIVEPATCH is enabled, this file displays the value of the +patch state for the task. + +A value of '-1' indicates that no patch is in transition. + +A value of '0' indicates that a patch is in transition and the task is +unpatched. If the patch is being enabled, then the task hasn't been +patched yet. If the patch is being disabled, then the task has already +been unpatched. + +A value of '1' indicates that a patch is in transition and the task is +patched. If the patch is being enabled, then the task has already been +patched. If the patch is being disabled, then the task hasn't been +unpatched yet. 
+ -- Configuring procfs diff --git a/fs/proc/base.c b/fs/proc/base.c index 6e86558..5145f40 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2828,6 +2828,15 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns, return err; } +#ifdef CONFIG_LIVEPATCH +static int proc_pid_patch_state(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) +{ + seq_printf(m, "%d\n", task->patch_state); + return 0; +} +#endif /* CONFIG_LIVEPATCH */ + /* * Thread groups */ @@ -2927,6 +2936,9 @@ static const struct pid_entry tgid_base_stuff[] = { REG("timers", S_IRUGO, proc_timers_operations), #endif REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations), +#ifdef CONFIG_LIVEPATCH + ONE("patch_state", S_IRUSR, proc_pid_patch_state), +#endif }; static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx) @@ -3309,6 +3321,9 @@ static const struct pid_entry tid_base_stuff[] = { REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations), REG("setgroups", S_IRUGO|S_IWUSR, proc_setgroups_operations), #endif +#ifdef CONFIG_LIVEPATCH + ONE("patch_state", S_IRUSR, proc_pid_patch_state), +#endif }; static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx) -- 2.7.4
[PATCH v5 13/15] livepatch: change to a per-task consistency model
Change livepatch to use a basic per-task consistency model. This is the foundation which will eventually enable us to patch those ~10% of security patches which change function or data semantics. This is the biggest remaining piece needed to make livepatch more generally useful. This code stems from the design proposal made by Vojtech [1] in November 2014. It's a hybrid of kGraft and kpatch: it uses kGraft's per-task consistency and syscall barrier switching combined with kpatch's stack trace switching. There are also a number of fallback options which make it quite flexible. Patches are applied on a per-task basis, when the task is deemed safe to switch over. When a patch is enabled, livepatch enters into a transition state where tasks are converging to the patched state. Usually this transition state can complete in a few seconds. The same sequence occurs when a patch is disabled, except the tasks converge from the patched state to the unpatched state. An interrupt handler inherits the patched state of the task it interrupts. The same is true for forked tasks: the child inherits the patched state of the parent. Livepatch uses several complementary approaches to determine when it's safe to patch tasks: 1. The first and most effective approach is stack checking of sleeping tasks. If no affected functions are on the stack of a given task, the task is patched. In most cases this will patch most or all of the tasks on the first try. Otherwise it'll keep trying periodically. This option is only available if the architecture has reliable stacks (HAVE_RELIABLE_STACKTRACE). 2. The second approach, if needed, is kernel exit switching. A task is switched when it returns to user space from a system call, a user space IRQ, or a signal. It's useful in the following cases: a) Patching I/O-bound user tasks which are sleeping on an affected function. In this case you have to send SIGSTOP and SIGCONT to force it to exit the kernel and be patched. b) Patching CPU-bound user tasks. 
If the task is highly CPU-bound then it will get patched the next time it gets interrupted by an IRQ. c) In the future it could be useful for applying patches for architectures which don't yet have HAVE_RELIABLE_STACKTRACE. In this case you would have to signal most of the tasks on the system. However this isn't supported yet because there's currently no way to patch kthreads without HAVE_RELIABLE_STACKTRACE. 3. For idle "swapper" tasks, since they don't ever exit the kernel, they instead have a klp_update_patch_state() call in the idle loop which allows them to be patched before the CPU enters the idle state. (Note there's not yet such an approach for kthreads.) All the above approaches may be skipped by setting the 'immediate' flag in the 'klp_patch' struct, which will disable per-task consistency and patch all tasks immediately. This can be useful if the patch doesn't change any function or data semantics. Note that, even with this flag set, it's possible that some tasks may still be running with an old version of the function, until that function returns. There's also an 'immediate' flag in the 'klp_func' struct which allows you to specify that certain functions in the patch can be applied without per-task consistency. This might be useful if you want to patch a common function like schedule(), and the function change doesn't need consistency but the rest of the patch does. For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user must set patch->immediate which causes all tasks to be patched immediately. This option should be used with care, only when the patch doesn't change any function or data semantics. In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE may be allowed to use per-task consistency if we can come up with another way to patch kthreads. The /sys/kernel/livepatch//transition file shows whether a patch is in transition. Only a single patch (the topmost patch on the stack) can be in transition at a given time. 
A patch can remain in transition indefinitely, if any of the tasks are stuck in the initial patch state. A transition can be reversed and effectively canceled by writing the opposite value to the /sys/kernel/livepatch//enabled file while the transition is in progress. Then all the tasks will attempt to converge back to the original patch state. [1] https://lkml.kernel.org/r/20141107140458.ga21...@suse.cz Signed-off-by: Josh Poimboeuf--- Documentation/ABI/testing/sysfs-kernel-livepatch | 8 + Documentation/livepatch/livepatch.txt| 186 +++- include/linux/init_task.h| 9 + include/linux/livepatch.h| 42 +- include/linux/sched.h| 3 + kernel/fork.c| 3 + kernel/livepatch/Makefile|
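The convergence behaviour described above — each task flips to the target state when it is deemed safe, and the transition completes only once every task has flipped — can be modelled in a few lines of plain C. This is a toy userspace illustration of the idea, not kernel code; all names are invented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the per-task transition: each task carries its own
 * patch state, and the transition is complete only once every task
 * has converged to the target state. */
struct toy_task {
	bool patched;        /* this task's current state */
	bool safe_to_switch; /* e.g. no affected function on its stack */
};

/* Try to migrate every task that is currently safe; return true when
 * all tasks have reached the target state.  A real implementation
 * would retry periodically until this returns true. */
static bool toy_try_complete_transition(struct toy_task *tasks, size_t n,
					bool target)
{
	bool complete = true;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].patched == target)
			continue;               /* already converged */
		if (tasks[i].safe_to_switch)
			tasks[i].patched = target; /* switch this task over */
		else
			complete = false;       /* keep trying later */
	}
	return complete;
}
```

A single stuck task (never safe_to_switch) keeps the transition pending indefinitely, which is exactly the situation the /proc/<pid>/patch_state file is meant to diagnose.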
[PATCH v5 12/15] livepatch: store function sizes
For the consistency model we'll need to know the sizes of the old and new functions to determine if they're on the stacks of any tasks. Signed-off-by: Josh Poimboeuf Acked-by: Miroslav Benes Reviewed-by: Petr Mladek Reviewed-by: Kamalesh Babulal --- include/linux/livepatch.h | 3 +++ kernel/livepatch/core.c | 16 2 files changed, 19 insertions(+) diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 9787a63..6602b34 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -37,6 +37,8 @@ * @old_addr: the address of the function being patched * @kobj: kobject for sysfs resources * @stack_node:list node for klp_ops func_stack list + * @old_size: size of the old function + * @new_size: size of the new function * @patched: the func has been added to the klp_ops list */ struct klp_func { @@ -56,6 +58,7 @@ struct klp_func { unsigned long old_addr; struct kobject kobj; struct list_head stack_node; + unsigned long old_size, new_size; bool patched; }; diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 83c4949..10ba3a1 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -584,6 +584,22 @@ static int klp_init_object_loaded(struct klp_patch *patch, &func->old_addr); if (ret) return ret; + + ret = kallsyms_lookup_size_offset(func->old_addr, + &func->old_size, NULL); + if (!ret) { + pr_err("kallsyms size lookup failed for '%s'\n", + func->old_name); + return -ENOENT; + } + + ret = kallsyms_lookup_size_offset((unsigned long)func->new_func, + &func->new_size, NULL); + if (!ret) { + pr_err("kallsyms size lookup failed for '%s' replacement\n", + func->old_name); + return -ENOENT; + } } return 0; -- 2.7.4
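The stored sizes are what make the later stack check possible: a task is unsafe to switch if any return address on its stack falls inside an old (or new) function's [addr, addr+size) range. A simplified standalone predicate — the real check in this series is performed per klp_func against the unwound stack, so this is only the core interval test:

```c
#include <stdbool.h>

/* Does a stack return address fall within [func_addr, func_addr +
 * func_size)?  In the series this test is applied against both
 * old_addr/old_size and new_func/new_size of each klp_func. */
static bool address_in_function(unsigned long addr,
				unsigned long func_addr,
				unsigned long func_size)
{
	return addr >= func_addr && addr < func_addr + func_size;
}
```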
[PATCH v5 11/15] livepatch: use kstrtobool() in enabled_store()
The sysfs enabled value is a boolean, so kstrtobool() is a better fit for parsing the input string since it does the range checking for us. Suggested-by: Petr Mladek Signed-off-by: Josh Poimboeuf Acked-by: Miroslav Benes Reviewed-by: Petr Mladek --- kernel/livepatch/core.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 6a137e1..83c4949 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -408,26 +408,23 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr, { struct klp_patch *patch; int ret; - unsigned long val; + bool enabled; - ret = kstrtoul(buf, 10, &val); + ret = kstrtobool(buf, &enabled); if (ret) - return -EINVAL; - - if (val > 1) - return -EINVAL; + return ret; patch = container_of(kobj, struct klp_patch, kobj); mutex_lock(&klp_mutex); - if (patch->enabled == val) { + if (patch->enabled == enabled) { /* already in requested state */ ret = -EINVAL; goto err; } - if (val) { + if (enabled) { ret = __klp_enable_patch(patch); if (ret) goto err; -- 2.7.4
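For reference, a simplified userspace analogue of kstrtobool()'s parsing, judging only the first character as the original strtobool() did (the real kernel helper also understands "on"/"off"; that is omitted here for brevity). Error values mirror the kernel's -EINVAL convention; the parse_enabled() wrapper exists only to make the behaviour easy to exercise and is not a real API.

```c
#include <stdbool.h>

/* Simplified analogue of kstrtobool(): accept "y"/"Y"/"1" as true and
 * "n"/"N"/"0" as false, judging only the first character.  Returns 0
 * on success, -22 (-EINVAL) on unparseable input. */
static int strtobool_sketch(const char *s, bool *res)
{
	if (!s)
		return -22;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	default:
		return -22;
	}
}

/* Convenience wrapper for exercising the parser:
 * 1 = true, 0 = false, -1 = parse error. */
static int parse_enabled(const char *s)
{
	bool b;

	if (strtobool_sketch(s, &b))
		return -1;
	return b ? 1 : 0;
}
```

This also shows why the old `val > 1` range check becomes unnecessary: the boolean parser can only ever produce true or false.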
[PATCH v5 10/15] livepatch: move patching functions into patch.c
Move functions related to the actual patching of functions and objects into a new patch.c file. Signed-off-by: Josh PoimboeufAcked-by: Miroslav Benes Reviewed-by: Petr Mladek Reviewed-by: Kamalesh Babulal --- kernel/livepatch/Makefile | 2 +- kernel/livepatch/core.c | 202 +-- kernel/livepatch/patch.c | 213 ++ kernel/livepatch/patch.h | 32 +++ 4 files changed, 247 insertions(+), 202 deletions(-) create mode 100644 kernel/livepatch/patch.c create mode 100644 kernel/livepatch/patch.h diff --git a/kernel/livepatch/Makefile b/kernel/livepatch/Makefile index e8780c0..e136dad 100644 --- a/kernel/livepatch/Makefile +++ b/kernel/livepatch/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_LIVEPATCH) += livepatch.o -livepatch-objs := core.o +livepatch-objs := core.o patch.o diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 47ed643..6a137e1 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -24,32 +24,13 @@ #include #include #include -#include #include #include #include #include #include #include - -/** - * struct klp_ops - structure for tracking registered ftrace ops structs - * - * A single ftrace_ops is shared between all enabled replacement functions - * (klp_func structs) which have the same old_addr. This allows the switch - * between function versions to happen instantaneously by updating the klp_ops - * struct's func_stack list. The winner is the klp_func at the top of the - * func_stack (front of the list). 
- * - * @node: node for the global klp_ops list - * @func_stack:list head for the stack of klp_func's (active func is on top) - * @fops: registered ftrace ops struct - */ -struct klp_ops { - struct list_head node; - struct list_head func_stack; - struct ftrace_ops fops; -}; +#include "patch.h" /* * The klp_mutex protects the global lists and state transitions of any @@ -60,28 +41,12 @@ struct klp_ops { static DEFINE_MUTEX(klp_mutex); static LIST_HEAD(klp_patches); -static LIST_HEAD(klp_ops); static struct kobject *klp_root_kobj; /* TODO: temporary stub */ void klp_update_patch_state(struct task_struct *task) {} -static struct klp_ops *klp_find_ops(unsigned long old_addr) -{ - struct klp_ops *ops; - struct klp_func *func; - - list_for_each_entry(ops, _ops, node) { - func = list_first_entry(>func_stack, struct klp_func, - stack_node); - if (func->old_addr == old_addr) - return ops; - } - - return NULL; -} - static bool klp_is_module(struct klp_object *obj) { return obj->name; @@ -314,171 +279,6 @@ static int klp_write_object_relocations(struct module *pmod, return ret; } -static void notrace klp_ftrace_handler(unsigned long ip, - unsigned long parent_ip, - struct ftrace_ops *fops, - struct pt_regs *regs) -{ - struct klp_ops *ops; - struct klp_func *func; - - ops = container_of(fops, struct klp_ops, fops); - - rcu_read_lock(); - func = list_first_or_null_rcu(>func_stack, struct klp_func, - stack_node); - if (WARN_ON_ONCE(!func)) - goto unlock; - - klp_arch_set_pc(regs, (unsigned long)func->new_func); -unlock: - rcu_read_unlock(); -} - -/* - * Convert a function address into the appropriate ftrace location. - * - * Usually this is just the address of the function, but on some architectures - * it's more complicated so allow them to provide a custom behaviour. 
- */ -#ifndef klp_get_ftrace_location -static unsigned long klp_get_ftrace_location(unsigned long faddr) -{ - return faddr; -} -#endif - -static void klp_unpatch_func(struct klp_func *func) -{ - struct klp_ops *ops; - - if (WARN_ON(!func->patched)) - return; - if (WARN_ON(!func->old_addr)) - return; - - ops = klp_find_ops(func->old_addr); - if (WARN_ON(!ops)) - return; - - if (list_is_singular(>func_stack)) { - unsigned long ftrace_loc; - - ftrace_loc = klp_get_ftrace_location(func->old_addr); - if (WARN_ON(!ftrace_loc)) - return; - - WARN_ON(unregister_ftrace_function(>fops)); - WARN_ON(ftrace_set_filter_ip(>fops, ftrace_loc, 1, 0)); - - list_del_rcu(>stack_node); - list_del(>node); - kfree(ops); - } else { - list_del_rcu(>stack_node); - } - - func->patched = false; -} - -static int klp_patch_func(struct klp_func *func) -{ - struct klp_ops *ops; - int ret; - -
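The func_stack behaviour described in the klp_ops comment above — the ftrace handler always redirects to the replacement at the top (front) of the stack, and removing the top re-exposes the one below — can be sketched with a plain array-backed stack. This is an illustration only; the real implementation uses an RCU-protected struct list_head, and all names here are invented.

```c
#include <stddef.h>

#define TOY_STACK_MAX 8

/* Toy func_stack: replacement-function addresses for one old_addr,
 * with funcs[depth-1] as the top of the stack. */
struct toy_func_stack {
	unsigned long funcs[TOY_STACK_MAX];
	size_t depth;
};

/* Register a new replacement on top; returns -1 if full. */
static int toy_push_func(struct toy_func_stack *st, unsigned long new_func)
{
	if (st->depth == TOY_STACK_MAX)
		return -1;
	st->funcs[st->depth++] = new_func;
	return 0;
}

/* What the handler would jump to: the newest registered replacement,
 * or 0 if the stack is empty (the original function runs). */
static unsigned long toy_active_func(const struct toy_func_stack *st)
{
	return st->depth ? st->funcs[st->depth - 1] : 0;
}

/* Unregister the top replacement, re-exposing the previous one. */
static void toy_pop_func(struct toy_func_stack *st)
{
	if (st->depth)
		st->depth--;
}
```

Switching between function versions is just a push or pop: the "winner" changes instantaneously without touching the versions underneath.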
[PATCH v5 09/15] livepatch: remove unnecessary object loaded check
klp_patch_object()'s callers already ensure that the object is loaded, so its call to klp_is_object_loaded() is unnecessary. This will also make it possible to move the patching code into a separate file. Signed-off-by: Josh Poimboeuf Acked-by: Miroslav Benes Reviewed-by: Petr Mladek Reviewed-by: Kamalesh Babulal --- kernel/livepatch/core.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 2dbd355..47ed643 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -467,9 +467,6 @@ static int klp_patch_object(struct klp_object *obj) if (WARN_ON(obj->patched)) return -EINVAL; - if (WARN_ON(!klp_is_object_loaded(obj))) - return -EINVAL; - klp_for_each_func(obj, func) { ret = klp_patch_func(func); if (ret) { -- 2.7.4
[PATCH v5 08/15] livepatch: separate enabled and patched states
Once we have a consistency model, patches and their objects will be enabled and disabled at different times. For example, when a patch is disabled, its loaded objects' funcs can remain registered with ftrace indefinitely until the unpatching operation is complete and they're no longer in use. It's less confusing if we give them different names: patches can be enabled or disabled; objects (and their funcs) can be patched or unpatched: - Enabled means that a patch is logically enabled (but not necessarily fully applied). - Patched means that an object's funcs are registered with ftrace and added to the klp_ops func stack. Also, since these states are binary, represent them with booleans instead of ints. Signed-off-by: Josh PoimboeufAcked-by: Miroslav Benes Reviewed-by: Petr Mladek Reviewed-by: Kamalesh Babulal --- include/linux/livepatch.h | 17 --- kernel/livepatch/core.c | 72 +++ 2 files changed, 42 insertions(+), 47 deletions(-) diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 5cc20e5..9787a63 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -28,11 +28,6 @@ #include -enum klp_state { - KLP_DISABLED, - KLP_ENABLED -}; - /** * struct klp_func - function structure for live patching * @old_name: name of the function to be patched @@ -41,8 +36,8 @@ enum klp_state { * can be found (optional) * @old_addr: the address of the function being patched * @kobj: kobject for sysfs resources - * @state: tracks function-level patch application state * @stack_node:list node for klp_ops func_stack list + * @patched: the func has been added to the klp_ops list */ struct klp_func { /* external */ @@ -60,8 +55,8 @@ struct klp_func { /* internal */ unsigned long old_addr; struct kobject kobj; - enum klp_state state; struct list_head stack_node; + bool patched; }; /** @@ -71,7 +66,7 @@ struct klp_func { * @kobj: kobject for sysfs resources * @mod: kernel module associated with the patched object * (NULL for vmlinux) - * @state: tracks 
object-level patch application state + * @patched: the object's funcs have been added to the klp_ops list */ struct klp_object { /* external */ @@ -81,7 +76,7 @@ struct klp_object { /* internal */ struct kobject kobj; struct module *mod; - enum klp_state state; + bool patched; }; /** @@ -90,7 +85,7 @@ struct klp_object { * @objs: object entries for kernel objects to be patched * @list: list node for global list of registered patches * @kobj: kobject for sysfs resources - * @state: tracks patch-level application state + * @enabled: the patch is enabled (but operation may be incomplete) */ struct klp_patch { /* external */ @@ -100,7 +95,7 @@ struct klp_patch { /* internal */ struct list_head list; struct kobject kobj; - enum klp_state state; + bool enabled; }; #define klp_for_each_object(patch, obj) \ diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 217b39d..2dbd355 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -348,11 +348,11 @@ static unsigned long klp_get_ftrace_location(unsigned long faddr) } #endif -static void klp_disable_func(struct klp_func *func) +static void klp_unpatch_func(struct klp_func *func) { struct klp_ops *ops; - if (WARN_ON(func->state != KLP_ENABLED)) + if (WARN_ON(!func->patched)) return; if (WARN_ON(!func->old_addr)) return; @@ -378,10 +378,10 @@ static void klp_disable_func(struct klp_func *func) list_del_rcu(>stack_node); } - func->state = KLP_DISABLED; + func->patched = false; } -static int klp_enable_func(struct klp_func *func) +static int klp_patch_func(struct klp_func *func) { struct klp_ops *ops; int ret; @@ -389,7 +389,7 @@ static int klp_enable_func(struct klp_func *func) if (WARN_ON(!func->old_addr)) return -EINVAL; - if (WARN_ON(func->state != KLP_DISABLED)) + if (WARN_ON(func->patched)) return -EINVAL; ops = klp_find_ops(func->old_addr); @@ -437,7 +437,7 @@ static int klp_enable_func(struct klp_func *func) list_add_rcu(>stack_node, >func_stack); } - func->state = KLP_ENABLED; + 
func->patched = true; return 0; @@ -448,36 +448,36 @@ static int klp_enable_func(struct klp_func *func) return ret; } -static void klp_disable_object(struct klp_object *obj) +static void klp_unpatch_object(struct klp_object *obj) { struct klp_func *func; klp_for_each_func(obj, func) - if
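The enabled/patched split introduced above can be illustrated with a toy model: disabling flips the patch-level 'enabled' bool immediately, while the func-level 'patched' state only converges once the (future) transition machinery finishes unpatching. Illustration only; names are invented and the real states live on klp_patch and klp_func respectively.

```c
#include <stdbool.h>

/* Toy illustration of the two states:
 * 'enabled' = the logical request at the patch level,
 * 'patched' = funcs actually registered with ftrace. */
struct toy_patch {
	bool enabled;
	bool patched;
};

static void toy_disable_patch(struct toy_patch *p)
{
	p->enabled = false;	/* request recorded at once */
	/* p->patched is intentionally left as-is: the funcs stay
	 * registered until unpatching completes. */
}

static void toy_transition_complete(struct toy_patch *p)
{
	p->patched = p->enabled; /* funcs converge to the request */
}
```

During the window between the two calls, the patch is disabled yet its funcs are still patched — exactly the combination the old single KLP_ENABLED/KLP_DISABLED state could not express.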
[PATCH v5 07/15] livepatch/s390: add TIF_PATCH_PENDING thread flag
From: Miroslav BenesUpdate a task's patch state when returning from a system call or user space interrupt, or after handling a signal. This greatly increases the chances of a patch operation succeeding. If a task is I/O bound, it can be patched when returning from a system call. If a task is CPU bound, it can be patched when returning from an interrupt. If a task is sleeping on a to-be-patched function, the user can send SIGSTOP and SIGCONT to force it to switch. Since there are two ways the syscall can be restarted on return from a signal handling process, it is important to clear the flag before do_signal() is called. Otherwise we could miss the migration if we used SIGSTOP/SIGCONT procedure or fake signal to migrate patching blocking tasks. If we place our hook to sysc_work label in entry before TIF_SIGPENDING is evaluated we kill two birds with one stone. The task is correctly migrated in all return paths from a syscall. Signed-off-by: Miroslav Benes Signed-off-by: Josh Poimboeuf --- arch/s390/include/asm/thread_info.h | 2 ++ arch/s390/kernel/entry.S| 31 ++- 2 files changed, 32 insertions(+), 1 deletion(-) diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index 4977668..646845e 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -56,6 +56,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); #define TIF_SIGPENDING 1 /* signal pending */ #define TIF_NEED_RESCHED 2 /* rescheduling necessary */ #define TIF_UPROBE 3 /* breakpointed or single-stepping */ +#define TIF_PATCH_PENDING 4 /* pending live patching update */ #define TIF_31BIT 16 /* 32bit process */ #define TIF_MEMDIE 17 /* is terminating due to OOM killer */ @@ -74,6 +75,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); #define _TIF_SIGPENDING_BITUL(TIF_SIGPENDING) #define _TIF_NEED_RESCHED _BITUL(TIF_NEED_RESCHED) #define _TIF_UPROBE_BITUL(TIF_UPROBE) +#define 
_TIF_PATCH_PENDING _BITUL(TIF_PATCH_PENDING) #define _TIF_31BIT _BITUL(TIF_31BIT) #define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP) diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S index 34ab7e8..9a15eac 100644 --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -47,7 +47,7 @@ STACK_SIZE = 1 << STACK_SHIFT STACK_INIT = STACK_SIZE - STACK_FRAME_OVERHEAD - __PT_SIZE _TIF_WORK = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \ - _TIF_UPROBE) + _TIF_UPROBE | _TIF_PATCH_PENDING) _TIF_TRACE = (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | \ _TIF_SYSCALL_TRACEPOINT) _CIF_WORK = (_CIF_MCCK_PENDING | _CIF_ASCE | _CIF_FPU) @@ -333,6 +333,11 @@ ENTRY(system_call) #endif TSTMSK __PT_FLAGS(%r11),_PIF_PER_TRAP jo .Lsysc_singlestep +#ifdef CONFIG_LIVEPATCH + TSTMSK __TI_flags(%r12),_TIF_PATCH_PENDING + jo .Lsysc_patch_pending# handle live patching just before + # signals and possible syscall restart +#endif TSTMSK __TI_flags(%r12),_TIF_SIGPENDING jo .Lsysc_sigpending TSTMSK __TI_flags(%r12),_TIF_NOTIFY_RESUME @@ -405,6 +410,16 @@ ENTRY(system_call) #endif # +# _TIF_PATCH_PENDING is set, call klp_update_patch_state +# +#ifdef CONFIG_LIVEPATCH +.Lsysc_patch_pending: + lg %r2,__LC_CURRENT# pass pointer to task struct + larl%r14,.Lsysc_return + jg klp_update_patch_state +#endif + +# # _PIF_PER_TRAP is set, call do_per_trap # .Lsysc_singlestep: @@ -654,6 +669,10 @@ ENTRY(io_int_handler) jo .Lio_mcck_pending TSTMSK __TI_flags(%r12),_TIF_NEED_RESCHED jo .Lio_reschedule +#ifdef CONFIG_LIVEPATCH + TSTMSK __TI_flags(%r12),_TIF_PATCH_PENDING + jo .Lio_patch_pending +#endif TSTMSK __TI_flags(%r12),_TIF_SIGPENDING jo .Lio_sigpending TSTMSK __TI_flags(%r12),_TIF_NOTIFY_RESUME @@ -700,6 +719,16 @@ ENTRY(io_int_handler) j .Lio_return # +# _TIF_PATCH_PENDING is set, call klp_update_patch_state +# +#ifdef CONFIG_LIVEPATCH +.Lio_patch_pending: + lg %r2,__LC_CURRENT# pass pointer to task struct + larl%r14,.Lio_return + jg klp_update_patch_state 
+#endif + +# # _TIF_SIGPENDING or is set, call do_signal # .Lio_sigpending: -- 2.7.4
[PATCH v5 06/15] livepatch/s390: reorganize TIF thread flag bits
From: Jiri SlabyGroup the TIF thread flag bits by their inclusion in the _TIF_WORK and _TIF_TRACE macros. Signed-off-by: Jiri Slaby Signed-off-by: Josh Poimboeuf Reviewed-by: Miroslav Benes --- arch/s390/include/asm/thread_info.h | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index a5b54a4..4977668 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -51,14 +51,12 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); /* * thread information flags bit numbers */ +/* _TIF_WORK bits */ #define TIF_NOTIFY_RESUME 0 /* callback before returning to user */ #define TIF_SIGPENDING 1 /* signal pending */ #define TIF_NEED_RESCHED 2 /* rescheduling necessary */ -#define TIF_SYSCALL_TRACE 3 /* syscall trace active */ -#define TIF_SYSCALL_AUDIT 4 /* syscall auditing active */ -#define TIF_SECCOMP5 /* secure computing */ -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */ -#define TIF_UPROBE 7 /* breakpointed or single-stepping */ +#define TIF_UPROBE 3 /* breakpointed or single-stepping */ + #define TIF_31BIT 16 /* 32bit process */ #define TIF_MEMDIE 17 /* is terminating due to OOM killer */ #define TIF_RESTORE_SIGMASK18 /* restore signal mask in do_signal() */ @@ -66,15 +64,23 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); #define TIF_BLOCK_STEP 20 /* This task is block stepped */ #define TIF_UPROBE_SINGLESTEP 21 /* This task is uprobe single stepped */ +/* _TIF_TRACE bits */ +#define TIF_SYSCALL_TRACE 24 /* syscall trace active */ +#define TIF_SYSCALL_AUDIT 25 /* syscall auditing active */ +#define TIF_SECCOMP26 /* secure computing */ +#define TIF_SYSCALL_TRACEPOINT 27 /* syscall tracepoint instrumentation */ + #define _TIF_NOTIFY_RESUME _BITUL(TIF_NOTIFY_RESUME) #define _TIF_SIGPENDING_BITUL(TIF_SIGPENDING) #define _TIF_NEED_RESCHED _BITUL(TIF_NEED_RESCHED) 
+#define _TIF_UPROBE_BITUL(TIF_UPROBE) + +#define _TIF_31BIT _BITUL(TIF_31BIT) +#define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP) + #define _TIF_SYSCALL_TRACE _BITUL(TIF_SYSCALL_TRACE) #define _TIF_SYSCALL_AUDIT _BITUL(TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP _BITUL(TIF_SECCOMP) #define _TIF_SYSCALL_TRACEPOINT_BITUL(TIF_SYSCALL_TRACEPOINT) -#define _TIF_UPROBE_BITUL(TIF_UPROBE) -#define _TIF_31BIT _BITUL(TIF_31BIT) -#define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP) #endif /* _ASM_THREAD_INFO_H */ -- 2.7.4
[PATCH v5 05/15] livepatch/powerpc: add TIF_PATCH_PENDING thread flag
Add the TIF_PATCH_PENDING thread flag to enable the new livepatch per-task consistency model for powerpc. The bit getting set indicates the thread has a pending patch which needs to be applied when the thread exits the kernel. The bit is included in the _TIF_USER_WORK_MASK macro so that do_notify_resume() and klp_update_patch_state() get called when the bit is set. Signed-off-by: Josh PoimboeufReviewed-by: Petr Mladek Reviewed-by: Miroslav Benes Reviewed-by: Kamalesh Babulal --- arch/powerpc/include/asm/thread_info.h | 4 +++- arch/powerpc/kernel/signal.c | 4 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 87e4b2d..6fc6464 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -92,6 +92,7 @@ static inline struct thread_info *current_thread_info(void) TIF_NEED_RESCHED */ #define TIF_32BIT 4 /* 32 bit binary */ #define TIF_RESTORE_TM 5 /* need to restore TM FP/VEC/VSX */ +#define TIF_PATCH_PENDING 6 /* pending live patching update */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SINGLESTEP 8 /* singlestepping active */ #define TIF_NOHZ 9 /* in adaptive nohz mode */ @@ -115,6 +116,7 @@ static inline struct thread_info *current_thread_info(void) #define _TIF_POLLING_NRFLAG(1<
[PATCH v5 04/15] livepatch/x86: add TIF_PATCH_PENDING thread flag
Add the TIF_PATCH_PENDING thread flag to enable the new livepatch per-task consistency model for x86_64. The bit getting set indicates the thread has a pending patch which needs to be applied when the thread exits the kernel. The bit is placed in the _TIF_ALLWORK_MASK macro, which results in exit_to_usermode_loop() calling klp_update_patch_state() when it's set. Signed-off-by: Josh PoimboeufAcked-by: Andy Lutomirski Reviewed-by: Petr Mladek Reviewed-by: Miroslav Benes Reviewed-by: Kamalesh Babulal --- arch/x86/entry/common.c| 9 ++--- arch/x86/include/asm/thread_info.h | 4 +++- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index b83c61c..6a9d564 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include @@ -129,14 +130,13 @@ static long syscall_trace_enter(struct pt_regs *regs) #define EXIT_TO_USERMODE_LOOP_FLAGS\ (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ -_TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY) +_TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING) static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags) { /* * In order to return to user mode, we need to have IRQs off with -* none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY, -* _TIF_UPROBE, or _TIF_NEED_RESCHED set. Several of these flags +* none of EXIT_TO_USERMODE_LOOP_FLAGS set. Several of these flags * can be set at any time on preemptable kernels if we have IRQs on, * so we need to loop. Disabling preemption wouldn't help: doing the * work to clear some of the flags can sleep. 
@@ -163,6 +163,9 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags) if (cached_flags & _TIF_USER_RETURN_NOTIFY) fire_user_return_notifiers(); + if (cached_flags & _TIF_PATCH_PENDING) + klp_update_patch_state(current); + /* Disable IRQs and retry */ local_irq_disable(); diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 207d0d9..83372dc 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -84,6 +84,7 @@ struct thread_info { #define TIF_SECCOMP8 /* secure computing */ #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */ #define TIF_UPROBE 12 /* breakpointed or singlestepping */ +#define TIF_PATCH_PENDING 13 /* pending live patching update */ #define TIF_NOTSC 16 /* TSC is not accessible in userland */ #define TIF_IA32 17 /* IA32 compatibility process */ #define TIF_NOHZ 19 /* in adaptive nohz mode */ @@ -107,6 +108,7 @@ struct thread_info { #define _TIF_SECCOMP (1 << TIF_SECCOMP) #define _TIF_USER_RETURN_NOTIFY(1 << TIF_USER_RETURN_NOTIFY) #define _TIF_UPROBE(1 << TIF_UPROBE) +#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING) #define _TIF_NOTSC (1 << TIF_NOTSC) #define _TIF_IA32 (1 << TIF_IA32) #define _TIF_NOHZ (1 << TIF_NOHZ) @@ -133,7 +135,7 @@ struct thread_info { (_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |\ _TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU | \ _TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE | \ -_TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT) +_TIF_PATCH_PENDING | _TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT) /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \ -- 2.7.4
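The exit_to_usermode_loop() hook above follows the loop's existing pattern: each set work flag is handled in turn, with _TIF_PATCH_PENDING triggering klp_update_patch_state(current). A toy single pass over two of the flags (bit positions as in the x86 header shown in the patch; in the kernel the flags are re-read and the loop repeats until none remain):

```c
#include <stdbool.h>

#define TIF_NEED_RESCHED   3
#define TIF_PATCH_PENDING 13

#define _TIF_NEED_RESCHED  (1u << TIF_NEED_RESCHED)
#define _TIF_PATCH_PENDING (1u << TIF_PATCH_PENDING)

/* One handling pass: clear each work bit after servicing it.  The
 * out-parameter records whether the (stand-in for)
 * klp_update_patch_state() would have run. */
static unsigned int handle_work_pass(unsigned int cached_flags,
				     bool *updated_patch_state)
{
	if (cached_flags & _TIF_NEED_RESCHED)
		cached_flags &= ~_TIF_NEED_RESCHED; /* schedule() ran */
	if (cached_flags & _TIF_PATCH_PENDING) {
		*updated_patch_state = true; /* klp_update_patch_state() */
		cached_flags &= ~_TIF_PATCH_PENDING;
	}
	return cached_flags;
}
```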
[PATCH v5 03/15] livepatch: create temporary klp_update_patch_state() stub
Create temporary stubs for klp_update_patch_state() so we can add TIF_PATCH_PENDING to different architectures in separate patches without breaking build bisectability. Signed-off-by: Josh Poimboeuf Reviewed-by: Petr Mladek --- include/linux/livepatch.h | 5 - kernel/livepatch/core.c | 3 +++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 9072f04..5cc20e5 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -123,10 +123,13 @@ void arch_klp_init_object_loaded(struct klp_patch *patch, int klp_module_coming(struct module *mod); void klp_module_going(struct module *mod); +void klp_update_patch_state(struct task_struct *task); + #else /* !CONFIG_LIVEPATCH */ static inline int klp_module_coming(struct module *mod) { return 0; } -static inline void klp_module_going(struct module *mod) { } +static inline void klp_module_going(struct module *mod) {} +static inline void klp_update_patch_state(struct task_struct *task) {} #endif /* CONFIG_LIVEPATCH */ diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index af46438..217b39d 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -64,6 +64,9 @@ static LIST_HEAD(klp_ops); static struct kobject *klp_root_kobj; +/* TODO: temporary stub */ +void klp_update_patch_state(struct task_struct *task) {} + static struct klp_ops *klp_find_ops(unsigned long old_addr) { struct klp_ops *ops; -- 2.7.4
[PATCH v5 02/15] x86/entry: define _TIF_ALLWORK_MASK flags explicitly
The _TIF_ALLWORK_MASK macro automatically includes the least-significant 16 bits of the thread_info flags, which is less than obvious and tends to create confusion and surprises when reading or modifying the code. Define the flags explicitly. Signed-off-by: Josh PoimboeufReviewed-by: Petr Mladek Reviewed-by: Miroslav Benes Reviewed-by: Kamalesh Babulal --- arch/x86/include/asm/thread_info.h | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index ad6f5eb0..207d0d9 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -73,9 +73,6 @@ struct thread_info { * thread information flags * - these are process state flags that various assembly files * may need to access - * - pending work-to-be-done flags are in LSW - * - other flags in MSW - * Warning: layout of LSW is hardcoded in entry.S */ #define TIF_SYSCALL_TRACE 0 /* syscall trace active */ #define TIF_NOTIFY_RESUME 1 /* callback before returning to user */ @@ -103,8 +100,8 @@ struct thread_info { #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) #define _TIF_SIGPENDING(1 << TIF_SIGPENDING) -#define _TIF_SINGLESTEP(1 << TIF_SINGLESTEP) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) +#define _TIF_SINGLESTEP(1 << TIF_SINGLESTEP) #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) @@ -133,8 +130,10 @@ struct thread_info { /* work to do on any return to user space */ #define _TIF_ALLWORK_MASK \ - ((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT | \ - _TIF_NOHZ) + (_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |\ +_TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU | \ +_TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE | \ +_TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT) /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \ -- 2.7.4
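The equivalence claimed above can be checked mechanically: the explicit mask must exclude _TIF_SECCOMP and must be a subset of the old implicit definition — which also swept in undefined low bits, the source of the surprises the commit message mentions. The bit positions below are taken from the 4.9-era x86 thread_info.h and should be treated as assumptions of this sketch.

```c
/* Assumed bit positions (4.9-era x86 thread_info.h). */
#define TIF_SYSCALL_TRACE       0
#define TIF_NOTIFY_RESUME       1
#define TIF_SIGPENDING          2
#define TIF_NEED_RESCHED        3
#define TIF_SINGLESTEP          4
#define TIF_SYSCALL_EMU         6
#define TIF_SYSCALL_AUDIT       7
#define TIF_SECCOMP             8
#define TIF_USER_RETURN_NOTIFY 11
#define TIF_UPROBE             12
#define TIF_NOHZ               19
#define TIF_SYSCALL_TRACEPOINT 28

#define BIT_(n) (1u << (n))

/* Old definition: every low-16 bit except SECCOMP, plus two high flags
 * -- including bits with no flag defined at all. */
static unsigned int old_allwork_mask(void)
{
	return (0xffffu & ~BIT_(TIF_SECCOMP)) |
	       BIT_(TIF_SYSCALL_TRACEPOINT) | BIT_(TIF_NOHZ);
}

/* New definition: each intended flag named explicitly. */
static unsigned int new_allwork_mask(void)
{
	return BIT_(TIF_SYSCALL_TRACE) | BIT_(TIF_NOTIFY_RESUME) |
	       BIT_(TIF_SIGPENDING) | BIT_(TIF_NEED_RESCHED) |
	       BIT_(TIF_SINGLESTEP) | BIT_(TIF_SYSCALL_EMU) |
	       BIT_(TIF_SYSCALL_AUDIT) | BIT_(TIF_USER_RETURN_NOTIFY) |
	       BIT_(TIF_UPROBE) | BIT_(TIF_NOHZ) |
	       BIT_(TIF_SYSCALL_TRACEPOINT);
}
```

The two masks are deliberately not equal: the explicit version drops the undefined low bits the old `0xffff`-based definition silently included.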
[PATCH v5 01/15] stacktrace/x86: add function for detecting reliable stack traces
For live patching and possibly other use cases, a stack trace is only useful if it can be assured that it's completely reliable. Add a new save_stack_trace_tsk_reliable() function to achieve that. Note that if the target task isn't the current task, and the target task is allowed to run, then it could be writing the stack while the unwinder is reading it, resulting in possible corruption. So the caller of save_stack_trace_tsk_reliable() must ensure that the task is either 'current' or inactive. save_stack_trace_tsk_reliable() relies on the x86 unwinder's detection of pt_regs on the stack. If the pt_regs are not user-mode registers from a syscall, then they indicate an in-kernel interrupt or exception (e.g. preemption or a page fault), in which case the stack is considered unreliable due to the nature of frame pointers. It also relies on the x86 unwinder's detection of other issues, such as: - corrupted stack data - stack grows the wrong way - stack walk doesn't reach the bottom - user didn't provide a large enough entries array Such issues are reported by checking unwind_error() and !unwind_done(). Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can determine at build time whether the function is implemented. Signed-off-by: Josh Poimboeuf--- arch/Kconfig | 6 +++ arch/x86/Kconfig | 1 + arch/x86/include/asm/unwind.h | 6 +++ arch/x86/kernel/stacktrace.c | 96 +- arch/x86/kernel/unwind_frame.c | 2 + include/linux/stacktrace.h | 9 ++-- kernel/stacktrace.c| 12 +- 7 files changed, 126 insertions(+), 6 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 80f3e5e..478b939 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -749,6 +749,12 @@ config HAVE_STACK_VALIDATION Architecture supports the 'objtool check' host tool command, which performs compile-time stack metadata validation. 
+config HAVE_RELIABLE_STACKTRACE + bool + help + Architecture has a save_stack_trace_tsk_reliable() function which + only returns a stack trace if it can guarantee the trace is reliable. + config HAVE_ARCH_HASH bool default n diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ea82a7b..e79fbf8 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -160,6 +160,7 @@ config X86 select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_REGS_AND_STACK_ACCESS_API + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION select HAVE_STACK_VALIDATIONif X86_64 select HAVE_SYSCALL_TRACEPOINTS select HAVE_UNSTABLE_SCHED_CLOCK diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index 6fa75b1..137e9cce 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -11,6 +11,7 @@ struct unwind_state { unsigned long stack_mask; struct task_struct *task; int graph_idx; + bool error; #ifdef CONFIG_FRAME_POINTER unsigned long *bp, *orig_sp; struct pt_regs *regs; @@ -40,6 +41,11 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, __unwind_start(state, task, regs, first_frame); } +static inline bool unwind_error(struct unwind_state *state) +{ + return state->error; +} + #ifdef CONFIG_FRAME_POINTER static inline diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 0653788..c5490d9 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -74,6 +74,101 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) } EXPORT_SYMBOL_GPL(save_stack_trace_tsk); +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE + +#define STACKTRACE_DUMP_ONCE(task) ({ \ + static bool __section(.data.unlikely) __dumped; \ + \ + if (!__dumped) {\ + __dumped = true;\ + WARN_ON(1); \ + show_stack(task, NULL); \ + } \ +}) + +static int __save_stack_trace_reliable(struct stack_trace *trace, + struct task_struct *task) +{ + struct unwind_state state; + struct 
pt_regs *regs;
+	unsigned long addr;
+
+	for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
+	     unwind_next_frame(&state)) {
+
+		regs = unwind_get_entry_regs(&state);
+		if (regs) {
+			/*
+			 * Kernel mode registers on the stack indicate an
+			 * in-kernel interrupt or exception (e.g., preemption
+			 *
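The "all or nothing" policy the commit message describes can be sketched in userspace with a toy frame-pointer chain. This is a hypothetical model, not the x86 unwinder: each frame points at its caller's frame, and any anomaly (chain running the wrong way, entries array too small) invalidates the whole trace rather than returning a partial one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy frame: saved frame pointer of the caller plus a return address. */
struct frame {
	struct frame *next;
	unsigned long ret;
};

/* Walk the chain; return 0 only if we reach the bottom cleanly.
 * Mirrors the policy of save_stack_trace_tsk_reliable(): one oddity
 * and the caller gets no trace at all. */
static int walk_reliable(struct frame *fp, unsigned long *entries,
			 int max, int *nr)
{
	*nr = 0;
	while (fp) {
		if (*nr >= max)
			return -1;	/* entries array too small */
		if (fp->next && fp->next <= fp)
			return -1;	/* stack "grows" the wrong way */
		entries[(*nr)++] = fp->ret;
		fp = fp->next;
	}
	return 0;			/* reached the bottom */
}
```

A corrupted link anywhere in the chain makes the entire walk fail, which is what lets live patching trust a successful result.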
[PATCH v5 00/15] livepatch: hybrid consistency model
Here's v5 of the consistency model, targeted for 4.12. Only a few minor changes this time. I would very much appreciate reviews/acks from the following: - Michael Ellerman for the powerpc changes in patch 5. - Heiko Carstens for the s390 changes in patches 6 & 7. - Peter Zijlstra/Ingo Molnar for the use of task_rq_lock() and the modification of do_idle() in patch 13. Thanks! Based on linux-next/master (20170213). v5: - return -EINVAL in __save_stack_trace_reliable() - only call show_stack() once - add save_stack_trace_tsk_reliable() define for !CONFIG_STACKTRACE - update kernel version and date in ABI doc - make suggested improvements to livepatch.txt - update barrier comments - remove klp_try_complete_transition() call from klp_start_transition() - move end of klp_try_complete_transition() into klp_complete_transition() - fix __klp_enable_patch() error path - check for transition in klp_module_going() v4: - add warnings for "impossible" scenarios in __save_stack_trace_reliable() - sort _TIF_ALLWORK_MASK flags - move klp_transition_work to transition.c. 
This resulted in the following related changes: - klp_mutex is now visible to transition.c - klp_start_transition() now calls klp_try_complete_transition() - klp_try_complete_transition() now sets up the work - rearrange code in transition.c accordingly - klp_reverse_transition(): clear TIF flags and call synchronize_rcu() - klp_try_complete_transition(): do synchronize_rcu() only when unpatching - klp_start_transition(): only set TIF flags when necessary - klp_complete_transition(): add synchronize_rcu() when patching - klp_ftrace_handler(): put WARN_ON_ONCE back in and add comment - use for_each_possible_cpu() to patch offline idle tasks - add warnings to sample module when setting patch.immediate - don't use pr_debug() with the task rq lock - add documentation about porting consistency model to other arches - move klp_patch_pending() to patch 13 - improve several comments and commit messages v3: - rebase on new x86 unwinder - force !HAVE_RELIABLE_STACKTRACE arches to use patch->immediate for now, because we don't have a way to transition kthreads otherwise - rebase s390 TIF_PATCH_PENDING patch onto latest entry code - update barrier comments and move barrier from the end of klp_init_transition() to its callers - "klp_work" -> "klp_transition_work" - "klp_patch_task()" -> "klp_update_patch_state()" - explicit _TIF_ALLWORK_MASK - change klp_reverse_transition() to not try to complete transition. instead modify the work queue delay to zero. 
- get rid of klp_schedule_work() in favor of calling schedule_delayed_work() directly with a KLP_TRANSITION_DELAY - initialize klp_target_state to KLP_UNDEFINED - move klp_target_state assignment to before patch->immediate check in klp_init_transition() - rcu_read_lock() in klp_update_patch_state(), test the thread flag in patch task, synchronize_rcu() in klp_complete_transition() - use kstrtobool() in enabled_store() - change task_rq_lock() argument type to struct rq_flags - add several WARN_ON_ONCE assertions for klp_target_state and task->patch_state v2: - "universe" -> "patch state" - rename klp_update_task_universe() -> klp_patch_task() - add preempt IRQ tracking (TF_PREEMPT_IRQ) - fix print_context_stack_reliable() bug - improve print_context_stack_reliable() comments - klp_ftrace_handler comment fixes - add "patch_state" proc file to tid_base_stuff - schedule work even for !RELIABLE_STACKTRACE - forked child inherits patch state from parent - add detailed comment to livepatch.h klp_func definition about the klp_func patched/transition state transitions - update exit_to_usermode_loop() comment - clear all TIF_KLP_NEED_UPDATE flags in klp_complete_transition() - remove unnecessary function externs - add livepatch documentation, sysfs documentation, /proc documentation - /proc/pid/patch_state: -1 means no patch is currently being applied/reverted - "TIF_KLP_NEED_UPDATE" -> "TIF_PATCH_PENDING" - support for s390 and powerpc-le - don't assume stacks with dynamic ftrace trampolines are reliable - add _TIF_ALLWORK_MASK info to commit log v1.9: - revive from the dead and rebased - reliable stacks! 
- add support for immediate consistency model - add a ton of comments - fix up memory barriers - remove "allow patch modules to be removed" patch for now, it still needs more discussion and thought - it can be done with something - "proc/pid/universe" -> "proc/pid/patch_status" - remove WARN_ON_ONCE from !func condition in ftrace handler -- can happen because of RCU - keep klp_mutex private by putting the work_fn in core.c - convert states from int to boolean - remove obsolete '@state' comments - several header file and include improvements suggested by Jiri S - change kallsyms_lookup_size_offset() errors from EINVAL -
Re: [PATCH v2] powerpc: Blacklist GCC 5.4 6.1 and 6.2
On Mon, 2017-02-13 at 09:44 -0600, Segher Boessenkool wrote: > Hi Cyril, > > On Mon, Feb 13, 2017 at 02:35:36PM +1100, Cyril Bur wrote: > > A bug in the -02 optimisation of GCC 5.4 6.1 and 6.2 causes > > setup_command_line() to not pass the correct first argument to strcpy > > and therefore not actually copy the command_line. > > There is no such thing as an "-O2 optimisation". Right, perhaps I should have phrased it as "One of the -O2 level optimisations of GCC 5.4, 6.1 and 6.2 causes setup_command_line() to not pass the correct first argument to strcpy and therefore not actually copy the command_line, -O1 does not have this problem." > > > At the time of writing GCC 5.4 is the most recent and is affected. GCC > > 6.3 contains the backported fix, has been tested and appears safe to > > use. > > 6.3 is (of course) the newer release; 5.4 is a maintenance release of > a compiler that is a year older. Yes. I think the point I was trying to make is that since they backported the fix to 5.x and 6.x then I expect that 5.5 will have the fix but since it doesn't exist yet, I can't be sure. I'll add something to that effect. > > > +# - gcc-5.4, 6.1, 6.2 don't copy the command_line around correctly > > + echo -n '*** GCC-5.4 6.1 6.2 have a bad -O2 optimisation ' ; \ > > + echo 'which will cause lost command_line options (at least).' ; > > \ > > Maybe something more like > > "GCC 5.4, 6.1, and 6.2 have a bug that results in a kernel that does > not boot. Please use GCC 6.3 or later.". "that may not boot" is more accurate, if it can boot without a command_line param it might just do so. > > Please mention the GCC PR # somewhere in the code, too? > Sure. Thanks, Cyril > > Segher
[PATCH] powerpc/xmon: add debugfs entry for xmon
Currently the xmon debugger is set only via the kernel boot command line. It's disabled by default, and can be enabled with "xmon=on" on the command line. Also, xmon may be accessed via the sysrq mechanism, but once we enter xmon via sysrq, it's kept enabled until the system is rebooted, even if we exit the debugger. A kernel crash will then lead to an xmon instance, instead of triggering a kdump procedure (if configured), for example.

This patch introduces a debugfs entry for xmon, allowing the user to query its current state and change it if desired. Basically, the "xmon" file to read from/write to is under the debugfs mount point, in the powerpc directory. Reading this file will provide the current state of the debugger, one of the following: "on", "off", "early" or "nobt". Writing one of these states to the file will take immediate effect on the debugger.

Signed-off-by: Guilherme G. Piccoli
---
* I had this patch partially done for some time, and after a discussion on the kernel Slack channel last week, I decided to rebase it and fix some remaining bugs. I'd change the 'x' option to always disable the debugger, since with this patch we can always re-enable xmon, but today I noticed Pan's patch on the mailing list, so perhaps his approach of adding a flag to the 'x' option is preferable. I can change this in a V2, if requested. Thanks in advance!
arch/powerpc/xmon/xmon.c | 124 +++ 1 file changed, 105 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 9c0e17c..5fb39db 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -29,6 +29,12 @@ #include #include +#ifdef CONFIG_DEBUG_FS +#include +#include +#include +#endif + #include #include #include @@ -184,7 +190,12 @@ static void dump_tlb_44x(void); static void dump_tlb_book3e(void); #endif -static int xmon_no_auto_backtrace; +/* xmon_state values */ +#define XMON_OFF 0 +#define XMON_ON1 +#define XMON_EARLY 2 +#define XMON_NOBT 3 +static int xmon_state; #ifdef CONFIG_PPC64 #define REG"%.16lx" @@ -880,8 +891,8 @@ cmds(struct pt_regs *excp) last_cmd = NULL; xmon_regs = excp; - if (!xmon_no_auto_backtrace) { - xmon_no_auto_backtrace = 1; + if (xmon_state != XMON_NOBT) { + xmon_state = XMON_NOBT; xmon_show_stack(excp->gpr[1], excp->link, excp->nip); } @@ -3244,6 +3255,26 @@ static void xmon_init(int enable) } } +static int parse_xmon(char *p) +{ + if (!p || strncmp(p, "early", 5) == 0) { + /* just "xmon" is equivalent to "xmon=early" */ + xmon_init(1); + xmon_state = XMON_EARLY; + } else if (strncmp(p, "on", 2) == 0) { + xmon_init(1); + xmon_state = XMON_ON; + } else if (strncmp(p, "off", 3) == 0) { + xmon_init(0); + xmon_state = XMON_OFF; + } else if (strncmp(p, "nobt", 4) == 0) + xmon_state = XMON_NOBT; + else + return 1; + + return 0; +} + #ifdef CONFIG_MAGIC_SYSRQ static void sysrq_handle_xmon(int key) { @@ -3266,34 +3297,89 @@ static int __init setup_xmon_sysrq(void) __initcall(setup_xmon_sysrq); #endif /* CONFIG_MAGIC_SYSRQ */ -static int __initdata xmon_early, xmon_off; +#ifdef CONFIG_DEBUG_FS +static ssize_t xmon_dbgfs_read(struct file *file, char __user *ubuffer, + size_t len, loff_t *offset) +{ + int buf_len = 0; + char buf[6] = { 0 }; -static int __init early_parse_xmon(char *p) + switch (xmon_state) { + case XMON_OFF: + buf_len = sprintf(buf, "off"); + break; + case XMON_ON: + buf_len = 
sprintf(buf, "on"); + break; + case XMON_EARLY: + buf_len = sprintf(buf, "early"); + break; + case XMON_NOBT: + buf_len = sprintf(buf, "nobt"); + break; + } + + return simple_read_from_buffer(ubuffer, len, offset, buf, buf_len); +} + +static ssize_t xmon_dbgfs_write(struct file *file, const char __user *ubuffer, + size_t len, loff_t *offset) { - if (!p || strncmp(p, "early", 5) == 0) { - /* just "xmon" is equivalent to "xmon=early" */ - xmon_init(1); - xmon_early = 1; - } else if (strncmp(p, "on", 2) == 0) - xmon_init(1); - else if (strncmp(p, "off", 3) == 0) - xmon_off = 1; - else if (strncmp(p, "nobt", 4) == 0) - xmon_no_auto_backtrace = 1; - else - return 1; + int ret, not_copied; + char *buf; + + /* Valid states are on, off, early and nobt. */ + if ((*offset != 0) || (len <= 0) || (len > 6)) +return -EINVAL; + + buf = kzalloc(len + 1,
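The state-string parsing that both the boot-time option and the new debugfs write path share can be isolated as a small pure function. This is a userspace sketch mirroring the patch's parse_xmon() (the enum values and quirks are taken from the diff above); note that the prefix-matching strncmp() calls mean a string like "onwards" also parses as "on", a behavior inherited from the original early_parse_xmon().

```c
#include <assert.h>
#include <string.h>

/* State constants as in the patch. XMON_BAD stands in for the
 * "return 1" unrecognized-option case. */
enum { XMON_OFF = 0, XMON_ON = 1, XMON_EARLY = 2, XMON_NOBT = 3,
       XMON_BAD = -1 };

static int parse_xmon_state(const char *p)
{
	if (!p || strncmp(p, "early", 5) == 0)
		return XMON_EARLY;	/* bare "xmon" means "xmon=early" */
	if (strncmp(p, "on", 2) == 0)
		return XMON_ON;
	if (strncmp(p, "off", 3) == 0)
		return XMON_OFF;
	if (strncmp(p, "nobt", 4) == 0)
		return XMON_NOBT;
	return XMON_BAD;
}
```

Factoring the parser out like this is also what makes the debugfs write handler cheap: it can feed the user buffer to the same function the command-line path uses.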
[PATCH] KVM: PPC: Book3S: Ratelimit copy data failure error messages
The kvmppc_mmu_book3s_32/64 xlate() functions log a "KVM can't copy data" error upon failing to copy user data to kernel space. This floods the kernel log when such failures occur within a short time period. Ratelimit this error to avoid flooding the kernel log upon copy failures.

Signed-off-by: Vipin K Parashar
---
 arch/powerpc/kvm/book3s_32_mmu.c | 3 ++-
 arch/powerpc/kvm/book3s_64_mmu.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index a2eb6d3..ca8f960 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -224,7 +224,8 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
 	ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);
 
 	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
-		printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
+		if (printk_ratelimit())
+			printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
 		goto no_page_found;
 	}
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index b9131aa..b420aca 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -265,7 +265,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 		goto no_page_found;
 
 	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
-		printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
+		if (printk_ratelimit())
+			printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
 		goto no_page_found;
 	}
-- 
2.7.4
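The behavior printk_ratelimit() gives the patch can be sketched in userspace as a simple windowed limiter. This is a conceptual model only, not the kernel's lib/ratelimit.c (which works in jiffies and also accounts suppressed callbacks); the defaults mirror the kernel's usual 10 messages per 5 seconds, and the clock is passed in so the logic is testable without sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of printk_ratelimit(). */
struct ratelimit {
	long interval;	/* window length, seconds */
	int  burst;	/* messages allowed per window */
	long begin;	/* start of the current window (0 = unstarted) */
	int  printed;	/* messages emitted in the current window */
};

static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	/* Open a fresh window on first use or once the interval elapses. */
	if (rs->begin == 0 || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;	/* caller may print */
	}
	return false;		/* suppress this message */
}
```

A burst of copy_from_user() failures then produces at most `burst` log lines per window instead of one line per failure.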
Re: [PATCH v2] powerpc: Blacklist GCC 5.4 6.1 and 6.2
Hi Cyril, On Mon, Feb 13, 2017 at 02:35:36PM +1100, Cyril Bur wrote: > A bug in the -02 optimisation of GCC 5.4 6.1 and 6.2 causes > setup_command_line() to not pass the correct first argument to strcpy > and therefore not actually copy the command_line. There is no such thing as an "-O2 optimisation". > At the time of writing GCC 5.4 is the most recent and is affected. GCC > 6.3 contains the backported fix, has been tested and appears safe to > use. 6.3 is (of course) the newer release; 5.4 is a maintenance release of a compiler that is a year older. > +# - gcc-5.4, 6.1, 6.2 don't copy the command_line around correctly > + echo -n '*** GCC-5.4 6.1 6.2 have a bad -O2 optimisation ' ; \ > + echo 'which will cause lost command_line options (at least).' ; > \ Maybe something more like "GCC 5.4, 6.1, and 6.2 have a bug that results in a kernel that does not boot. Please use GCC 6.3 or later.". Please mention the GCC PR # somewhere in the code, too? Segher
[PATCH V2 2/2] powerpc/mm/slice: Update slice mask printing to use bitmap printing.
We now get output like below which is much better.

[    0.935306] good_mask low_slice: 0-15
[    0.935360] good_mask high_slice: 0-511

Compared to

[    0.953414] good_mask: - 1.

I also fixed an error with slice_dbg printing.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/mm/slice.c | 30 +++++++----------------------
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 08ac27eae408..d3701b0f439f 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -53,29 +53,13 @@ int _slice_debug = 1;
 
 static void slice_print_mask(const char *label, struct slice_mask mask)
 {
-	char	*p, buf[SLICE_NUM_LOW + 3 + SLICE_NUM_HIGH + 1];
-	int	i;
-
 	if (!_slice_debug)
 		return;
-	p = buf;
-	for (i = 0; i < SLICE_NUM_LOW; i++)
-		*(p++) = (mask.low_slices & (1 << i)) ? '1' : '0';
-	*(p++) = ' ';
-	*(p++) = '-';
-	*(p++) = ' ';
-	for (i = 0; i < SLICE_NUM_HIGH; i++) {
-		if (test_bit(i, mask.high_slices))
-			*(p++) = '1';
-		else
-			*(p++) = '0';
-	}
-	*(p++) = 0;
-
-	printk(KERN_DEBUG "%s:%s\n", label, buf);
+	pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, &mask.low_slices);
+	pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, mask.high_slices);
 }
 
-#define slice_dbg(fmt...) do { if (_slice_debug) pr_debug(fmt); } while(0)
+#define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)
 
 #else
 
@@ -242,8 +226,8 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 	}
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  mm->context.low_slices_psize,
-		  mm->context.high_slices_psize);
+		  (unsigned long)mm->context.low_slices_psize,
+		  (unsigned long)mm->context.high_slices_psize);
 
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 
@@ -685,8 +669,8 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  mm->context.low_slices_psize,
-		  mm->context.high_slices_psize);
+		  (unsigned long)mm->context.low_slices_psize,
+		  (unsigned long)mm->context.high_slices_psize);
  bail:
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
-- 
2.7.4
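The "0-15" / "4-5,9,11-13" strings above are what the kernel's %*pbl format specifier produces: set bits rendered as a comma-separated range list. A userspace sketch of that rendering, using a simplified byte-per-bit array rather than the kernel's unsigned-long bitmaps:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render set bits of bits[0..nbits-1] as a range list like %*pbl.
 * bits[] is one byte per bit here purely for clarity. */
static void bitmap_ranges(const unsigned char *bits, int nbits,
			  char *buf, size_t len)
{
	size_t pos = 0;
	int i = 0;

	buf[0] = '\0';
	while (i < nbits) {
		if (!bits[i]) {
			i++;
			continue;
		}
		int start = i;
		while (i < nbits && bits[i])
			i++;			/* consume the run */
		pos += snprintf(buf + pos, len - pos, "%s%d",
				pos ? "," : "", start);
		if (i - 1 > start)		/* multi-bit run: "a-b" */
			pos += snprintf(buf + pos, len - pos, "-%d", i - 1);
	}
}
```

This is why the patch can delete the hand-rolled '0'/'1' string builder: the format core already knows how to compress a bitmap into readable ranges.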
[PATCH V2 1/2] powerpc/mm/slice: Move slice_mask struct definition to slice.c
This structure definition need not be in a header since this is used only by slice.c file. So move it to slice.c. This also allow us to use SLICE_NUM_HIGH instead of 512 and also helps in getting rid of one BUILD_BUG_ON(). I also switch the low_slices type to u64 from u16. This doesn't have an impact on size of struct due to padding added with u16 type. This helps in using bitmap printing function for printing slice mask. Signed-off-by: Aneesh Kumar K.V--- arch/powerpc/include/asm/page_64.h | 11 --- arch/powerpc/mm/slice.c| 13 ++--- 2 files changed, 10 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/page_64.h b/arch/powerpc/include/asm/page_64.h index 9b60e9455c6e..3ecfc2734b50 100644 --- a/arch/powerpc/include/asm/page_64.h +++ b/arch/powerpc/include/asm/page_64.h @@ -99,17 +99,6 @@ extern u64 ppc64_pft_size; #define GET_HIGH_SLICE_INDEX(addr) ((addr) >> SLICE_HIGH_SHIFT) #ifndef __ASSEMBLY__ -/* - * One bit per slice. We have lower slices which cover 256MB segments - * upto 4G range. That gets us 16 low slices. For the rest we track slices - * in 1TB size. - * 64 below is actually SLICE_NUM_HIGH to fixup complie errros - */ -struct slice_mask { - u16 low_slices; - DECLARE_BITMAP(high_slices, 512); -}; - struct mm_struct; extern unsigned long slice_get_unmapped_area(unsigned long addr, diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c index b3f45e413a60..08ac27eae408 100644 --- a/arch/powerpc/mm/slice.c +++ b/arch/powerpc/mm/slice.c @@ -37,7 +37,16 @@ #include static DEFINE_SPINLOCK(slice_convert_lock); - +/* + * One bit per slice. We have lower slices which cover 256MB segments + * upto 4G range. That gets us 16 low slices. For the rest we track slices + * in 1TB size. 
+ * 64 below is actually SLICE_NUM_HIGH to fixup complie errros + */ +struct slice_mask { + u64 low_slices; + DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH); +}; #ifdef DEBUG int _slice_debug = 1; @@ -407,8 +416,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len, struct mm_struct *mm = current->mm; unsigned long newaddr; - /* Make sure high_slices bitmap size is same as we expected */ - BUILD_BUG_ON(512 != SLICE_NUM_HIGH); /* * init different masks */ -- 2.7.4
Re: [PATCH] powerpc/mm/slice: Update slice mask printing to use bitmap printing.
On Monday 13 February 2017 04:40 PM, Aneesh Kumar K.V wrote: We now get output like below which is much better. [0.935306] good_mask low_slice: 4-5,9,11-13 [0.935360] good_mask high_slice: 0-511 [0.935385] mask low_slice: 3-6,8,10-12 [0.935397] mask high_slice: Compared to [0.953414] good_mask: - 1. I also fixed an error with slice_dbg printing. Signed-off-by: Aneesh Kumar K.V--- arch/powerpc/mm/slice.c | 30 +++--- 1 file changed, 7 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c index b3f45e413a60..0575897fdbe3 100644 --- a/arch/powerpc/mm/slice.c +++ b/arch/powerpc/mm/slice.c @@ -44,29 +44,13 @@ int _slice_debug = 1; static void slice_print_mask(const char *label, struct slice_mask mask) { - char*p, buf[SLICE_NUM_LOW + 3 + SLICE_NUM_HIGH + 1]; - int i; - if (!_slice_debug) return; - p = buf; - for (i = 0; i < SLICE_NUM_LOW; i++) - *(p++) = (mask.low_slices & (1 << i)) ? '1' : '0'; - *(p++) = ' '; - *(p++) = '-'; - *(p++) = ' '; - for (i = 0; i < SLICE_NUM_HIGH; i++) { - if (test_bit(i, mask.high_slices)) - *(p++) = '1'; - else - *(p++) = '0'; - } - *(p++) = 0; - - printk(KERN_DEBUG "%s:%s\n", label, buf); + pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, _slices); This doesn't work as expected because low_slices is of type u16. I am fixing that to be u64. + pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, mask.high_slices); } -#define slice_dbg(fmt...) do { if (_slice_debug) pr_debug(fmt); } while(0) +#define slice_dbg(fmt...) 
do { if (_slice_debug) pr_devel(fmt); } while (0)

#else

@@ -233,8 +217,8 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 	}
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  mm->context.low_slices_psize,
-		  mm->context.high_slices_psize);
+		  (unsigned long)mm->context.low_slices_psize,
+		  (unsigned long)mm->context.high_slices_psize);
 
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 
@@ -678,8 +662,8 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  mm->context.low_slices_psize,
-		  mm->context.high_slices_psize);
+		  (unsigned long)mm->context.low_slices_psize,
+		  (unsigned long)mm->context.high_slices_psize);
  bail:
 	spin_unlock_irqrestore(&slice_convert_lock, flags);

-aneesh
[PATCH] powerpc/perf: Avoid FAB_*_MATCH checks for power9
Since power9 does not support FAB_*_MATCH bits in MMCR1, avoid these checks for power9. For this, patch factor out code in isa207_get_constraint() to retain these checks only for power8. Patch also updates the comment in power9-pmu raw event encode layout to remove FAB_*_MATCH. Finally for power9, patch adds additional check for threshold events when adding the thresh mask and value in isa207_get_constraint(). fixes: 7ffd948fae4c ('powerpc/perf: factor out power8 pmu functions') fixes: 18201b204286 ('powerpc/perf: power9 raw event format encoding') Signed-off-by: Ravi BangoriaSigned-off-by: Madhavan Srinivasan --- arch/powerpc/perf/isa207-common.c | 58 ++- arch/powerpc/perf/power9-pmu.c| 8 ++ 2 files changed, 42 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c index 50e598cf644b..2703a1e340e7 100644 --- a/arch/powerpc/perf/isa207-common.c +++ b/arch/powerpc/perf/isa207-common.c @@ -97,6 +97,28 @@ static unsigned long combine_shift(unsigned long pmc) return MMCR1_COMBINE_SHIFT(pmc); } +static inline bool event_is_threshold(u64 event) +{ + return (event >> EVENT_THR_SEL_SHIFT) & EVENT_THR_SEL_MASK; +} + +static bool is_thresh_cmp_valid(u64 event) +{ + unsigned int cmp, exp; + + /* +* Check the mantissa upper two bits are not zero, unless the +* exponent is also zero. See the THRESH_CMP_MANTISSA doc. +*/ + cmp = (event >> EVENT_THR_CMP_SHIFT) & EVENT_THR_CMP_MASK; + exp = cmp >> 7; + + if (exp && (cmp & 0x60) == 0) + return false; + + return true; +} + int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp) { unsigned int unit, pmc, cache, ebb; @@ -163,28 +185,26 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp) value |= CNST_SAMPLE_VAL(event >> EVENT_SAMPLE_SHIFT); } - /* -* Special case for PM_MRK_FAB_RSP_MATCH and PM_MRK_FAB_RSP_MATCH_CYC, -* the threshold control bits are used for the match value. 
-*/ - if (event_is_fab_match(event)) { - mask |= CNST_FAB_MATCH_MASK; - value |= CNST_FAB_MATCH_VAL(event >> EVENT_THR_CTL_SHIFT); + if (cpu_has_feature(CPU_FTR_ARCH_300)) { + if (event_is_threshold(event) && is_thresh_cmp_valid(event)) { + mask |= CNST_THRESH_MASK; + value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT); + } } else { /* -* Check the mantissa upper two bits are not zero, unless the -* exponent is also zero. See the THRESH_CMP_MANTISSA doc. +* Special case for PM_MRK_FAB_RSP_MATCH and PM_MRK_FAB_RSP_MATCH_CYC, +* the threshold control bits are used for the match value. */ - unsigned int cmp, exp; - - cmp = (event >> EVENT_THR_CMP_SHIFT) & EVENT_THR_CMP_MASK; - exp = cmp >> 7; - - if (exp && (cmp & 0x60) == 0) - return -1; + if (event_is_fab_match(event)) { + mask |= CNST_FAB_MATCH_MASK; + value |= CNST_FAB_MATCH_VAL(event >> EVENT_THR_CTL_SHIFT); + } else { + if (!is_thresh_cmp_valid(event)) + return -1; - mask |= CNST_THRESH_MASK; - value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT); + mask |= CNST_THRESH_MASK; + value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT); + } } if (!pmc && ebb) @@ -279,7 +299,7 @@ int isa207_compute_mmcr(u64 event[], int n_ev, * PM_MRK_FAB_RSP_MATCH and PM_MRK_FAB_RSP_MATCH_CYC, * the threshold bits are used for the match value. */ - if (event_is_fab_match(event[i])) { + if (!cpu_has_feature(CPU_FTR_ARCH_300) && event_is_fab_match(event[i])) { mmcr1 |= ((event[i] >> EVENT_THR_CTL_SHIFT) & EVENT_THR_CTL_MASK) << MMCR1_FAB_SHIFT; } else { diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c index 7332634e18c9..7950cee7d617 100644 --- a/arch/powerpc/perf/power9-pmu.c +++ b/arch/powerpc/perf/power9-pmu.c @@ -22,7 +22,7 @@ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | * | | [ ] [ ] [ thresh_cmp ] [ thresh_ctl ] * | | | | | - * | | *- IFM (Linux)|thresh start/stop OR FAB match -* + * | | *- IFM (Linux)|
[PATCH] powerpc/mm/slice: Update slice mask printing to use bitmap printing.
We now get output like below which is much better. [0.935306] good_mask low_slice: 4-5,9,11-13 [0.935360] good_mask high_slice: 0-511 [0.935385] mask low_slice: 3-6,8,10-12 [0.935397] mask high_slice: Compared to [0.953414] good_mask: - 1. I also fixed an error with slice_dbg printing. Signed-off-by: Aneesh Kumar K.V--- arch/powerpc/mm/slice.c | 30 +++--- 1 file changed, 7 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c index b3f45e413a60..0575897fdbe3 100644 --- a/arch/powerpc/mm/slice.c +++ b/arch/powerpc/mm/slice.c @@ -44,29 +44,13 @@ int _slice_debug = 1; static void slice_print_mask(const char *label, struct slice_mask mask) { - char*p, buf[SLICE_NUM_LOW + 3 + SLICE_NUM_HIGH + 1]; - int i; - if (!_slice_debug) return; - p = buf; - for (i = 0; i < SLICE_NUM_LOW; i++) - *(p++) = (mask.low_slices & (1 << i)) ? '1' : '0'; - *(p++) = ' '; - *(p++) = '-'; - *(p++) = ' '; - for (i = 0; i < SLICE_NUM_HIGH; i++) { - if (test_bit(i, mask.high_slices)) - *(p++) = '1'; - else - *(p++) = '0'; - } - *(p++) = 0; - - printk(KERN_DEBUG "%s:%s\n", label, buf); + pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, _slices); + pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, mask.high_slices); } -#define slice_dbg(fmt...) do { if (_slice_debug) pr_debug(fmt); } while(0) +#define slice_dbg(fmt...) 
do { if (_slice_debug) pr_devel(fmt); } while (0)

#else

@@ -233,8 +217,8 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 	}
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  mm->context.low_slices_psize,
-		  mm->context.high_slices_psize);
+		  (unsigned long)mm->context.low_slices_psize,
+		  (unsigned long)mm->context.high_slices_psize);
 
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 
@@ -678,8 +662,8 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  mm->context.low_slices_psize,
-		  mm->context.high_slices_psize);
+		  (unsigned long)mm->context.low_slices_psize,
+		  (unsigned long)mm->context.high_slices_psize);
  bail:
	spin_unlock_irqrestore(&slice_convert_lock, flags);
-- 
2.7.4
RE: [RFC][PATCH] powerpc/64s: optimise syscall entry with relon hypercalls
From: Nicholas Piggin > Sent: 10 February 2017 18:23 > After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from > guest to host"), a getppid() system call goes from 307 cycles to 358 > cycles (+17%). This is due significantly to the scratch SPR used by the > hypercall. > > It turns out there are a some volatile registers common to both system > call and hypercall (in particular, r12, cr0, ctr), which can be used to > avoid the SPR and some other overheads for the system call case. This > brings getppid to 320 cycles (+4%). ... > + * syscall register convention is in Documentation/powerpc/syscall64-abi.txt > + * > + * For hypercalls, the register convention is as follows: > + * r0 volatile > + * r1-2 nonvolatile > + * r3 volatile parameter and return value for status > + * r4-r10 volatile input and output value > + * r11 volatile hypercall number and output value > + * r12 volatile > + * r13-r31 nonvolatile > + * LR nonvolatile > + * CTR volatile > + * XER volatile > + * CR0-1 CR5-7 volatile > + * CR2-4 nonvolatile > + * Other registers nonvolatile > + * > + * The intersection of volatile registers that don't contain possible > + * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs > + * upon entry without saving. Except that they must surely be set to some known value on exit in order to avoid leaking information to the guest. David
Re: [RFC] implement QUEUED spinlocks on powerpc
On 2017/2/7 2:46 PM, Eric Dumazet wrote:
> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui wrote:
>
>> hi all
>> I do some netperf tests and get some benchmark results.
>> I also attach my test script and netperf results (Excel).

Hi all,

This time I used the loopback interface to run the netperf tests:

#tc qd add dev lo root pfifo limit 1
#ip link
1: lo: mtu 65536 qdisc pfifo state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

and put the results in netperf.xlsx (Excel). It is a 32-vcpu P8 machine
with 32 GiB of memory. This time spinlock is the best one:
spinlock > qspinlock > pvqspinlock. So sad.

thanks
xinhui

>> There are two machines: one runs netserver and the other runs the
>> netperf benchmark. They are connected by a 1000Mbps network.
>>
>> #ip link information
>> 2: eth0: mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
>>     link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>
>> According to the results, there is not much performance gap between
>> them. And as we are only testing throughput, pvqspinlock shows the
>> overhead of its pv stuff, while qspinlock shows a little improvement
>> over spinlock. My simple summary for this testcase is
>> qspinlock > spinlock > pvqspinlock.
>>
>> When running 200 concurrent netperf instances, the total throughput is:
>>
>> lock         | concurrent runners | total throughput | variance
>> -------------|--------------------|------------------|---------
>> spinlock     | 199                | 66882.8          | 89.93
>> qspinlock    | 199                | 66350.4          | 72.0239
>> pvqspinlock  | 199                | 64740.5          | 85.7837
>>
>> You could see more data in netperf.xlsx
>>
>> thanks
>> xinhui
>
> Hi xinhui
>
> 1Gbit NIC is too slow for this use case. I would try a 10Gbit NIC at least...
>
> Alternatively, you could use the loopback interface. (netperf -H 127.0.0.1)
>
> tc qd add dev lo root pfifo limit 1
[PATCH] add a const to ioread* routines to fix compile testing
On some architectures, the ioread routines are still using a non-const
argument for the address parameter. Let's change that to be consistent
with the others and fix compile testing (ARM drivers on Intel for
instance).

Signed-off-by: Cédric Le Goater
---
I am not sure how we should handle these changes, so here is a big
patch for all architectures to let maintainers decide. I suppose we
could merge the patch in one arch first and see how the 0-Day bot
reacts.

The patch can be found on this branch:

  https://github.com/legoater/linux/tree/aspeed

Compiled on: arm arm64 avr32 frv ia64 alpha m68k parisc32 parisc64
mips mips64 sh32 sparc32 sparc64 x86_64 i386 powerpc32 powerpc64
powerpc64le s390x

 arch/alpha/include/asm/core_apecs.h  |  6 ++--
 arch/alpha/include/asm/core_cia.h    |  6 ++--
 arch/alpha/include/asm/core_lca.h    |  6 ++--
 arch/alpha/include/asm/core_marvel.h |  4 +--
 arch/alpha/include/asm/core_mcpcia.h |  6 ++--
 arch/alpha/include/asm/core_t2.h     |  2 +-
 arch/alpha/include/asm/io.h          |  2 +-
 arch/alpha/include/asm/io_trivial.h  |  6 ++--
 arch/alpha/include/asm/jensen.h      |  2 +-
 arch/alpha/include/asm/machvec.h     |  6 ++--
 arch/alpha/kernel/core_marvel.c      |  2 +-
 arch/alpha/kernel/io.c               | 12 ++--
 arch/frv/include/asm/io.h            | 12 ++--
 arch/frv/include/asm/mb-regs.h       |  6 ++--
 arch/mips/lib/iomap.c                | 22 ++---
 arch/parisc/lib/iomap.c              | 60 ++--
 arch/powerpc/kernel/iomap.c          | 20 ++--
 arch/sh/kernel/iomap.c               | 22 ++---
 arch/sparc/include/asm/io_64.h       |  6 ++--
 include/asm-generic/iomap.h          | 20 ++--
 lib/iomap.c                          | 22 ++---
 21 files changed, 125 insertions(+), 125 deletions(-)

diff --git a/arch/alpha/include/asm/core_apecs.h b/arch/alpha/include/asm/core_apecs.h
index 6785ff7e02bc..a4c88d2a66f0 100644
--- a/arch/alpha/include/asm/core_apecs.h
+++ b/arch/alpha/include/asm/core_apecs.h
@@ -383,7 +383,7 @@ struct el_apecs_procdata
 		}					\
 	} while (0)
 
-__EXTERN_INLINE unsigned int apecs_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread8(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -419,7 +419,7 @@ __EXTERN_INLINE void apecs_iowrite8(u8 b, void __iomem *xaddr)
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread16(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -455,7 +455,7 @@ __EXTERN_INLINE void apecs_iowrite16(u16 b, void __iomem *xaddr)
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread32(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (addr < APECS_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_cia.h b/arch/alpha/include/asm/core_cia.h
index 9e0516c0ca27..fdc029953b90 100644
--- a/arch/alpha/include/asm/core_cia.h
+++ b/arch/alpha/include/asm/core_cia.h
@@ -341,7 +341,7 @@ struct el_CIA_sysdata_mcheck {
 #define vuip	volatile unsigned int __force *
 #define vulp	volatile unsigned long __force *
 
-__EXTERN_INLINE unsigned int cia_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread8(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -373,7 +373,7 @@ __EXTERN_INLINE void cia_iowrite8(u8 b, void __iomem *xaddr)
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread16(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	unsigned long result, base_and_type;
@@ -403,7 +403,7 @@ __EXTERN_INLINE void cia_iowrite16(u16 b, void __iomem *xaddr)
 	*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread32(const void __iomem *xaddr)
 {
 	unsigned long addr = (unsigned long) xaddr;
 	if (addr < CIA_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_lca.h b/arch/alpha/include/asm/core_lca.h
index 8ee6c516279c..25277e989731 100644
--- a/arch/alpha/include/asm/core_lca.h
+++ b/arch/alpha/include/asm/core_lca.h
@@ -229,7 +229,7 @@ union el_lca
 	} while (0)
 
-__EXTERN_INLINE unsigned int lca_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread8(const void __iomem *xaddr)
 {
Re: [PATCH 1/3] kprobes: introduce weak variant of kprobe_exceptions_notify
On 2017/02/10 02:41PM, Michael Ellerman wrote:
> "Naveen N. Rao" writes:
>
>> kprobe_exceptions_notify() is not used on some of the architectures such
>> as arm[64] and powerpc anymore. Introduce a weak variant for such
>> architectures.
>
> I'll merge patch 1 & 3 via the powerpc tree for v4.11.
>
> You can then send patch 2 to the arm guys after -rc1, or for 4.12.

Sure, thanks!
- Naveen