Re: powerpc: Simplify module TOC handling
On Mon, 2016-01-18 at 00:44:27 UTC, Michael Ellerman wrote:
> From: Alan Modra
>
> PowerPC64 uses the symbol .TOC. much as other targets use
> _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in
> powerpc parlance, the TOC pointer). Global offset tables are generally
> local to an executable or shared library, or in the kernel, a module. Thus
> it does not make sense for a module to resolve a relocation against
> .TOC. to the kernel's .TOC. value. A module has its own .TOC., and
> indeed the powerpc64 module relocation processing ignores the kernel
> value of .TOC. and instead calculates a module-local value.
>
> This patch removes the code involved in exporting the kernel .TOC.,
> tweaks modpost to ignore an undefined .TOC., and the module loader to
> twiddle the section symbol so that .TOC. isn't seen as undefined.
>
> Note that if the kernel was compiled with -msingle-pic-base then ELFv2
> would not have function global entry code setting up r2. In that case
> the module call stubs would need to be modified to set up r2 using the
> kernel .TOC. value, requiring some of this code to be reinstated.
>
> mpe: Furthermore a change in binutils master (not yet released) causes
> the current way we handle the TOC to no longer work when building with
> MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be
> loaded due to there being no version found for TOC.
>
> Cc: sta...@vger.kernel.org # 3.16+
> Signed-off-by: Alan Modra
> Signed-off-by: Michael Ellerman

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/c153693d7eb9eeb28478aa2dea

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/4] support for text-relative kallsyms table
On 21 January 2016 at 06:10, Rusty Russell wrote:
> Ard Biesheuvel writes:
>> This implements text-relative kallsyms address tables. This was developed
>> as part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but
>> I think it may be beneficial to other architectures as well, so I am
>> presenting it as a separate series.
>
> Nice work!

Thanks

> AFAICT this should work for every arch, as long as they start with _text
> (esp: data and init must be > _text). In addition, it's not harmful on
> 32 bit archs.
>
> IOW, I'd like to turn it on for everyone and discard some code. But
> it's easier to roll in like you've done first.
>
> Should we enable it by default for every arch for now, and see what
> happens?

As you say, this only works if every symbol is >= _text, which is obviously
not always the case, per the conditional in scripts/kallsyms.c that emits
_text + n or _text - n depending on whether the symbol follows or precedes
_text. The git log tells me for which arch this was originally implemented,
but it does not tell me which other archs have come to rely on it in the
meantime.

On top of that, ia64 fails to build with this option, since it has some
whitelisted absolute symbols that look suspiciously like they could be
emitted as _text relative (and it does not even matter in the absence of
CONFIG_RELOCATABLE on ia64, afaict), but I don't know whether we can just
override their types as T, since it would also change the type in the
contents of /proc/kallsyms. So some guidance would be appreciated here.

So I agree that it would be preferred to have a single code path, but I
would need some help validating it on architectures I don't have access to.

Thanks,
Ard.

>> The idea is that on 64-bit builds, it is rather wasteful to use absolute
>> addressing for kernel symbols since they are all within a couple of MBs
>> of each other.
>> On top of that, the absolute addressing implies that, when
>> the kernel is relocated at runtime, each address in the table needs to be
>> fixed up individually.
>>
>> Since all section-relative addresses are already emitted relative to _text,
>> it is quite straight-forward to record only the offset, and add the absolute
>> address of _text at runtime when referring to the address table.
>>
>> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
>> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter
>> case, the reduction in uncompressed size is primarily __init data)
>>
>> Kees Cook was so kind to test these against x86_64, and confirmed that KASLR
>> still operates as expected.
>>
>> Ard Biesheuvel (4):
>>   kallsyms: add support for relative offsets in kallsyms address table
>>   powerpc: enable text relative kallsyms for ppc64
>>   s390: enable text relative kallsyms for 64-bit targets
>>   x86_64: enable text relative kallsyms for 64-bit targets
>>
>>  arch/powerpc/Kconfig    |  1 +
>>  arch/s390/Kconfig       |  1 +
>>  arch/x86/Kconfig        |  1 +
>>  init/Kconfig            | 14 
>>  kernel/kallsyms.c       | 35 +-
>>  scripts/kallsyms.c      | 38 +---
>>  scripts/link-vmlinux.sh |  4 +++
>>  scripts/namespace.pl    |  1 +
>>  8 files changed, 82 insertions(+), 13 deletions(-)
>>
>> --
>> 2.5.0
[PATCH v3 2/9] selftests/powerpc: Test preservation of FPU and VMX regs across preemption
Loop in assembly checking the registers with many threads.

Signed-off-by: Cyril Bur
---
 tools/testing/selftests/powerpc/math/.gitignore    |   2 +
 tools/testing/selftests/powerpc/math/Makefile      |   5 +-
 tools/testing/selftests/powerpc/math/fpu_asm.S     |  34 +++
 tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 +
 tools/testing/selftests/powerpc/math/vmx_asm.S     |  44 +++-
 tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 +
 6 files changed, 306 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c

diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore
index b19b269..1a6f09e 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -1,2 +1,4 @@
 fpu_syscall
 vmx_syscall
+fpu_preempt
+vmx_preempt

diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index 418bef1..b6f4158 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall vmx_syscall
+TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt

 all: $(TEST_PROGS)

@@ -6,7 +6,10 @@ $(TEST_PROGS): ../harness.c
 $(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec

 fpu_syscall: fpu_asm.S
+fpu_preempt: fpu_asm.S
+
 vmx_syscall: vmx_asm.S
+vmx_preempt: vmx_asm.S

 include ../../lib.mk

diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
index 8733874..46bbe99 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -159,3 +159,37 @@ FUNC_START(test_fpu)
 	POP_BASIC_STACK(256)
 	blr
 FUNC_END(test_fpu)
+
+#int preempt_fpu(double *darray, int *threads_running, int *running)
+#On starting will (atomically) decrement not_ready as a signal that the FPU
+#has been loaded with darray. Will proceed to check the validity of the FPU
+#registers while running is not zero.
+FUNC_START(preempt_fpu)
+	PUSH_BASIC_STACK(256)
+	std	r3,32(sp) #double *darray
+	std	r4,40(sp) #volatile int *not_ready
+	std	r5,48(sp) #int *running
+	PUSH_FPU(56)
+
+	bl load_fpu
+
+	#Atomic DEC
+	ld	r3,40(sp)
+1:	lwarx	r4,0,r3
+	addi	r4,r4,-1
+	stwcx.	r4,0,r3
+	bne-	1b
+
+2:	ld	r3,32(sp)
+	bl check_fpu
+	cmpdi	r3,0
+	bne	3f
+	ld	r4,48(sp)
+	ld	r5,0(r4)
+	cmpwi	r5,0
+	bne	2b
+
+3:	POP_FPU(56)
+	POP_BASIC_STACK(256)
+	blr
+FUNC_END(preempt_fpu)

diff --git a/tools/testing/selftests/powerpc/math/fpu_preempt.c b/tools/testing/selftests/powerpc/math/fpu_preempt.c
new file mode 100644
index 000..0f85b79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_preempt.c
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers change across preemption.
+ * Two things should be noted here a) The check_fpu function in asm only checks
+ * the non volatile registers as it is reused from the syscall test b) There is
+ * no way to be sure preemption happened so this test just uses many threads
+ * and a long wait. As such, a successful test doesn't mean much but a failure
+ * is bad.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+			    1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+			    2.1};
+
+int threads_starting;
+int running;
+
+extern void preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void *preempt_fpu_c(void *p)
+{
+	int i;
+	srand(pthread_self());
+	for (i = 0; i < 21; i++)
+		darray[i] = rand();
+
+	/* Test failed if it ever returns */
+	preempt_fpu(darray, &threads_starting, &running);
+
+	return p;
+}
+
+int test_preempt_fpu(void)
+{
+	int i, rc, threads;
+	pthread_t *tids;
+
+	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+	tids = malloc((threads) * sizeof(pthread_t));
+	FAIL_IF(!tids);
+
+	running = true;
+	threads_starting =
[PATCH] qe_ic: fix a buffer overflow error and add check elsewhere
127 is the theoretical upper bound of the QEIC interrupt number, but in fact
there are only 44 entries in qe_ic_info at present. Add overflow checks
against the size of qe_ic_info.

Signed-off-by: Zhao Qiang
---
 drivers/soc/fsl/qe/qe_ic.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
index 5419527..90c00b7 100644
--- a/drivers/soc/fsl/qe/qe_ic.c
+++ b/drivers/soc/fsl/qe/qe_ic.c
@@ -261,6 +261,11 @@ static int qe_ic_host_map(struct irq_domain *h, unsigned int virq,
 	struct qe_ic *qe_ic = h->host_data;
 	struct irq_chip *chip;
 
+	if (hw >= ARRAY_SIZE(qe_ic_info)) {
+		pr_err("%s: Invalid hw irq number for QEIC\n", __func__);
+		return -EINVAL;
+	}
+
 	if (qe_ic_info[hw].mask == 0) {
 		printk(KERN_ERR "Can't map reserved IRQ\n");
 		return -EINVAL;
@@ -409,7 +414,8 @@ int qe_ic_set_priority(unsigned int virq, unsigned int priority)
 	if (priority > 8 || priority == 0)
 		return -EINVAL;
-	if (src > 127)
+	if (WARN_ONCE(src >= ARRAY_SIZE(qe_ic_info),
+		      "%s: Invalid hw irq number for QEIC\n", __func__))
 		return -EINVAL;
 	if (qe_ic_info[src].pri_reg == 0)
 		return -EINVAL;
@@ -438,6 +444,9 @@ int qe_ic_set_high_priority(unsigned int virq, unsigned int priority, int high)
 	if (priority > 2 || priority == 0)
 		return -EINVAL;
 
+	if (WARN_ONCE(src >= ARRAY_SIZE(qe_ic_info),
+		      "%s: Invalid hw irq number for QEIC\n", __func__))
+		return -EINVAL;
+
 	switch (qe_ic_info[src].pri_reg) {
 	case QEIC_CIPZCC:
-- 
2.1.0.27.g96db324
Re: powerpc: Wire up copy_file_range() syscall
On Wed, 2016-01-13 at 16:50:22 UTC, Chandan Rajendra wrote:
> Test runs on a ppc64 BE guest succeeded.
>
> Signed-off-by: Chandan Rajendra

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d7f9ee60a6ebc263861a1d8c06

cheers
Re: [PATCH] powerpc/eeh: Fix PE location code
Sam Mendoza-Jonas writes:
> On Wed, Jan 20, 2016 at 02:56:13PM +1100, Russell Currey wrote:
>> On Wed, 2015-12-02 at 16:25 +1100, Gavin Shan wrote:
>>> In eeh_pe_loc_get(), the PE location code is retrieved from the
>>> "ibm,loc-code" property of the device node for the bridge of the
>>> PE's primary bus. It's not correct because the property indicates
>>> the parent PE's location code.
>>>
>>> This reads the correct PE location code from "ibm,io-base-loc-code"
>>> or "ibm,slot-location-code" property of the PE parent bus's device node.
>>>
>>> Signed-off-by: Gavin Shan
>>> ---
>>
>> Tested-by: Russell Currey
>
> Thanks Russell!
>
> W.R.T. including this in stable, I don't believe anything actively breaks
> without the patch, but in the event of an EEH freeze the wrong slot for
> the device will be identified, making troubleshooting more difficult.

As someone who's likely going to have to deal with the bug reports for such
things, I like the idea of this going to stable, as *maybe* I'll get fewer
of them that I have to close pointing to this commit...

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [PATCH 2/4] powerpc: enable text relative kallsyms for ppc64
On Wed, 2016-01-20 at 10:05 +0100, Ard Biesheuvel wrote:
> This enables the newly introduced text-relative kallsyms support when
> building 64-bit targets. This cuts the size of the kallsyms address
> table in half, and drastically reduces the size of the PIE dynamic
> relocation section when building with CONFIG_RELOCATABLE=y (by about
> 3 MB for ppc64_defconfig)
>
> Signed-off-by: Ard Biesheuvel
> ---
>
> Results for ppc64_defconfig:
>
> BEFORE:
> =======
> $ size vmlinux
>     text     data    bss      dec     hex  filename
> 19827996  2008456 849612 22686064 15a2970 vmlinux
>
> $ readelf -S .tmp_kallsyms2.o
> There are 9 section headers, starting at offset 0x4513f8:
>
> Section Headers:
>   [Nr] Name  Type  Address  Offset  Size  EntSize  Flags  Link  Info  Align
>   ...
>   [ 4] .rodata       PROGBITS  0100      001fcf00        A  0  0  256
>   [ 5] .rela.rodata  RELA      001fd1d8  00254220  0018  I  7  4  8
>   [ 6] .shstrtab     STRTAB    001fd000  0039         0  0  1
>   ...
>
> $ ls -l arch/powerpc/boot/zImage
> -rwxrwxr-x 2 ard ard 7533160 Jan 20 08:43 arch/powerpc/boot/zImage
>
> AFTER:
> ======
> $ size vmlinux
>     text     data    bss      dec     hex  filename
> 16979516  2009992 849612 19839120 12eb890 vmlinux
>
> $ readelf -S .tmp_kallsyms2.o
> There are 8 section headers, starting at offset 0x199bb0:
>
> Section Headers:
>   [Nr] Name  Type  Address  Offset  Size  EntSize  Flags  Link  Info  Align
>   ...
>   [ 4] .rodata       PROGBITS  0100      00199900        A  0  0  256
>   [ 5] .shstrtab     STRTAB    00199a00  0034         0  0  1
>   ...
>
> $ ls -l arch/powerpc/boot/zImage
> -rwxrwxr-x 2 ard ard 6985672 Jan 20 08:45 arch/powerpc/boot/zImage

Nice space saving, thanks very much.

I've booted this on a bunch of machines and it seems to be working fine.

Tested-by: Michael Ellerman (powerpc)

cheers
Re: [PATCH] powerpc: remove newly added extra definition of pmd_dirty
On Thu, 2016-01-21 at 13:05 +1100, Stephen Rothwell wrote:
> Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
> add pmd_[dirty|mkclean] for THP") added a new identical definition
> of pmd_dirty. Remove it again.
>
> Cc: Minchan Kim
> Cc: Andrew Morton
> Signed-off-by: Stephen Rothwell

Thanks. I have a couple of other things to send to Linus so I'll pick this up.

cheers
[PATCH] powerpc: remove newly added extra definition of pmd_dirty
Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
add pmd_[dirty|mkclean] for THP") added a new identical definition
of pmd_dirty. Remove it again.

Cc: Minchan Kim
Cc: Andrew Morton
Signed-off-by: Stephen Rothwell
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8204b0c393aa..8d1c41d28318 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -223,7 +223,6 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_pfn(pmd)		pte_pfn(pmd_pte(pmd))
 #define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
-#define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_mkold(pmd)		pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
-- 
2.7.0.rc3

-- 
Cheers,
Stephen Rothwell
s...@canb.auug.org.au
Re: [PATCH] powerpc: remove newly added extra definition of pmd_dirty
On Thu, Jan 21, 2016 at 01:05:20PM +1100, Stephen Rothwell wrote:
> Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
> add pmd_[dirty|mkclean] for THP") added a new identical definition
> of pmd_dirty. Remove it again.
>
> Cc: Minchan Kim
> Cc: Andrew Morton
> Signed-off-by: Stephen Rothwell

Thanks for the fix!
Re: [PATCH 0/4] support for text-relative kallsyms table
Ard Biesheuvel writes:
> This implements text-relative kallsyms address tables. This was developed
> as part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but
> I think it may be beneficial to other architectures as well, so I am
> presenting it as a separate series.

Nice work!

AFAICT this should work for every arch, as long as they start with _text
(esp: data and init must be > _text). In addition, it's not harmful on
32 bit archs.

IOW, I'd like to turn it on for everyone and discard some code. But
it's easier to roll in like you've done first.

Should we enable it by default for every arch for now, and see what
happens?

Thanks!
Rusty.

> The idea is that on 64-bit builds, it is rather wasteful to use absolute
> addressing for kernel symbols since they are all within a couple of MBs
> of each other. On top of that, the absolute addressing implies that, when
> the kernel is relocated at runtime, each address in the table needs to be
> fixed up individually.
>
> Since all section-relative addresses are already emitted relative to _text,
> it is quite straight-forward to record only the offset, and add the absolute
> address of _text at runtime when referring to the address table.
>
> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter case,
> the reduction in uncompressed size is primarily __init data)
>
> Kees Cook was so kind to test these against x86_64, and confirmed that KASLR
> still operates as expected.
>
> Ard Biesheuvel (4):
>   kallsyms: add support for relative offsets in kallsyms address table
>   powerpc: enable text relative kallsyms for ppc64
>   s390: enable text relative kallsyms for 64-bit targets
>   x86_64: enable text relative kallsyms for 64-bit targets
>
>  arch/powerpc/Kconfig    |  1 +
>  arch/s390/Kconfig       |  1 +
>  arch/x86/Kconfig        |  1 +
>  init/Kconfig            | 14 
>  kernel/kallsyms.c       | 35 +-
>  scripts/kallsyms.c      | 38 +---
>  scripts/link-vmlinux.sh |  4 +++
>  scripts/namespace.pl    |  1 +
>  8 files changed, 82 insertions(+), 13 deletions(-)
>
> --
> 2.5.0
Re: powerpc: remove newly added extra definition of pmd_dirty
On Thu, 2016-01-21 at 02:05:20 UTC, Stephen Rothwell wrote:
> Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
> add pmd_[dirty|mkclean] for THP") added a new identical definition
> of pmd_dirty. Remove it again.
>
> Cc: Minchan Kim
> Cc: Andrew Morton
> Signed-off-by: Stephen Rothwell

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/0e2bce7411542fa336ef490414

cheers
[PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI. These calls allow adding multiple entries (up to
512) to the TCE table in one call, which saves time on transitions between
kernel and user space.

This implements the KVM_CAP_PPC_MULTITCE capability. When present, the
kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE. If they can
not be handled by the kernel, they are passed on to the user space. The
user space still has to have an implementation for these.

Both HV and PR-style KVM are supported.

Signed-off-by: Alexey Kardashevskiy
---
Changes:
v2:
* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
* s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
---
 Documentation/virtual/kvm/api.txt       |  25 ++
 arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
 arch/powerpc/kvm/book3s_64_vio.c        | 110 +++-
 arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++--
 arch/powerpc/kvm/book3s_hv.c            |  26 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c       |  35 
 arch/powerpc/kvm/powerpc.c              |   3 +
 8 files changed, 349 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 07e4cdf..da39435 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
 
 Queues an SMI on the thread's vcpu.
 
+4.97 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel is capable of handling hypercalls
+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
+space. This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered LIOBN
+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel based fast path. If they can not be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in kernel acceleration.
+
+This capability is always enabled.
+
 5. The kvm_run structure
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 9513911..4cadee5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
+		struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages);
 extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
 		unsigned long tce);
+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+		unsigned long *ua, unsigned long **prmap);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+		unsigned long idx, unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba);
 extern struct page *kvm_alloc_hpt(unsigned long nr_pages);

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 975f0ab..987f406 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -14,6 +14,7 @@
  *
  * Copyright 2010 Paul Mackerras, IBM Corp.
  * Copyright 2011 David Gibson, IBM Corporation
+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation
  */
 
 #include
@@ -37,8 +38,7 @@
 #include
 #include
 #include
-
-#define TCES_PER_PAGE
[PATCH kernel v2 0/6] KVM: PPC: Add in-kernel multitce handling
These patches enable in-kernel acceleration for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls, which allow updating multiple (up to 512) TCE
entries in a single call, saving time on context switches. QEMU already
supports these hypercalls, so this is just an optimization.

Both HV and PR KVM modes are supported.

This does not affect VFIO; that support is coming next.

This depends on "powerpc: Make vmalloc_to_phys() public".

Please comment. Thanks.

Alexey Kardashevskiy (6):
  KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  KVM: PPC: Use RCU for arch.spapr_tce_tables
  KVM: PPC: Account TCE-containing pages in locked_vm
  KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
  KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  KVM: PPC: Add support for multiple-TCE hcalls

 Documentation/virtual/kvm/api.txt        |  25 +++
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 -
 arch/powerpc/include/asm/kvm_host.h      |   1 +
 arch/powerpc/include/asm/kvm_ppc.h       |  16 ++
 arch/powerpc/kvm/book3s.c                |   2 +-
 arch/powerpc/kvm/book3s_64_vio.c         | 188 --
 arch/powerpc/kvm/book3s_64_vio_hv.c      | 318 ++-
 arch/powerpc/kvm/book3s_hv.c             |  26 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c        |  35 
 arch/powerpc/kvm/powerpc.c               |   3 +
 11 files changed, 557 insertions(+), 65 deletions(-)

-- 
2.5.0
[PATCH kernel v2 4/6] KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
SPAPR_TCE_SHIFT is used in only a few places, and since IOMMU_PAGE_SHIFT_4K
can easily be used instead, remove SPAPR_TCE_SHIFT.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 2 --
 arch/powerpc/kvm/book3s_64_vio.c         | 3 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c      | 4 ++--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2aa79c8..7529aab 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -33,8 +33,6 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
 }
 #endif
 
-#define SPAPR_TCE_SHIFT		12
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
 #endif

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index ea498b4..975f0ab 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -36,12 +36,13 @@
 #include
 #include
 #include
+#include
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
 static unsigned long kvmppc_stt_npages(unsigned long window_size)
 {
-	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
+	return ALIGN((window_size >> IOMMU_PAGE_SHIFT_4K)
 			* sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 862f9a2..e142171 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -99,7 +99,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	idx = ioba >> SPAPR_TCE_SHIFT;
+	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = (u64 *)page_address(page);
 
@@ -127,7 +127,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	idx = ioba >> SPAPR_TCE_SHIFT;
+	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = (u64 *)page_address(page);
-- 
2.5.0.rc3
[PATCH kernel v2 3/6] KVM: PPC: Account TCE-containing pages in locked_vm
At the moment, pages used for TCE tables (in addition to pages addressed
by TCEs) are not counted in the locked_vm counter, so a malicious
userspace tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as
RLIMIT_NOFILE allows and lock a lot of memory.

This adds counting for pages used for TCE tables. It counts the number
of pages required for a table plus pages for the kvmppc_spapr_tce_table
struct (the TCE table descriptor) itself.

This changes release_spapr_tce_table() to store @npages on the stack to
avoid calling kvmppc_stt_npages() in the loop (a tiny optimization,
probably).

This does not change the amount of (de)allocated memory.

Signed-off-by: Alexey Kardashevskiy
---
Changes:
v2:
* switched from long to unsigned long types
* added WARN_ON_ONCE() in locked_vm decrement case
---
 arch/powerpc/kvm/book3s_64_vio.c | 55 +---
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 9526c34..ea498b4 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -39,19 +39,62 @@
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
-static long kvmppc_stt_npages(unsigned long window_size)
+static unsigned long kvmppc_stt_npages(unsigned long window_size)
 {
 	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
 			* sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
+static long kvmppc_account_memlimit(unsigned long npages, bool inc)
+{
+	long ret = 0;
+	const unsigned long bytes = sizeof(struct kvmppc_spapr_tce_table) +
+			(npages * sizeof(struct page *));
+	const unsigned long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;
+
+	if (!current || !current->mm)
+		return ret; /* process exited */
+
+	npages += stt_pages;
+
+	down_write(&current->mm->mmap_sem);
+
+	if (inc) {
+		unsigned long locked, lock_limit;
+
+		locked = current->mm->locked_vm + npages;
+		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+			ret = -ENOMEM;
+		else
+			current->mm->locked_vm += npages;
+	} else {
+		if (WARN_ON_ONCE(npages > current->mm->locked_vm))
+			npages = current->mm->locked_vm;
+
+		current->mm->locked_vm -= npages;
+	}
+
+	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
+			inc ? '+' : '-',
+			npages << PAGE_SHIFT,
+			current->mm->locked_vm << PAGE_SHIFT,
+			rlimit(RLIMIT_MEMLOCK),
+			ret ? " - exceeded" : "");
+
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
 static void release_spapr_tce_table(struct rcu_head *head)
 {
 	struct kvmppc_spapr_tce_table *stt = container_of(head,
 			struct kvmppc_spapr_tce_table, rcu);
 	int i;
+	unsigned long npages = kvmppc_stt_npages(stt->window_size);
 
-	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+	for (i = 0; i < npages; i++)
 		__free_page(stt->pages[i]);
 
 	kfree(stt);
@@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 
 	kvm_put_kvm(stt->kvm);
 
+	kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
 	call_rcu(&stt->rcu, release_spapr_tce_table);
 
 	return 0;
@@ -103,7 +147,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				   struct kvm_create_spapr_tce *args)
 {
 	struct kvmppc_spapr_tce_table *stt = NULL;
-	long npages;
+	unsigned long npages;
 	int ret = -ENOMEM;
 	int i;
 
@@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	}
 
 	npages = kvmppc_stt_npages(args->window_size);
+	ret = kvmppc_account_memlimit(npages, true);
+	if (ret) {
+		stt = NULL;
+		goto fail;
+	}
 
 	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
 		      GFP_KERNEL);
-- 
2.5.0.rc3
[PATCH kernel v2 1/6] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
This reworks the existing H_PUT_TCE/H_GET_TCE handlers so that the following patches apply more cleanly. This moves the ioba boundaries check to a helper and adds a check that the least significant bits, which have to be zero, actually are. The patch is pretty mechanical (only the check for the least significant ioba bits is added) so no change in behaviour is expected. Signed-off-by: Alexey Kardashevskiy --- Changelog: v2: * compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero * made error reporting cleaner --- arch/powerpc/kvm/book3s_64_vio_hv.c | 111 +++- 1 file changed, 72 insertions(+), 39 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c index 89e96b3..862f9a2 100644 --- a/arch/powerpc/kvm/book3s_64_vio_hv.c +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c @@ -35,71 +35,104 @@ #include #include #include +#include #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) +/* + * Finds a TCE table descriptor by LIOBN. + * + * WARNING: This will be called in real or virtual mode on HV KVM and virtual + * mode on PR KVM + */ +static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu, + unsigned long liobn) +{ + struct kvm *kvm = vcpu->kvm; + struct kvmppc_spapr_tce_table *stt; + + list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list) + if (stt->liobn == liobn) + return stt; + + return NULL; +} + +/* + * Validates IO address.
+ * + * WARNING: This will be called in real-mode on HV KVM and virtual + * mode on PR KVM + */ +static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt, + unsigned long ioba, unsigned long npages) +{ + unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1; + unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K; + unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K; + + if ((ioba & mask) || (idx + npages > size)) + return H_PARAMETER; + + return H_SUCCESS; +} + /* WARNING: This will be called in real-mode on HV KVM and virtual * mode on PR KVM */ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba, unsigned long tce) { - struct kvm *kvm = vcpu->kvm; - struct kvmppc_spapr_tce_table *stt; + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); + long ret; + unsigned long idx; + struct page *page; + u64 *tbl; /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */ /* liobn, ioba, tce); */ - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) { - if (stt->liobn == liobn) { - unsigned long idx = ioba >> SPAPR_TCE_SHIFT; - struct page *page; - u64 *tbl; + if (!stt) + return H_TOO_HARD; - /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p window_size=0x%x\n", */ - /* liobn, stt, stt->window_size); */ - if (ioba >= stt->window_size) - return H_PARAMETER; + ret = kvmppc_ioba_validate(stt, ioba, 1); + if (ret != H_SUCCESS) + return ret; - page = stt->pages[idx / TCES_PER_PAGE]; - tbl = (u64 *)page_address(page); + idx = ioba >> SPAPR_TCE_SHIFT; + page = stt->pages[idx / TCES_PER_PAGE]; + tbl = (u64 *)page_address(page); - /* FIXME: Need to validate the TCE itself */ - /* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */ - tbl[idx % TCES_PER_PAGE] = tce; - return H_SUCCESS; - } - } + /* FIXME: Need to validate the TCE itself */ + /* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */ + tbl[idx % TCES_PER_PAGE] = tce; - /* Didn't find the liobn, punt it to userspace */ - return H_TOO_HARD; + 
return H_SUCCESS; } EXPORT_SYMBOL_GPL(kvmppc_h_put_tce); long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, - unsigned long ioba) + unsigned long ioba) { - struct kvm *kvm = vcpu->kvm; - struct kvmppc_spapr_tce_table *stt; + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); + long ret; + unsigned long idx; + struct page *page; + u64 *tbl; - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) { - if (stt->liobn == liobn) { - unsigned long idx = ioba >> SPAPR_TCE_SHIFT; - struct page *page; - u64 *tbl; + if (!stt) + return H_TOO_HARD; - if (ioba >= stt->window_size) -
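The factored-out ioba check reduces to a little alignment-and-range arithmetic. Here it is as a standalone sketch; the `H_PARAMETER` value used is the one from the kernel's hvcall.h, but treat it as illustrative:

```c
#include <assert.h>

#define IOMMU_PAGE_SHIFT_4K 12
#define H_SUCCESS   0L
#define H_PARAMETER (-4L)	/* as defined in asm/hvcall.h */

/* Mirrors kvmppc_ioba_validate(): the IO address must be 4K-aligned
 * and the requested run of TCEs must fit inside the DMA window. */
static long ioba_validate(unsigned long window_size,
			  unsigned long ioba, unsigned long npages)
{
	unsigned long mask = (1UL << IOMMU_PAGE_SHIFT_4K) - 1;
	unsigned long idx  = ioba >> IOMMU_PAGE_SHIFT_4K;
	unsigned long size = window_size >> IOMMU_PAGE_SHIFT_4K;

	if ((ioba & mask) || (idx + npages > size))
		return H_PARAMETER;
	return H_SUCCESS;
}
```

Note the extra strictness over the old `ioba >= stt->window_size` test: a misaligned ioba now fails even when it lies inside the window, which is the "least significant bits" check the commit message mentions.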
[PATCH kernel v2 2/6] KVM: PPC: Use RCU for arch.spapr_tce_tables
At the moment spapr_tce_tables is not protected against races. This makes use of RCU variants of the list helpers. As some bits are executed in real mode, this makes use of the just-introduced list_for_each_entry_rcu_notrace(). This converts release_spapr_tce_table() to an RCU-scheduled handler. Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s.c | 2 +- arch/powerpc/kvm/book3s_64_vio.c| 20 +++- 3 files changed, 13 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 271fefb..c7ee696 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -184,6 +184,7 @@ struct kvmppc_spapr_tce_table { struct kvm *kvm; u64 liobn; u32 window_size; + struct rcu_head rcu; struct page *pages[0]; }; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 638c6d9..b34220d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -807,7 +807,7 @@ int kvmppc_core_init_vm(struct kvm *kvm) { #ifdef CONFIG_PPC64 - INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables); + INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables); INIT_LIST_HEAD(&kvm->arch.rtas_tokens); #endif diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index 54cf9bc..9526c34 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -45,19 +45,16 @@ static long kvmppc_stt_npages(unsigned long window_size) * sizeof(u64), PAGE_SIZE) / PAGE_SIZE; } -static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt) +static void release_spapr_tce_table(struct rcu_head *head) { - struct kvm *kvm = stt->kvm; + struct kvmppc_spapr_tce_table *stt = container_of(head, + struct kvmppc_spapr_tce_table, rcu); int i; - mutex_lock(&kvm->lock); - list_del(&stt->list); for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++) __free_page(stt->pages[i]); + kfree(stt); - mutex_unlock(&kvm->lock); - -
kvm_put_kvm(kvm); } static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf) @@ -88,7 +85,12 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp) { struct kvmppc_spapr_tce_table *stt = filp->private_data; - release_spapr_tce_table(stt); + list_del_rcu(&stt->list); + + kvm_put_kvm(stt->kvm); + + call_rcu(&stt->rcu, release_spapr_tce_table); + return 0; } @@ -131,7 +133,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, kvm_get_kvm(kvm); mutex_lock(&kvm->lock); - list_add(&stt->list, &kvm->arch.spapr_tce_tables); + list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables); mutex_unlock(&kvm->lock); -- 2.5.0.rc3
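The shape of the lockless lookup this conversion enables: readers walk the list without taking kvm->lock, writers publish with list_add_rcu() and retire entries with list_del_rcu() plus call_rcu(). Modelled below as a plain singly linked list — the RCU memory barriers and grace period exist only as comments, so this is a sketch of the traversal, not of RCU itself:

```c
#include <assert.h>
#include <stddef.h>

struct tce_table {
	unsigned long liobn;
	struct tce_table *next;	/* an RCU-protected pointer in the kernel */
};

/* Mirrors kvmppc_find_table(): walk the published list and match on
 * LIOBN; the kernel version uses list_for_each_entry_lockless(). */
static struct tce_table *find_table(struct tce_table *head,
				    unsigned long liobn)
{
	struct tce_table *stt;

	for (stt = head; stt; stt = stt->next)
		if (stt->liobn == liobn)
			return stt;
	return NULL;	/* caller punts to userspace with H_TOO_HARD */
}
```

The deferred kfree() via call_rcu() is what makes this walk safe: an entry unlinked by kvm_spapr_tce_release() stays readable until every in-flight reader has finished.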
Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
Hi Alexey, [auto build test ERROR on kvm/linux-next] [also build test ERROR on v4.4 next-20160121] [if your patch is applied to the wrong git tree, please drop us a note to help improving the system] url: https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/KVM-PPC-Add-in-kernel-multitce-handling/20160121-154336 base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next config: powerpc-allyesconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=powerpc All error/warnings (new ones prefixed by >>): arch/powerpc/kvm/book3s_64_vio_hv.c: In function 'kvmppc_find_table': arch/powerpc/kvm/book3s_64_vio_hv.c:58:2: error: implicit declaration of function 'list_for_each_entry_lockless' [-Werror=implicit-function-declaration] list_for_each_entry_lockless(stt, >arch.spapr_tce_tables, list) ^ arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: error: 'list' undeclared (first use in this function) list_for_each_entry_lockless(stt, >arch.spapr_tce_tables, list) ^ arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: note: each undeclared identifier is reported only once for each function it appears in arch/powerpc/kvm/book3s_64_vio_hv.c:59:3: error: expected ';' before 'if' if (stt->liobn == liobn) ^ arch/powerpc/kvm/book3s_64_vio_hv.c: In function 'kvmppc_rm_h_put_tce_indirect': >> arch/powerpc/kvm/book3s_64_vio_hv.c:263:18: error: implicit declaration of >> function 'vmalloc_to_phys' [-Werror=implicit-function-declaration] rmap = (void *) vmalloc_to_phys(rmap); ^ >> arch/powerpc/kvm/book3s_64_vio_hv.c:263:9: warning: cast to pointer from >> integer of different size [-Wint-to-pointer-cast] rmap = (void *) vmalloc_to_phys(rmap); ^ cc1: some warnings being treated as errors vim +/vmalloc_to_phys +263 arch/powerpc/kvm/book3s_64_vio_hv.c 257 if (ret != H_SUCCESS) 258 return ret; 259 260 if 
(kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap)) 261 return H_TOO_HARD; 262 > 263 rmap = (void *) vmalloc_to_phys(rmap); 264 265 lock_rmap(rmap); 266 if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) { --- 0-DAY kernel test infrastructure  Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH kernel] powerpc: Make vmalloc_to_phys() public
This makes vmalloc_to_phys() public as there will be another user (in-kernel VFIO acceleration) for it soon. As part of a future small optimization, this changes the helper to call vmalloc_to_pfn() instead of vmalloc_to_page(), as the size of the struct page may not be power-of-two aligned, which would make gcc use multiply instructions instead of shifts. Signed-off-by: Alexey Kardashevskiy --- A couple of notes: 1. real_vmalloc_addr() will be reworked later by Paul separately; 2. the optimization note is not valid at the moment as vmalloc_to_pfn() calls vmalloc_to_page(), which does the actual search; these helpers' functionality will be swapped later (also by Paul). --- arch/powerpc/include/asm/pgtable.h | 3 +++ arch/powerpc/mm/pgtable.c | 8 arch/powerpc/perf/hv-24x7.c| 8 3 files changed, 11 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index ac9fb11..47897a3 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -78,6 +78,9 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, } return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift); } + +unsigned long vmalloc_to_phys(void *vmalloc_addr); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_PGTABLE_H */ diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 83dfd79..de37ff4 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -243,3 +243,11 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr) } #endif /* CONFIG_DEBUG_VM */ +unsigned long vmalloc_to_phys(void *va) +{ + unsigned long pfn = vmalloc_to_pfn(va); + + BUG_ON(!pfn); + return __pa(pfn_to_kaddr(pfn)) + offset_in_page(va); +} +EXPORT_SYMBOL_GPL(vmalloc_to_phys); diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 9f9dfda..3b09ecf 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -493,14 +493,6 @@ static size_t
event_to_attr_ct(struct hv_24x7_event_data *event) } } -static unsigned long vmalloc_to_phys(void *v) -{ - struct page *p = vmalloc_to_page(v); - - BUG_ON(!p); - return page_to_phys(p) + offset_in_page(v); -} - /* */ struct event_uniq { struct rb_node node; -- 2.5.0.rc3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
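The helper's arithmetic is simply "physical base of the page, plus the byte offset within the page". A userspace sketch, with the `pfn` supplied directly to stand in for the page-table walk that vmalloc_to_pfn() performs:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* The arithmetic behind vmalloc_to_phys(): shift the page frame
 * number up to a physical address, then re-attach the low bits of
 * the virtual address (offset_in_page() in the kernel). */
static unsigned long phys_of(unsigned long pfn, uintptr_t va)
{
	unsigned long offset = va & ~PAGE_MASK;

	return (pfn << PAGE_SHIFT) + offset;
}
```

This is also why the commit can swap vmalloc_to_page() for vmalloc_to_pfn(): only the frame number matters, not the struct page, so the struct-page-size multiply can be avoided once the helpers are reworked.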
[PATCH kernel v2 5/6] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls) will validate TCE (not to have unexpected bits) and IO address (to be within the DMA window boundaries). This introduces helpers to validate TCE and IO address. The helpers are exported as they compile into vmlinux (to work in realmode) and will be used later by KVM kernel module in virtual mode. Signed-off-by: Alexey Kardashevskiy--- Changes: v2: * added note to the commit log about why new helpers are exported * did not add a note that xxx_validate() validate TCEs for KVM (not for host kernel DMA) as the helper names and file location tell what are they for --- arch/powerpc/include/asm/kvm_ppc.h | 4 ++ arch/powerpc/kvm/book3s_64_vio_hv.c | 92 - 2 files changed, 84 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 2241d53..9513911 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt, + unsigned long ioba, unsigned long npages); +extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt, + unsigned long tce); extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba, unsigned long tce); extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c index e142171..8cd3a95 100644 --- a/arch/powerpc/kvm/book3s_64_vio_hv.c +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c @@ -36,6 +36,7 @@ #include #include #include +#include #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) @@ -64,18 +65,90 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu, * WARNING: This will be called in real-mode on HV KVM 
and virtual * mode on PR KVM */ -static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt, +long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt, unsigned long ioba, unsigned long npages) { - unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1; + unsigned long mask = IOMMU_PAGE_MASK_4K; unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K; unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K; - if ((ioba & mask) || (idx + npages > size)) + if ((ioba & ~mask) || (idx + npages > size)) return H_PARAMETER; return H_SUCCESS; } +EXPORT_SYMBOL_GPL(kvmppc_ioba_validate); + +/* + * Validates TCE address. + * At the moment flags and page mask are validated. + * As the host kernel does not access those addresses (just puts them + * to the table and user space is supposed to process them), we can skip + * checking other things (such as TCE is a guest RAM address or the page + * was actually allocated). + * + * WARNING: This will be called in real-mode on HV KVM and virtual + * mode on PR KVM + */ +long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce) +{ + unsigned long mask = IOMMU_PAGE_MASK_4K | TCE_PCI_WRITE | TCE_PCI_READ; + + if (tce & ~mask) + return H_PARAMETER; + + return H_SUCCESS; +} +EXPORT_SYMBOL_GPL(kvmppc_tce_validate); + +/* Note on the use of page_address() in real mode, + * + * It is safe to use page_address() in real mode on ppc64 because + * page_address() is always defined as lowmem_page_address() + * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial + * operation and does not access page struct. + * + * Theoretically page_address() could be defined different + * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL + * should be enabled. + * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64, + * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only + * if CONFIG_HIGHMEM is defined. 
As CONFIG_SPARSEMEM_VMEMMAP + * is not expected to be enabled on ppc32, page_address() + * is safe for ppc32 as well. + * + * WARNING: This will be called in real-mode on HV KVM and virtual + * mode on PR KVM + */ +static u64 *kvmppc_page_address(struct page *page) +{ +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL) +#error TODO: fix to avoid page_address() here +#endif + return (u64 *) page_address(page); +} + +/* + * Handles TCE requests for emulated devices. + * Puts guest TCE values to the table and expects user space to convert them. + * Called in both real and virtual modes. + * Cannot fail so kvmppc_tce_validate must be called before it. + * + * WARNING: This will be called in real-mode on HV KVM and virtual + * mode on PR KVM + */
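The TCE check the patch introduces is a single mask test: only a 4K-aligned page address plus the two permission bits may be set, everything else is H_PARAMETER. A standalone sketch (the TCE_PCI_* values match asm/tce.h, but verify against your tree):

```c
#include <assert.h>

#define IOMMU_PAGE_SHIFT_4K 12
#define IOMMU_PAGE_MASK_4K  (~((1UL << IOMMU_PAGE_SHIFT_4K) - 1))
#define TCE_PCI_WRITE 0x2UL	/* write permission bit in a TCE */
#define TCE_PCI_READ  0x1UL	/* read permission bit in a TCE */

/* Mirrors kvmppc_tce_validate(): reject any bits outside the page
 * address and the two permission flags.  As the commit message says,
 * the host never dereferences the address, so nothing more is checked. */
static int tce_validate(unsigned long tce)
{
	unsigned long mask = IOMMU_PAGE_MASK_4K | TCE_PCI_WRITE | TCE_PCI_READ;

	return (tce & ~mask) ? -1 : 0;
}
```

Splitting validation out like this is what lets the later H_PUT_TCE_INDIRECT handler validate a whole list of TCEs up front and then call a put helper that cannot fail.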
[PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread
Currently when threads get scheduled off they always give up the FPU, Altivec (VMX) and Vector (VSX) units if they were using them. When they are scheduled back on, a fault is then taken to enable each facility and load registers. As a result, explicitly disabling FPU/VMX/VSX has not been necessary. Future changes and optimisations remove this mandatory giveup and fault, which could cause calls such as clone() and fork() to copy threads and run them later with FPU/VMX/VSX enabled but no registers loaded. This patch starts the process of having MSR_{FP,VEC,VSX} mean that a thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that the registers must be loaded. This allows for a smarter return to userspace. Signed-off-by: Cyril Bur --- arch/powerpc/kernel/process.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index dccc87e..e0c3d2d 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, f = ret_from_fork; } + childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX); sp -= STACK_FRAME_OVERHEAD; /* -- 2.7.0
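The one-line change is just bit-clearing on the child's saved MSR, so under the new convention the child starts with all math facilities marked "registers not loaded" and takes a clean load-on-first-use path instead of running with stale contents. A sketch (bit positions taken from asm/reg.h; treat them as illustrative):

```c
#include <assert.h>

#define MSR_FP  (1UL << 13)
#define MSR_VSX (1UL << 23)
#define MSR_VEC (1UL << 25)

/* What the added line in copy_thread() does to childregs->msr. */
static unsigned long child_msr(unsigned long parent_msr)
{
	return parent_msr & ~(MSR_FP | MSR_VEC | MSR_VSX);
}
```

Every other MSR bit is passed through untouched; only the three facility bits are forced off.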
[PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
Test that the non volatile floating point and Altivec registers get correctly preserved across the fork() syscall. fork() works nicely for this purpose, the registers should be the same for both parent and child Signed-off-by: Cyril Bur--- tools/testing/selftests/powerpc/Makefile | 3 +- tools/testing/selftests/powerpc/basic_asm.h| 26 +++ tools/testing/selftests/powerpc/math/.gitignore| 2 + tools/testing/selftests/powerpc/math/Makefile | 14 ++ tools/testing/selftests/powerpc/math/fpu_asm.S | 161 + tools/testing/selftests/powerpc/math/fpu_syscall.c | 90 ++ tools/testing/selftests/powerpc/math/vmx_asm.S | 193 + tools/testing/selftests/powerpc/math/vmx_syscall.c | 92 ++ 8 files changed, 580 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/basic_asm.h create mode 100644 tools/testing/selftests/powerpc/math/.gitignore create mode 100644 tools/testing/selftests/powerpc/math/Makefile create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile index 0c2706b..19e8191 100644 --- a/tools/testing/selftests/powerpc/Makefile +++ b/tools/testing/selftests/powerpc/Makefile @@ -22,7 +22,8 @@ SUB_DIRS = benchmarks \ switch_endian\ syscalls \ tm \ - vphn + vphn \ + math endif diff --git a/tools/testing/selftests/powerpc/basic_asm.h b/tools/testing/selftests/powerpc/basic_asm.h new file mode 100644 index 000..27aca79 --- /dev/null +++ b/tools/testing/selftests/powerpc/basic_asm.h @@ -0,0 +1,26 @@ +#include +#include + +#define LOAD_REG_IMMEDIATE(reg,expr) \ + lis reg,(expr)@highest; \ + ori reg,reg,(expr)@higher; \ + rldicr reg,reg,32,31; \ + orisreg,reg,(expr)@high;\ + ori reg,reg,(expr)@l; + +#define PUSH_BASIC_STACK(size) \ + std 2,24(sp); \ + 
mflrr0; \ + std r0,16(sp); \ + mfcrr0; \ + stw r0,8(sp); \ + stdusp,-size(sp); + +#define POP_BASIC_STACK(size) \ + addisp,sp,size; \ + ld 2,24(sp); \ + ld r0,16(sp); \ + mtlrr0; \ + lwz r0,8(sp); \ + mtcrr0; \ + diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore new file mode 100644 index 000..b19b269 --- /dev/null +++ b/tools/testing/selftests/powerpc/math/.gitignore @@ -0,0 +1,2 @@ +fpu_syscall +vmx_syscall diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile new file mode 100644 index 000..418bef1 --- /dev/null +++ b/tools/testing/selftests/powerpc/math/Makefile @@ -0,0 +1,14 @@ +TEST_PROGS := fpu_syscall vmx_syscall + +all: $(TEST_PROGS) + +$(TEST_PROGS): ../harness.c +$(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec + +fpu_syscall: fpu_asm.S +vmx_syscall: vmx_asm.S + +include ../../lib.mk + +clean: + rm -f $(TEST_PROGS) *.o diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S new file mode 100644 index 000..8733874 --- /dev/null +++ b/tools/testing/selftests/powerpc/math/fpu_asm.S @@ -0,0 +1,161 @@ +/* + * Copyright 2015, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include "../basic_asm.h" + +#define PUSH_FPU(pos) \ + stfdf14,pos(sp); \ + stfdf15,pos+8(sp); \ + stfdf16,pos+16(sp); \ + stfdf17,pos+24(sp); \ + stfdf18,pos+32(sp); \ + stfdf19,pos+40(sp); \ + stfdf20,pos+48(sp); \ + stfdf21,pos+56(sp); \ + stfdf22,pos+64(sp); \ + stfdf23,pos+72(sp); \ + stfdf24,pos+80(sp); \ + stfdf25,pos+88(sp); \ + stfdf26,pos+96(sp); \ + stfdf27,pos+104(sp); \ + stfdf28,pos+112(sp); \ + stfdf29,pos+120(sp); \ + stfdf30,pos+128(sp); \ + stfdf31,pos+136(sp); + +#define POP_FPU(pos) \ + lfd f14,pos(sp); \ + lfd f15,pos+8(sp); \ + lfd f16,pos+16(sp); \ + lfd f17,pos+24(sp); \ + lfd f18,pos+32(sp); \ + lfd f19,pos+40(sp); \ + lfd f20,pos+48(sp); \ + lfd f21,pos+56(sp); \ + lfd f22,pos+64(sp); \ + lfd f23,pos+72(sp);
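The property the assembly selftest checks — state surviving the fork() syscall identically in parent and child — can be smoke-tested from C. This is only an analogue: the real test pins values into the non-volatile f14–f31/v20–v31 registers in assembly, which portable C cannot force, so this sketch exercises the user-visible contract rather than specific registers:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>

/* Fork and verify a floating-point value is identical on both sides
 * of the syscall; returns 0 on success, -1 on any mismatch/failure. */
static int fork_preserves_double(void)
{
	volatile double d = 3.141592653589793;
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0)		/* child: value must survive the syscall */
		_exit(d == 3.141592653589793 ? 0 : 1);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return d == 3.141592653589793 ? 0 : -1;	/* parent side too */
}
```

The assembly version is strictly stronger: it fills all the non-volatile FP/VMX registers with known patterns before the syscall and compares them afterwards, which is exactly the state the kernel changes in this series could corrupt.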
[PATCH v3 5/9] powerpc: Restore FPU/VEC/VSX if previously used
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a problem unless a process is using these facilities. Modern versions of GCC are very good at automatically vectorising code, new and modernised workloads make use of floating point and vector facilities, and even the kernel makes use of vectorised memcpy. All this combined greatly increases the cost of a syscall, since the kernel uses the facilities sometimes even in the syscall fast path, making it increasingly common for a thread to take an *_unavailable exception soon after a syscall, not to mention potentially taking all three. The obvious overcompensation for this problem is to simply always load all the facilities on every exit to userspace. But loading up all FPU, VEC and VSX registers every time can be expensive, and if a workload does avoid using them, it should not be forced to incur this penalty. Instead, an 8-bit counter is used to detect if the registers have been used in the past, and the registers are always loaded until the value wraps back to zero. Three versions of the assembly in entry_64.S were tried: 1. always calling C; 2. performing a common-case check and then calling C; and 3. a complex check entirely in asm. After some benchmarking it was determined that avoiding C in the common case is a performance benefit. The full check in asm greatly complicated that codepath for a negligible performance gain and the trade-off was deemed not worth it.
Signed-off-by: Cyril Bur--- arch/powerpc/include/asm/processor.h | 2 + arch/powerpc/kernel/asm-offsets.c| 2 + arch/powerpc/kernel/entry_64.S | 21 +++-- arch/powerpc/kernel/fpu.S| 4 ++ arch/powerpc/kernel/process.c| 84 +++- arch/powerpc/kernel/vector.S | 4 ++ 6 files changed, 103 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index ac23308..dcab21f 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -236,11 +236,13 @@ struct thread_struct { #endif struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */ unsigned long trap_nr;/* last trap # on this thread */ + u8 load_fp; #ifdef CONFIG_ALTIVEC struct thread_vr_state vr_state; struct thread_vr_state *vr_save_area; unsigned long vrsave; int used_vr;/* set if process has used altivec */ + u8 load_vec; #endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX /* VSR status */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 07cebc3..10d5eab 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -95,12 +95,14 @@ int main(void) DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct, fp_state)); DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct, fp_save_area)); DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state, fpscr)); + DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct, load_fp)); #ifdef CONFIG_ALTIVEC DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct, vr_state)); DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct, vr_save_area)); DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave)); DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr)); DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state, vscr)); + DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct, load_vec)); #endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr)); diff --git 
a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 0d525ce..038e0a1 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -210,7 +210,20 @@ system_call: /* label this so stack traces look sane */ li r11,-MAX_ERRNO andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) bne-syscall_exit_work - cmpld r3,r11 + + andi. r0,r8,MSR_FP + beq 2f +#ifdef CONFIG_ALTIVEC + andis. r0,r8,MSR_VEC@h + bne 3f +#endif +2: addir3,r1,STACK_FRAME_OVERHEAD + bl restore_math + ld r8,_MSR(r1) + ld r3,RESULT(r1) + li r11,-MAX_ERRNO + +3: cmpld r3,r11 ld r5,_CCR(r1) bge-syscall_error .Lsyscall_error_cont: @@ -602,8 +615,8 @@ _GLOBAL(ret_from_except_lite) /* Check current_thread_info()->flags */ andi. r0,r4,_TIF_USER_WORK_MASK -#ifdef CONFIG_PPC_BOOK3E bne 1f +#ifdef CONFIG_PPC_BOOK3E /* * Check to see if the dbcr0 register is set up to debug. * Use the internal debug mode bit to do this. @@ -618,7 +631,9 @@ _GLOBAL(ret_from_except_lite) mtspr
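The 8-bit counter heuristic from the commit message can be sketched as follows. While the counter is non-zero the registers are eagerly restored on exit to userspace; each eager restore bumps it, so after enough restores it wraps to zero and the facility falls back to lazy, fault-driven loading. Field names follow the patch (`load_fp`), but `struct thread_model` and the exact bump-on-restore placement are illustrative:

```c
#include <assert.h>
#include <stdint.h>

struct thread_model { uint8_t load_fp; };

/* Decide, as restore_math() does for each facility, whether to load
 * the FP registers eagerly on return to userspace. */
static int should_restore_fp(struct thread_model *t)
{
	if (t->load_fp == 0)
		return 0;	/* lazy: wait for the FP-unavailable fault */
	t->load_fp++;		/* uint8_t: wraps to 0 after 255 */
	return 1;		/* eager restore */
}
```

The wrap is the self-limiting part of the design: a thread that stops using FP pays for at most 256 eager restores before the kernel reverts to the cheap lazy scheme.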
[PATCH v3 9/9] powerpc: Add the ability to save VSX without giving it up
This patch adds the ability to be able to save the VSX registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU and VEC registers in the thread copy path to avoid a possibly pointless reload of VSX state. Signed-off-by: Cyril Bur--- arch/powerpc/include/asm/switch_to.h | 4 arch/powerpc/kernel/ppc_ksyms.c | 4 arch/powerpc/kernel/process.c| 42 +--- arch/powerpc/kernel/vector.S | 17 --- 4 files changed, 30 insertions(+), 37 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 9028822..17c8380 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -56,14 +56,10 @@ static inline void __giveup_altivec(struct task_struct *t) { } #ifdef CONFIG_VSX extern void enable_kernel_vsx(void); extern void flush_vsx_to_thread(struct task_struct *); -extern void giveup_vsx(struct task_struct *); -extern void __giveup_vsx(struct task_struct *); static inline void disable_kernel_vsx(void) { msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX); } -#else -static inline void __giveup_vsx(struct task_struct *t) { } #endif #ifdef CONFIG_SPE diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c index 41e1607..ef7024da 100644 --- a/arch/powerpc/kernel/ppc_ksyms.c +++ b/arch/powerpc/kernel/ppc_ksyms.c @@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state); EXPORT_SYMBOL(store_vr_state); #endif -#ifdef CONFIG_VSX -EXPORT_SYMBOL_GPL(__giveup_vsx); -#endif - #ifdef CONFIG_EPAPR_PARAVIRT EXPORT_SYMBOL(epapr_hypercall_start); #endif diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index fef5b7d..7c3dd30 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -280,19 +280,31 @@ static inline int restore_altivec(struct task_struct *tsk) { return 0; } #endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX -void giveup_vsx(struct task_struct 
*tsk) +static void __giveup_vsx(struct task_struct *tsk) { - check_if_tm_restore_required(tsk); - - msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX); if (tsk->thread.regs->msr & MSR_FP) __giveup_fpu(tsk); if (tsk->thread.regs->msr & MSR_VEC) __giveup_altivec(tsk); + tsk->thread.regs->msr &= ~MSR_VSX; +} + +static void giveup_vsx(struct task_struct *tsk) +{ + check_if_tm_restore_required(tsk); + + msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX); __giveup_vsx(tsk); msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX); } -EXPORT_SYMBOL(giveup_vsx); + +static void save_vsx(struct task_struct *tsk) +{ + if (tsk->thread.regs->msr & MSR_FP) + save_fpu(tsk); + if (tsk->thread.regs->msr & MSR_VEC) + save_altivec(tsk); +} void enable_kernel_vsx(void) { @@ -331,6 +343,7 @@ static int restore_vsx(struct task_struct *tsk) } #else static inline int restore_vsx(struct task_struct *tsk) { return 0; } +static inline void save_vsx(struct task_struct *tsk) { } #endif /* CONFIG_VSX */ #ifdef CONFIG_SPE @@ -482,14 +495,19 @@ void save_all(struct task_struct *tsk) msr_check_and_set(msr_all_available); - if (usermsr & MSR_FP) - save_fpu(tsk); - - if (usermsr & MSR_VEC) - save_altivec(tsk); + /* +* Saving the way the register space is in hardware, save_vsx boils +* down to a save_fpu() and save_altivec() +*/ + if (usermsr & MSR_VSX) { + save_vsx(tsk); + } else { + if (usermsr & MSR_FP) + save_fpu(tsk); - if (usermsr & MSR_VSX) - __giveup_vsx(tsk); + if (usermsr & MSR_VEC) + save_altivec(tsk); + } if (usermsr & MSR_SPE) __giveup_spe(tsk); diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S index 51b0c17..1c2e7a3 100644 --- a/arch/powerpc/kernel/vector.S +++ b/arch/powerpc/kernel/vector.S @@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx) std r12,_MSR(r1) b fast_exception_return -/* - * __giveup_vsx(tsk) - * Disable VSX for the task given as the argument. - * Does NOT save vsx registers. 
- */ -_GLOBAL(__giveup_vsx) - addir3,r3,THREAD/* want THREAD of task */ - ld r5,PT_REGS(r3) - cmpdi 0,r5,0 - beq 1f - ld r4,_MSR-STACK_FRAME_OVERHEAD(r5) - lis r3,MSR_VSX@h - andcr4,r4,r3/* disable VSX for previous task */ - std r4,_MSR-STACK_FRAME_OVERHEAD(r5) -1: - blr - #endif /* CONFIG_VSX */ -- 2.7.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org
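The distinction patches 7–9 introduce — save_*() versus giveup_*() — comes down to what happens to the MSR facility bits after the registers are written back. A two-line sketch (bit values taken from asm/reg.h, illustrative here):

```c
#include <assert.h>

#define MSR_FP  (1UL << 13)
#define MSR_VSX (1UL << 23)
#define MSR_VEC (1UL << 25)

/* giveup_vsx(): flush the registers AND turn the facilities off, so
 * the next use faults and reloads (VSX implies FP and VEC). */
static unsigned long msr_after_giveup_vsx(unsigned long msr)
{
	return msr & ~(MSR_FP | MSR_VEC | MSR_VSX);
}

/* save_vsx()/save_fpu()/save_altivec(): flush the registers but leave
 * the MSR bits set, so the thread returns to userspace with hot
 * registers and no reload fault. */
static unsigned long msr_after_save(unsigned long msr)
{
	return msr;
}
```

This is also why save_vsx() decomposes into save_fpu() plus save_altivec() in save_all(): the VSX register space is the union of the FP and VMX register files, so saving both halves saves all of VSX.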
[PATCH v3 8/9] powerpc: Add the ability to save Altivec without giving it up
This patch adds the ability to be able to save the VEC registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU registers in the thread copy path to avoid a possibly pointless reload of VEC state. Signed-off-by: Cyril Bur--- arch/powerpc/include/asm/switch_to.h | 3 ++- arch/powerpc/kernel/process.c| 12 +++- arch/powerpc/kernel/vector.S | 24 3 files changed, 17 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 6a201e8..9028822 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -43,12 +43,13 @@ static inline void flush_fp_to_thread(struct task_struct *t) { } extern void enable_kernel_altivec(void); extern void flush_altivec_to_thread(struct task_struct *); extern void giveup_altivec(struct task_struct *); -extern void __giveup_altivec(struct task_struct *); +extern void save_altivec(struct task_struct *); static inline void disable_kernel_altivec(void) { msr_check_and_clear(MSR_VEC); } #else +static inline void save_altivec(struct task_struct *t) { } static inline void __giveup_altivec(struct task_struct *t) { } #endif diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a3411ce..fef5b7d 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -213,6 +213,16 @@ static int restore_fp(struct task_struct *tsk) { return 0; } #ifdef CONFIG_ALTIVEC #define loadvec(thr) ((thr).load_vec) +static void __giveup_altivec(struct task_struct *tsk) +{ + save_altivec(tsk); + tsk->thread.regs->msr &= ~MSR_VEC; +#ifdef CONFIG_VSX + if (cpu_has_feature(CPU_FTR_VSX)) + tsk->thread.regs->msr &= ~MSR_VSX; +#endif +} + void giveup_altivec(struct task_struct *tsk) { check_if_tm_restore_required(tsk); @@ -476,7 +486,7 @@ void save_all(struct task_struct *tsk) save_fpu(tsk); if (usermsr & MSR_VEC) - 
__giveup_altivec(tsk); + save_altivec(tsk); if (usermsr & MSR_VSX) __giveup_vsx(tsk); diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S index 038cff8..51b0c17 100644 --- a/arch/powerpc/kernel/vector.S +++ b/arch/powerpc/kernel/vector.S @@ -106,36 +106,20 @@ _GLOBAL(load_up_altivec) blr /* - * __giveup_altivec(tsk) - * Disable VMX for the task given as the argument, - * and save the vector registers in its thread_struct. + * save_altivec(tsk) + * Save the vector registers to its thread_struct */ -_GLOBAL(__giveup_altivec) +_GLOBAL(save_altivec) addir3,r3,THREAD/* want THREAD of task */ PPC_LL r7,THREAD_VRSAVEAREA(r3) PPC_LL r5,PT_REGS(r3) PPC_LCMPI 0,r7,0 bne 2f addir7,r3,THREAD_VRSTATE -2: PPC_LCMPI 0,r5,0 - SAVE_32VRS(0,r4,r7) +2: SAVE_32VRS(0,r4,r7) mfvscr v0 li r4,VRSTATE_VSCR stvxv0,r4,r7 - beq 1f - PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5) -#ifdef CONFIG_VSX -BEGIN_FTR_SECTION - lis r3,(MSR_VEC|MSR_VSX)@h -FTR_SECTION_ELSE - lis r3,MSR_VEC@h -ALT_FTR_SECTION_END_IFSET(CPU_FTR_VSX) -#else - lis r3,MSR_VEC@h -#endif - andcr4,r4,r3/* disable FP for previous task */ - PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5) -1: blr #ifdef CONFIG_VSX -- 2.7.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 7/9] powerpc: Add the ability to save FPU without giving it up
This patch adds the ability to be able to save the FPU registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch optimises the thread copy path (as a result of a fork() or clone()) so that the parent thread can return to userspace with hot registers avoiding a possibly pointless reload of FPU register state. Signed-off-by: Cyril Bur--- arch/powerpc/include/asm/switch_to.h | 3 ++- arch/powerpc/kernel/fpu.S| 21 - arch/powerpc/kernel/process.c| 12 +++- 3 files changed, 17 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 3690041..6a201e8 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -28,13 +28,14 @@ extern void giveup_all(struct task_struct *); extern void enable_kernel_fp(void); extern void flush_fp_to_thread(struct task_struct *); extern void giveup_fpu(struct task_struct *); -extern void __giveup_fpu(struct task_struct *); +extern void save_fpu(struct task_struct *); static inline void disable_kernel_fp(void) { msr_check_and_clear(MSR_FP); } #else static inline void __giveup_fpu(struct task_struct *t) { } +static inline void save_fpu(struct task_struct *t) { } static inline void flush_fp_to_thread(struct task_struct *t) { } #endif diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S index b063524..15da2b5 100644 --- a/arch/powerpc/kernel/fpu.S +++ b/arch/powerpc/kernel/fpu.S @@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) blr /* - * __giveup_fpu(tsk) - * Disable FP for the task given as the argument, - * and save the floating-point registers in its thread_struct. + * save_fpu(tsk) + * Save the floating-point registers in its thread_struct. * Enables the FPU for use in the kernel on return. 
*/ -_GLOBAL(__giveup_fpu) +_GLOBAL(save_fpu) addir3,r3,THREAD/* want THREAD of task */ PPC_LL r6,THREAD_FPSAVEAREA(r3) PPC_LL r5,PT_REGS(r3) PPC_LCMPI 0,r6,0 bne 2f addir6,r3,THREAD_FPSTATE -2: PPC_LCMPI 0,r5,0 - SAVE_32FPVSRS(0, R4, R6) +2: SAVE_32FPVSRS(0, R4, R6) mffsfr0 stfdfr0,FPSTATE_FPSCR(r6) - beq 1f - PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5) - li r3,MSR_FP|MSR_FE0|MSR_FE1 -#ifdef CONFIG_VSX -BEGIN_FTR_SECTION - orisr3,r3,MSR_VSX@h -END_FTR_SECTION_IFSET(CPU_FTR_VSX) -#endif - andcr4,r4,r3/* disable FP for previous task */ - PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5) -1: blr /* diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 45e37c0..a3411ce 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits) EXPORT_SYMBOL(__msr_check_and_clear); #ifdef CONFIG_PPC_FPU +void __giveup_fpu(struct task_struct *tsk) +{ + save_fpu(tsk); + tsk->thread.regs->msr &= ~MSR_FP; +#ifdef CONFIG_VSX + if (cpu_has_feature(CPU_FTR_VSX)) + tsk->thread.regs->msr &= ~MSR_VSX; +#endif +} + void giveup_fpu(struct task_struct *tsk) { check_if_tm_restore_required(tsk); @@ -463,7 +473,7 @@ void save_all(struct task_struct *tsk) msr_check_and_set(msr_all_available); if (usermsr & MSR_FP) - __giveup_fpu(tsk); + save_fpu(tsk); if (usermsr & MSR_VEC) __giveup_altivec(tsk); -- 2.7.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two
This prepares for the decoupling of saving {fpu,altivec,vsx} registers and marking {fpu,altivec,vsx} as being unused by a thread. Currently giveup_{fpu,altivec,vsx}() does both however optimisations to task switching can be made if these two operations are decoupled. save_all() will permit the saving of registers to thread structs and leave threads MSR with bits enabled. This patch introduces no functional change. Signed-off-by: Cyril Bur--- arch/powerpc/include/asm/switch_to.h | 7 +++ arch/powerpc/kernel/process.c| 39 +++- 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 5b268b6..3690041 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -34,6 +34,7 @@ static inline void disable_kernel_fp(void) msr_check_and_clear(MSR_FP); } #else +static inline void __giveup_fpu(struct task_struct *t) { } static inline void flush_fp_to_thread(struct task_struct *t) { } #endif @@ -46,6 +47,8 @@ static inline void disable_kernel_altivec(void) { msr_check_and_clear(MSR_VEC); } +#else +static inline void __giveup_altivec(struct task_struct *t) { } #endif #ifdef CONFIG_VSX @@ -57,6 +60,8 @@ static inline void disable_kernel_vsx(void) { msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX); } +#else +static inline void __giveup_vsx(struct task_struct *t) { } #endif #ifdef CONFIG_SPE @@ -68,6 +73,8 @@ static inline void disable_kernel_spe(void) { msr_check_and_clear(MSR_SPE); } +#else +static inline void __giveup_spe(struct task_struct *t) { } #endif static inline void clear_task_ebb(struct task_struct *t) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 0955b7c..45e37c0 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -349,6 +349,14 @@ void flush_spe_to_thread(struct task_struct *tsk) preempt_enable(); } } +#else +/* + * save_all() is going to test MSR_SPE, rather than pull in all the + * booke 
definitions all the time on a book3s kernel just ensure it exists + * but acts as a nop. + */ + +#define MSR_SPE 0 #endif /* CONFIG_SPE */ static unsigned long msr_all_available; @@ -440,12 +448,41 @@ void restore_math(struct pt_regs *regs) regs->msr = msr; } +void save_all(struct task_struct *tsk) +{ + unsigned long usermsr; + + if (!tsk->thread.regs) + return; + + usermsr = tsk->thread.regs->msr; + + if ((usermsr & msr_all_available) == 0) + return; + + msr_check_and_set(msr_all_available); + + if (usermsr & MSR_FP) + __giveup_fpu(tsk); + + if (usermsr & MSR_VEC) + __giveup_altivec(tsk); + + if (usermsr & MSR_VSX) + __giveup_vsx(tsk); + + if (usermsr & MSR_SPE) + __giveup_spe(tsk); + + msr_check_and_clear(msr_all_available); +} + void flush_all_to_thread(struct task_struct *tsk) { if (tsk->thread.regs) { preempt_disable(); BUG_ON(tsk != current); - giveup_all(tsk); + save_all(tsk); #ifdef CONFIG_SPE if (tsk->thread.regs->msr & MSR_SPE) -- 2.7.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 3/9] selftests/powerpc: Test FPU and VMX regs in signal ucontext
Load up the non volatile FPU and VMX regs and ensure that they are the expected value in a signal handler Signed-off-by: Cyril Bur--- tools/testing/selftests/powerpc/math/.gitignore | 2 + tools/testing/selftests/powerpc/math/Makefile | 4 +- tools/testing/selftests/powerpc/math/fpu_signal.c | 135 + tools/testing/selftests/powerpc/math/vmx_signal.c | 138 ++ 4 files changed, 278 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore index 1a6f09e..4fe13a4 100644 --- a/tools/testing/selftests/powerpc/math/.gitignore +++ b/tools/testing/selftests/powerpc/math/.gitignore @@ -2,3 +2,5 @@ fpu_syscall vmx_syscall fpu_preempt vmx_preempt +fpu_signal +vmx_signal diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile index b6f4158..5b88875 100644 --- a/tools/testing/selftests/powerpc/math/Makefile +++ b/tools/testing/selftests/powerpc/math/Makefile @@ -1,4 +1,4 @@ -TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt +TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal all: $(TEST_PROGS) @@ -7,9 +7,11 @@ $(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec fpu_syscall: fpu_asm.S fpu_preempt: fpu_asm.S +fpu_signal: fpu_asm.S vmx_syscall: vmx_asm.S vmx_preempt: vmx_asm.S +vmx_signal: vmx_asm.S include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c b/tools/testing/selftests/powerpc/math/fpu_signal.c new file mode 100644 index 000..888aa51 --- /dev/null +++ b/tools/testing/selftests/powerpc/math/fpu_signal.c @@ -0,0 +1,135 @@ +/* + * Copyright 2015, Cyril Bur, IBM Corp. 
+ * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * This test attempts to see if the FPU registers are correctly reported in a + * signal context. Each worker just spins checking its FPU registers, at some + * point a signal will interrupt it and C code will check the signal context + * ensuring it is also the same. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "utils.h" + +/* Number of times each thread should receive the signal */ +#define ITERATIONS 10 +/* + * Factor by which to multiply number of online CPUs for total number of + * worker threads + */ +#define THREAD_FACTOR 8 + +__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, +1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, +2.1}; + +bool bad_context; +int threads_starting; +int running; + +extern long preempt_fpu(double *darray, int *threads_starting, int *running); + +void signal_fpu_sig(int sig, siginfo_t *info, void *context) +{ + int i; + ucontext_t *uc = context; + mcontext_t *mc = >uc_mcontext; + + /* Only the non volatiles were loaded up */ + for (i = 14; i < 32; i++) { + if (mc->fp_regs[i] != darray[i - 14]) { + bad_context = true; + break; + } + } +} + +void *signal_fpu_c(void *p) +{ + int i; + long rc; + struct sigaction act; + act.sa_sigaction = signal_fpu_sig; + act.sa_flags = SA_SIGINFO; + rc = sigaction(SIGUSR1, , NULL); + if (rc) + return p; + + srand(pthread_self()); + for (i = 0; i < 21; i++) + darray[i] = rand(); + + rc = preempt_fpu(darray, _starting, ); + + return (void *) rc; +} + +int test_signal_fpu(void) +{ + int i, j, rc, threads; + void *rc_p; + pthread_t *tids; + + threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR; + tids = malloc(threads * sizeof(pthread_t)); + FAIL_IF(!tids); + + 
running = true; + threads_starting = threads; + for (i = 0; i < threads; i++) { + rc = pthread_create([i], NULL, signal_fpu_c, NULL); + FAIL_IF(rc); + } + + setbuf(stdout, NULL); + printf("\tWaiting for all workers to start..."); + while (threads_starting) + asm volatile("": : :"memory"); + printf("done\n"); + + printf("\tSending signals to all threads %d times...", ITERATIONS); + for (i = 0; i < ITERATIONS; i++) { + for (j = 0; j < threads; j++) { + pthread_kill(tids[j], SIGUSR1); + } + sleep(1); + } + printf("done\n"); + + printf("\tStopping workers..."); + running = 0; +
[PATCH v3 0/9] FP/VEC/VSX switching optimisations
Cover-letter for V1 of the series is at https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136350.html Cover-letter for V2 of the series is at https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/138054.html Changes in V3: Addressed review comments from Michael Neuling - Made commit message in 4/9 better reflect the patch - Removed overuse of #ifdef blocks and redundant condition in 5/9 - Split 6/8 in two to better prepare for 7,8,9 - Removed #ifdefs in 6/9 Cyril Bur (9): selftests/powerpc: Test the preservation of FPU and VMX regs across syscall selftests/powerpc: Test preservation of FPU and VMX regs across preemption selftests/powerpc: Test FPU and VMX regs in signal ucontext powerpc: Explicitly disable math features when copying thread powerpc: Restore FPU/VEC/VSX if previously used powerpc: Prepare for splitting giveup_{fpu,altivec,vsx} in two powerpc: Add the ability to save FPU without giving it up powerpc: Add the ability to save Altivec without giving it up powerpc: Add the ability to save VSX without giving it up arch/powerpc/include/asm/processor.h | 2 + arch/powerpc/include/asm/switch_to.h | 13 +- arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kernel/entry_64.S | 21 +- arch/powerpc/kernel/fpu.S | 25 +-- arch/powerpc/kernel/ppc_ksyms.c| 4 - arch/powerpc/kernel/process.c | 172 ++-- arch/powerpc/kernel/vector.S | 45 +--- tools/testing/selftests/powerpc/Makefile | 3 +- tools/testing/selftests/powerpc/basic_asm.h| 26 +++ tools/testing/selftests/powerpc/math/.gitignore| 6 + tools/testing/selftests/powerpc/math/Makefile | 19 ++ tools/testing/selftests/powerpc/math/fpu_asm.S | 195 ++ tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 ++ tools/testing/selftests/powerpc/math/fpu_signal.c | 135 tools/testing/selftests/powerpc/math/fpu_syscall.c | 90 tools/testing/selftests/powerpc/math/vmx_asm.S | 229 + tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 ++ tools/testing/selftests/powerpc/math/vmx_signal.c | 138 + 
tools/testing/selftests/powerpc/math/vmx_syscall.c | 92 + 20 files changed, 1360 insertions(+), 83 deletions(-) create mode 100644 tools/testing/selftests/powerpc/basic_asm.h create mode 100644 tools/testing/selftests/powerpc/math/.gitignore create mode 100644 tools/testing/selftests/powerpc/math/Makefile create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c -- 2.7.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Different SIGSEGV codes (x86 and ppc64le)
On Tue, Jan 19, 2016 at 9:34 PM, Michael Ellerman wrote:
>
> The kernel describes those error codes as:
>
> #define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */
> #define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */
>
> Which one is correct in this case isn't entirely clear. There is a stack
> mapping, but you're not allowed to use it because of the stack ulimit, so
> arguably ACCERR is more accurate.
>
> However that's only true because of the stack guard page, which is supposed
> to be somewhat invisible to userspace. If I disable the stack guard page
> logic, userspace sees SEGV_MAPERR, so it seems that historically that's
> what is expected.

I think MAPERR is likely the right thing for a guard page access.

That said, I'd also warn people against caring too much about the details of si_code. We've not traditionally been very good at filling it in. So any program that uses it for any actual semantic behavior is likely just broken.

Print it in debuggers by all means, but relying on it in any other way is just crazy. It's just not historically reliable enough.

So I wouldn't worry about it excessively.

Linus
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/4] kallsyms: add support for relative offsets in kallsyms address table
On Wed, Jan 20, 2016 at 1:05 AM, Ard Biesheuvelwrote: > Similar to how relative extables are implemented, it is possible to emit > the kallsyms table in such a way that it contains offsets relative to some > anchor point in the kernel image rather than absolute addresses. The benefit > is that such table entries are no longer subject to dynamic relocation when > the build time and runtime offsets of the kernel image are different. Also, > on 64-bit architectures, it essentially cuts the size of the address table > in half since offsets can typically be expressed in 32 bits. > > Since it is useful for some architectures (like x86) to retain the ability > to emit absolute values as well, this patch adds support for both, by > emitting absolute addresses as positive 32-bit values, and addresses > relative to _text as negative values, which are subtracted from the runtime > address of _text to produce the actual address. Positive values are used as > they are found in the table. > > Support for the above is enabled by setting CONFIG_KALLSYMS_TEXT_RELATIVE. > > Signed-off-by: Ard Biesheuvel Reviewed-by: Kees Cook A nice space-saver! :) -Kees > --- > init/Kconfig| 14 > kernel/kallsyms.c | 35 +- > scripts/kallsyms.c | 38 +--- > scripts/link-vmlinux.sh | 4 +++ > scripts/namespace.pl| 1 + > 5 files changed, 79 insertions(+), 13 deletions(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 5b86082fa238..73e00b040572 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1427,6 +1427,20 @@ config KALLSYMS_ALL > >Say N unless you really need all symbols. > > +config KALLSYMS_TEXT_RELATIVE > + bool > + help > + Instead of emitting them as absolute values in the native word size, > + emit the symbol references in the kallsyms table as 32-bit entries, > + each containing either an absolute value in the range [0, S32_MAX] > or > + a text relative value in the range [_text, _text + S32_MAX], encoded > + as negative values. 
> + > + On 64-bit builds, this reduces the size of the address table by 50%, > + but more importantly, it results in entries whose values are build > + time constants, and no relocation pass is required at runtime to fix > + up the entries based on the runtime load address of the kernel. > + > config PRINTK > default y > bool "Enable support for printk" if EXPERT > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c > index 5c5987f10819..e612f7f9e71b 100644 > --- a/kernel/kallsyms.c > +++ b/kernel/kallsyms.c > @@ -38,6 +38,7 @@ > * during the second link stage. > */ > extern const unsigned long kallsyms_addresses[] __weak; > +extern const int kallsyms_offsets[] __weak; > extern const u8 kallsyms_names[] __weak; > > /* > @@ -176,6 +177,19 @@ static unsigned int get_symbol_offset(unsigned long pos) > return name - kallsyms_names; > } > > +static unsigned long kallsyms_sym_address(int idx) > +{ > + if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE)) > + return kallsyms_addresses[idx]; > + > + /* positive offsets are absolute values */ > + if (kallsyms_offsets[idx] >= 0) > + return kallsyms_offsets[idx]; > + > + /* negative offsets are relative to _text - 1 */ > + return (unsigned long)_text - 1 - kallsyms_offsets[idx]; > +} > + > /* Lookup the address for this symbol. Returns 0 if not found. 
*/ > unsigned long kallsyms_lookup_name(const char *name) > { > @@ -187,7 +201,7 @@ unsigned long kallsyms_lookup_name(const char *name) > off = kallsyms_expand_symbol(off, namebuf, > ARRAY_SIZE(namebuf)); > > if (strcmp(namebuf, name) == 0) > - return kallsyms_addresses[i]; > + return kallsyms_sym_address(i); > } > return module_kallsyms_lookup_name(name); > } > @@ -204,7 +218,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char > *, struct module *, > > for (i = 0, off = 0; i < kallsyms_num_syms; i++) { > off = kallsyms_expand_symbol(off, namebuf, > ARRAY_SIZE(namebuf)); > - ret = fn(data, namebuf, NULL, kallsyms_addresses[i]); > + ret = fn(data, namebuf, NULL, kallsyms_sym_address(i)); > if (ret != 0) > return ret; > } > @@ -220,7 +234,10 @@ static unsigned long get_symbol_pos(unsigned long addr, > unsigned long i, low, high, mid; > > /* This kernel should never had been booted. */ > - BUG_ON(!kallsyms_addresses); > + if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE)) > + BUG_ON(!kallsyms_addresses); > + else > + BUG_ON(!kallsyms_offsets); > > /* Do a binary search on the sorted kallsyms_addresses array.
Re: [PATCH 4/4] x86_64: enable text relative kallsyms for 64-bit targets
On Wed, Jan 20, 2016 at 1:05 AM, Ard Biesheuvelwrote: > This enables the newly introduced text-relative kallsyms support when > building 64-bit targets. This cuts the size of the kallsyms address > table in half, reducing the memory footprint of the kernel .rodata > section by about 400 KB for a KALLSYMS_ALL build, and about 100 KB > reduction in compressed size. (with CONFIG_RELOCATABLE=y) > > Signed-off-by: Ard Biesheuvel Tested-by: Kees Cook -Kees > --- > I tested this with my Ubuntu Wily box's config-4.2.0-23-generic, and > got the following results: > > BEFORE: > === > $ size vmlinux >textdata bss dec hex filename > 129729492213240 1482752 16668941 fe590d vmlinux > > $ readelf -S .tmp_kallsyms2.o |less > There are 9 section headers, starting at offset 0x3e0788: > > Section Headers: > [Nr] Name Type Address Offset >Size EntSize Flags Link Info Align > ... > [ 4] .rodata PROGBITS 0040 >001c7738 A 0 0 8 > [ 5] .rela.rodata RELA 001c7950 >00218e38 0018 I 7 4 8 > [ 6] .shstrtab STRTAB 001c7778 >0039 0 0 1 > > $ ls -l arch/x86/boot/bzImage > -rw-rw-r-- 1 ard ard 6893168 Jan 20 09:36 arch/x86/boot/bzImage > > AFTER: > == > $ size vmlinux >textdata bss dec hex filename > 126045012213240 1482752 16300493 f8b9cd vmlinux > > $ readelf -S .tmp_kallsyms2.o |less > There are 8 section headers, starting at offset 0x16dd10: > > Section Headers: > [Nr] Name Type Address Offset >Size EntSize Flags Link Info Align > ... > [ 4] .rodata PROGBITS 0040 >0016db20 A 0 0 8 > [ 5] .shstrtab STRTAB 0016db60 >0034 0 0 1 > ... 
> > $ ls -l arch/x86/boot/bzImage > -rw-rw-r-- 1 ard ard 6790224 Jan 19 22:24 arch/x86/boot/bzImage > --- > arch/x86/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 4a10ba9e95da..180a94bda8d4 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -142,6 +142,7 @@ config X86 > select HAVE_UNSTABLE_SCHED_CLOCK > select HAVE_USER_RETURN_NOTIFIER > select IRQ_FORCED_THREADING > + select KALLSYMS_TEXT_RELATIVE if X86_64 > select MODULES_USE_ELF_RELA if X86_64 > select MODULES_USE_ELF_REL if X86_32 > select OLD_SIGACTIONif X86_32 > -- > 2.5.0 > -- Kees Cook Chrome OS & Brillo Security ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/eeh: Fix PE location code
On Wed, Jan 20, 2016 at 02:56:13PM +1100, Russell Currey wrote: > On Wed, 2015-12-02 at 16:25 +1100, Gavin Shan wrote: > > In eeh_pe_loc_get(), the PE location code is retrieved from the > > "ibm,loc-code" property of the device node for the bridge of the > > PE's primary bus. It's not correct because the property indicates > > the parent PE's location code. > > > > This reads the correct PE location code from "ibm,io-base-loc-code" > > or "ibm,slot-location-code" property of PE parent bus's device node. > > > > Signed-off-by: Gavin Shan> > --- > > Tested-by: Russell Currey Thanks Russell! W.R.T including this in stable, I don't believe anything actively breaks without the patch, but in the event of an EEH freeze the wrong slot for the device will be identified, making troubleshooting more difficult. > > arch/powerpc/kernel/eeh_pe.c | 33 +++-- > > 1 file changed, 15 insertions(+), 18 deletions(-) > > > > diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c > > index 8654cb1..ca9e537 100644 > > --- a/arch/powerpc/kernel/eeh_pe.c > > +++ b/arch/powerpc/kernel/eeh_pe.c > > @@ -883,32 +883,29 @@ void eeh_pe_restore_bars(struct eeh_pe *pe) > > const char *eeh_pe_loc_get(struct eeh_pe *pe) > > { > > struct pci_bus *bus = eeh_pe_bus_get(pe); > > - struct device_node *dn = pci_bus_to_OF_node(bus); > > + struct device_node *dn; > > const char *loc = NULL; > > > > - if (!dn) > > - goto out; > > + while (bus) { > > + dn = pci_bus_to_OF_node(bus); > > + if (!dn) { > > + bus = bus->parent; > > + continue; > > + } > > > > - /* PHB PE or root PE ? 
*/ > > - if (pci_is_root_bus(bus)) { > > - loc = of_get_property(dn, "ibm,loc-code", NULL); > > - if (!loc) > > + if (pci_is_root_bus(bus)) > > loc = of_get_property(dn, "ibm,io-base-loc- > > code", NULL); > > + else > > + loc = of_get_property(dn, "ibm,slot-location- > > code", > > + NULL); > > + > > if (loc) > > - goto out; > > + return loc; > > > > - /* Check the root port */ > > - dn = dn->child; > > - if (!dn) > > - goto out; > > + bus = bus->parent; > > } > > > > - loc = of_get_property(dn, "ibm,loc-code", NULL); > > - if (!loc) > > - loc = of_get_property(dn, "ibm,slot-location-code", > > NULL); > > - > > -out: > > - return loc ? loc : "N/A"; > > + return "N/A"; > > } > > > > /** > > ___ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Enable VMX copy on PPC970 (G5)
There's no reason I'm aware of that the VMX copy loop shouldn't work on PPC970. And in fact it seems to boot and generally be happy.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/include/asm/cputable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index b118072670fb..f412666fafe2 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -411,7 +411,7 @@ enum {
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_201 | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_CAN_NAP | CPU_FTR_MMCRA | \
 	    CPU_FTR_CP_USE_DCBTZ | CPU_FTR_STCX_CHECKS_ADDRESS | \
-	    CPU_FTR_HVMODE | CPU_FTR_DABRX)
+	    CPU_FTR_HVMODE | CPU_FTR_DABRX | CPU_FTR_VMX_COPY)
 #define CPU_FTRS_POWER5	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
--
2.5.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/4] powerpc: enable text relative kallsyms for ppc64
This enables the newly introduced text-relative kallsyms support when building 64-bit targets. This cuts the size of the kallsyms address table in half, and drastically reduces the size of the PIE dynamic relocation section when building with CONFIG_RELOCATABLE=y (by about 3 MB for ppc64_defconfig).

Signed-off-by: Ard Biesheuvel
---
Results for ppc64_defconfig:

BEFORE:
===
$ size vmlinux
     text     data    bss      dec      hex  filename
 19827996  2008456 849612 22686064  15a2970  vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 9 section headers, starting at offset 0x4513f8:

Section Headers:
  [Nr] Name          Type      Address  Offset    Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata       PROGBITS  0100     001fcf00        A               0     0     256
  [ 5] .rela.rodata  RELA      001fd1d8 00254220  0018  I               7     4     8
  [ 6] .shstrtab     STRTAB    001fd000 0039                            0     0     1
  ...

$ ls -l arch/powerpc/boot/zImage
-rwxrwxr-x 2 ard ard 7533160 Jan 20 08:43 arch/powerpc/boot/zImage

AFTER:
==
$ size vmlinux
     text     data    bss      dec      hex  filename
 16979516  2009992 849612 19839120  12eb890  vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 8 section headers, starting at offset 0x199bb0:

Section Headers:
  [Nr] Name          Type      Address  Offset    Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata       PROGBITS  0100     00199900        A               0     0     256
  [ 5] .shstrtab     STRTAB    00199a00 0034                            0     0     1
  ...

$ ls -l arch/powerpc/boot/zImage
-rwxrwxr-x 2 ard ard 6985672 Jan 20 08:45 arch/powerpc/boot/zImage
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 94f6c5089e0c..d1c26749632b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -158,6 +158,7 @@ config PPC
 	select ARCH_HAS_DMA_SET_COHERENT_MASK
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select HAVE_ARCH_SECCOMP_FILTER
+	select KALLSYMS_TEXT_RELATIVE if PPC64
 config GENERIC_CSUM
 	def_bool CPU_LITTLE_ENDIAN
--
2.5.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 0/4] support for text-relative kallsyms table
This implements text-relative kallsyms address tables. This was developed as part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but I think it may be beneficial to other architectures as well, so I am presenting it as a separate series.

The idea is that on 64-bit builds, it is rather wasteful to use absolute addressing for kernel symbols since they are all within a couple of MBs of each other. On top of that, the absolute addressing implies that, when the kernel is relocated at runtime, each address in the table needs to be fixed up individually.

Since all section-relative addresses are already emitted relative to _text, it is quite straight-forward to record only the offset, and add the absolute address of _text at runtime when referring to the address table.

The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter case, the reduction in uncompressed size is primarily __init data).

Kees Cook was so kind to test these against x86_64, and confirmed that KASLR still operates as expected.

Ard Biesheuvel (4):
  kallsyms: add support for relative offsets in kallsyms address table
  powerpc: enable text relative kallsyms for ppc64
  s390: enable text relative kallsyms for 64-bit targets
  x86_64: enable text relative kallsyms for 64-bit targets

 arch/powerpc/Kconfig    |  1 +
 arch/s390/Kconfig       |  1 +
 arch/x86/Kconfig        |  1 +
 init/Kconfig            | 14
 kernel/kallsyms.c       | 35 +-
 scripts/kallsyms.c      | 38 +---
 scripts/link-vmlinux.sh |  4 +++
 scripts/namespace.pl    |  1 +
 8 files changed, 82 insertions(+), 13 deletions(-)

--
2.5.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets
This enables the newly introduced text-relative kallsyms support when building 64-bit targets. This cuts the size of the kallsyms address table in half, reducing the memory footprint of the kernel .rodata section by about 250 KB for a defconfig build.

Signed-off-by: Ard Biesheuvel
---
BEFORE:
===
$ size vmlinux
     text     data       bss      dec      hex  filename
 12329586  3107008  14727792 30164386  1cc45a2  vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 9 section headers, starting at offset 0x125b50:

Section Headers:
  [Nr] Name          Type      Address  Offset    Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata       PROGBITS  0040     00125ad0        A               0     0     8
  [ 5] .rela.rodata  RELA      00125f28 0015ead8  0018                  7     4     8
  [ 6] .shstrtab     STRTAB    00125b10 0039                            0     0     1
  ...

$ ls -l arch/s390/boot/bzImage
-rwxrwxr-x 1 ard ard 5234224 Jan 20 08:22 arch/s390/boot/bzImage

AFTER:
==
$ size vmlinux
     text     data       bss      dec      hex  filename
 12088114  3102912  14727792 29918818  1c88662  vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 8 section headers, starting at offset 0xeb428:

Section Headers:
  [Nr] Name          Type      Address  Offset
  ...
  [ 4] .rodata       PROGBITS  0040     000eb3b0  A  0  0  8
  [ 5] .shstrtab     STRTAB    000eb3f0 0034         0  0  1
  ...

$ ls -l arch/s390/boot/bzImage
-rwxrwxr-x 1 ard ard 5224256 Jan 20 08:23 arch/s390/boot/bzImage
---
 arch/s390/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index dbeeb3a049f2..588160fd1db0 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -149,6 +149,7 @@ config S390
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
+	select KALLSYMS_TEXT_RELATIVE if 64BIT
 	select MODULES_USE_ELF_RELA
 	select NO_BOOTMEM
 	select OLD_SIGACTION
--
2.5.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 4/4] x86_64: enable text relative kallsyms for 64-bit targets
This enables the newly introduced text-relative kallsyms support when
building 64-bit targets. This cuts the size of the kallsyms address
table in half, reducing the memory footprint of the kernel .rodata
section by about 400 KB for a KALLSYMS_ALL build, and giving a reduction
of about 100 KB in compressed size (with CONFIG_RELOCATABLE=y).

Signed-off-by: Ard Biesheuvel
---
I tested this with my Ubuntu Wily box's config-4.2.0-23-generic, and got
the following results:

BEFORE:
=======
$ size vmlinux
    text    data     bss      dec    hex filename
12972949 2213240 1482752 16668941 fe590d vmlinux

$ readelf -S .tmp_kallsyms2.o | less
There are 9 section headers, starting at offset 0x3e0788:

Section Headers:
  [Nr] Name          Type      Address  Offset   Size EntSize Flags Link Info Align
  ...
  [ 4] .rodata       PROGBITS  0040     001c7738 A 0 0 8
  [ 5] .rela.rodata  RELA      001c7950 00218e38 0018 I 7 4 8
  [ 6] .shstrtab     STRTAB    001c7778 0039     0 0 1

$ ls -l arch/x86/boot/bzImage
-rw-rw-r-- 1 ard ard 6893168 Jan 20 09:36 arch/x86/boot/bzImage

AFTER:
======
$ size vmlinux
    text    data     bss      dec    hex filename
12604501 2213240 1482752 16300493 f8b9cd vmlinux

$ readelf -S .tmp_kallsyms2.o | less
There are 8 section headers, starting at offset 0x16dd10:

Section Headers:
  [Nr] Name          Type      Address  Offset   Size EntSize Flags Link Info Align
  ...
  [ 4] .rodata       PROGBITS  0040     0016db20 A 0 0 8
  [ 5] .shstrtab    STRTAB     0016db60 0034     0 0 1
  ...

$ ls -l arch/x86/boot/bzImage
-rw-rw-r-- 1 ard ard 6790224 Jan 19 22:24 arch/x86/boot/bzImage
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4a10ba9e95da..180a94bda8d4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -142,6 +142,7 @@ config X86
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
 	select IRQ_FORCED_THREADING
+	select KALLSYMS_TEXT_RELATIVE if X86_64
 	select MODULES_USE_ELF_RELA if X86_64
 	select MODULES_USE_ELF_REL if X86_32
 	select OLD_SIGACTION if X86_32
--
2.5.0
[PATCH 1/4] kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
The benefit is that such table entries are no longer subject to dynamic
relocation when the build time and runtime offsets of the kernel image
are different. Also, on 64-bit architectures, it essentially cuts the
size of the address table in half since offsets can typically be
expressed in 32 bits.

Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch adds support for
both, by emitting absolute addresses as positive 32-bit values, and
addresses relative to _text as negative values, which are subtracted
from the runtime address of _text to produce the actual address.
Positive values are used as they are found in the table.

Support for the above is enabled by setting CONFIG_KALLSYMS_TEXT_RELATIVE.

Signed-off-by: Ard Biesheuvel
---
 init/Kconfig            | 14
 kernel/kallsyms.c       | 35 +-
 scripts/kallsyms.c      | 38 +---
 scripts/link-vmlinux.sh |  4 +++
 scripts/namespace.pl    |  1 +
 5 files changed, 79 insertions(+), 13 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 5b86082fa238..73e00b040572 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1427,6 +1427,20 @@ config KALLSYMS_ALL
 	  Say N unless you really need all symbols.

+config KALLSYMS_TEXT_RELATIVE
+	bool
+	help
+	  Instead of emitting them as absolute values in the native word size,
+	  emit the symbol references in the kallsyms table as 32-bit entries,
+	  each containing either an absolute value in the range [0, S32_MAX] or
+	  a text relative value in the range [_text, _text + S32_MAX], encoded
+	  as negative values.
+
+	  On 64-bit builds, this reduces the size of the address table by 50%,
+	  but more importantly, it results in entries whose values are build
+	  time constants, and no relocation pass is required at runtime to fix
+	  up the entries based on the runtime load address of the kernel.
+
 config PRINTK
 	default y
 	bool "Enable support for printk" if EXPERT

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 5c5987f10819..e612f7f9e71b 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -38,6 +38,7 @@
  * during the second link stage.
  */
 extern const unsigned long kallsyms_addresses[] __weak;
+extern const int kallsyms_offsets[] __weak;
 extern const u8 kallsyms_names[] __weak;

 /*
@@ -176,6 +177,19 @@ static unsigned int get_symbol_offset(unsigned long pos)
 	return name - kallsyms_names;
 }

+static unsigned long kallsyms_sym_address(int idx)
+{
+	if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE))
+		return kallsyms_addresses[idx];
+
+	/* positive offsets are absolute values */
+	if (kallsyms_offsets[idx] >= 0)
+		return kallsyms_offsets[idx];
+
+	/* negative offsets are relative to _text - 1 */
+	return (unsigned long)_text - 1 - kallsyms_offsets[idx];
+}
+
 /* Lookup the address for this symbol. Returns 0 if not found.
  */
 unsigned long kallsyms_lookup_name(const char *name)
 {
@@ -187,7 +201,7 @@ unsigned long kallsyms_lookup_name(const char *name)
 		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));

 		if (strcmp(namebuf, name) == 0)
-			return kallsyms_addresses[i];
+			return kallsyms_sym_address(i);
 	}
 	return module_kallsyms_lookup_name(name);
 }
@@ -204,7 +218,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
 		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
-		ret = fn(data, namebuf, NULL, kallsyms_addresses[i]);
+		ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
 		if (ret != 0)
 			return ret;
 	}
@@ -220,7 +234,10 @@ static unsigned long get_symbol_pos(unsigned long addr,
 	unsigned long i, low, high, mid;

 	/* This kernel should never had been booted. */
-	BUG_ON(!kallsyms_addresses);
+	if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE))
+		BUG_ON(!kallsyms_addresses);
+	else
+		BUG_ON(!kallsyms_offsets);

 	/* Do a binary search on the sorted kallsyms_addresses array. */
 	low = 0;
@@ -228,7 +245,7 @@ static unsigned long get_symbol_pos(unsigned long addr,
 	while (high - low > 1) {
 		mid = low + (high - low) / 2;
-		if (kallsyms_addresses[mid] <= addr)
+		if (kallsyms_sym_address(mid) <= addr)
 			low = mid;
 		else
 			high = mid;
Re: [PATCH v5 0/9] ftrace with regs + live patching for ppc64 LE (ABI v2)
On Wed, Jan 20, 2016 at 05:03:23PM +1100, Michael Ellerman wrote:
> On Wed, 2016-01-06 at 15:17 +0100, Petr Mladek wrote:
> > On Fri 2015-12-04 15:45:29, Torsten Duwe wrote:
> > > Changes since v4:
> > >   * change comment style in entry_64.S to C89
> > >     (nobody is using assembler syntax comments there).
> > >   * the bool function restore_r2 shouldn't return 2,
> > >     that's a little confusing.
> > >   * Test whether the compiler supports -mprofile-kernel
> > >     and only then define CC_USING_MPROFILE_KERNEL
> > >   * also make the return value of klp_check_compiler_support
> > >     depend on that.
> >
> > Note that there is still needed the extra patch from
> > http://thread.gmane.org/gmane.linux.kernel/2093867/focus=2099603
> > to get the livepatching working.
>
> Sorry which extra patch?

Message-ID: <20151203160004.ge8...@pathway.suse.cz>
By Petr Mladek, "Re: [PATCH v4 0/9] ftrace with regs + live patching...",
2015-12-03. It is further up in the function call hierarchy and basically
tells the arch-independent KLP to call the normal entry point on ppc64le,
and that the _mcount call site is 16 bytes further.

> > Both ftrace with regs and live patching works for me with this patch
> > set and the extra patch. So, for the whole patchset:
> >
> > Tested-by: Petr Mladek
>
> Can you give me some more info on how you're testing it? What config options,
> toolchain etc.?
>
> For me the series doesn't even boot, even with livepatching disabled.

May indeed be a toolchain issue. I had to fix gcc-4.8.5 to get "notrace"
working for -mprofile-kernel. That's a gcc bug. What are you using?

The config in the v5 patch series should be waterproof; especially with
KLP disabled, ftrace with regs must work (all self-tests succeeded).
If you send me your config (via PM I suggest, spare the lists) I can
verify it with the toolchain here.

Petr made a suggestion to reshuffle the config options to have it
cleaner; I suggest to patch that separately.

	Torsten
Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
On 2016/1/16 1:24, David Laight wrote:
> From: Yongji Xie
> Sent: 15 January 2016 07:06
>> MSI-X tables are not allowed to be mmapped in vfio-pci driver in case
>> that user get to touch this directly. This will cause some performance
>> issues when PCI adapters have critical registers in the same page as
>> the MSI-X table.
> ...
> If the driver wants to generate an incorrect MSI-X interrupt it can do
> so by requesting the device do a normal memory transfer to the target
> address area that raises MSI-X interrupts.

IOMMUs supporting interrupt remapping can prevent this case.

> So disabling writes to the MSI-X table (and pending bit array) areas
> only raises the bar very slightly.
>
> A device may also give the driver write access to the MSI-X table
> through other addresses. This seems to make disallowing the mapping
> of the MSI-X table rather pointless.

If we allow the mapping of the MSI-X table, it seems the guest kernels
of some architectures can write invalid data to the MSI-X table when
device drivers initialize MSI-X interrupts.

Regards,
Yongji Xie

> I've also dumped out the MSI-X table (during development) to check
> that the values are being written there correctly.
>
> 	David
Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets
On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
> This enables the newly introduced text-relative kallsyms support when
> building 64-bit targets. This cuts the size of the kallsyms address
> table in half, reducing the memory footprint of the kernel .rodata
> section by about 250 KB for a defconfig build.
>
> Signed-off-by: Ard Biesheuvel
> ---
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index dbeeb3a049f2..588160fd1db0 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -149,6 +149,7 @@ config S390
> 	select HAVE_REGS_AND_STACK_ACCESS_API
> 	select HAVE_SYSCALL_TRACEPOINTS
> 	select HAVE_VIRT_CPU_ACCOUNTING
> +	select KALLSYMS_TEXT_RELATIVE if 64BIT

Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
Tested on s390 and everything seems still to work ;)

Acked-by: Heiko Carstens
Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets
On 20 January 2016 at 10:43, Heiko Carstens wrote:
> On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
>> This enables the newly introduced text-relative kallsyms support when
>> building 64-bit targets. This cuts the size of the kallsyms address
>> table in half, reducing the memory footprint of the kernel .rodata
>> section by about 250 KB for a defconfig build.
>>
>> Signed-off-by: Ard Biesheuvel
>> ---
>>
>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> index dbeeb3a049f2..588160fd1db0 100644
>> --- a/arch/s390/Kconfig
>> +++ b/arch/s390/Kconfig
>> @@ -149,6 +149,7 @@ config S390
>> 	select HAVE_REGS_AND_STACK_ACCESS_API
>> 	select HAVE_SYSCALL_TRACEPOINTS
>> 	select HAVE_VIRT_CPU_ACCOUNTING
>> +	select KALLSYMS_TEXT_RELATIVE if 64BIT
>
> Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
> Tested on s390 and everything seems still to work ;)
>
> Acked-by: Heiko Carstens
>

Thanks! Did you take a look at /proc/kallsyms, by any chance? It
should look identical with and without these patches.
Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets
On 20 January 2016 at 11:17, Heiko Carstens wrote:
> On Wed, Jan 20, 2016 at 11:04:24AM +0100, Ard Biesheuvel wrote:
>> On 20 January 2016 at 10:43, Heiko Carstens wrote:
>> > On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
>> >> This enables the newly introduced text-relative kallsyms support when
>> >> building 64-bit targets. This cuts the size of the kallsyms address
>> >> table in half, reducing the memory footprint of the kernel .rodata
>> >> section by about 250 KB for a defconfig build.
>> >>
>> >> Signed-off-by: Ard Biesheuvel
>> >> ---
>> >>
>> >> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> >> index dbeeb3a049f2..588160fd1db0 100644
>> >> --- a/arch/s390/Kconfig
>> >> +++ b/arch/s390/Kconfig
>> >> @@ -149,6 +149,7 @@ config S390
>> >> 	select HAVE_REGS_AND_STACK_ACCESS_API
>> >> 	select HAVE_SYSCALL_TRACEPOINTS
>> >> 	select HAVE_VIRT_CPU_ACCOUNTING
>> >> +	select KALLSYMS_TEXT_RELATIVE if 64BIT
>> >
>> > Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
>> > Tested on s390 and everything seems still to work ;)
>> >
>> > Acked-by: Heiko Carstens
>> >
>>
>> Thanks! Did you take a look at /proc/kallsyms, by any chance? It
>> should look identical with and without these patches.
>
> Close to identical, since the generated code and offsets change a bit with
> your new config option enabled and disabled. But only those parts that are
> linked behind kernel/kallsyms.c.
>
> However I did run a couple of ftrace, kprobes tests and enforced call
> backtraces. Everything still works.
>
> So it looks all good.
>

Thanks a lot!
Re: [PATCH V10 2/4] perf/powerpc: add support for sampling intr machine state
On Mon, 2016-01-11 at 15:58 +0530, Anju T wrote:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9a7057e..c4ce60d 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -119,6 +119,7 @@ config PPC
> 	select GENERIC_ATOMIC64 if PPC32
> 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> 	select HAVE_PERF_EVENTS
> +	select HAVE_PERF_REGS
> 	select HAVE_REGS_AND_STACK_ACCESS_API
> 	select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
> 	select ARCH_WANT_IPC_PARSE_VERSION
> diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
> new file mode 100644
> index 000..d32581763
> --- /dev/null
> +++ b/arch/powerpc/perf/perf_regs.c

...

> +
> +u64 perf_reg_abi(struct task_struct *task)
> +{
> +	return PERF_SAMPLE_REGS_ABI_64;

What is this value used for exactly?

It seems like on 32-bit kernels we should be returning
PERF_SAMPLE_REGS_ABI_32.

> +}
> +
> +void perf_get_regs_user(struct perf_regs *regs_user,
> +			struct pt_regs *regs,
> +			struct pt_regs *regs_user_copy)
> +{
> +	regs_user->regs = task_pt_regs(current);
> +	regs_user->abi = perf_reg_abi(current);
> +}

cheers
Re: [PATCH v5 0/9] ftrace with regs + live patching for ppc64 LE (ABI v2)
On Wed 2016-01-20 17:03:23, Michael Ellerman wrote:
> On Wed, 2016-01-06 at 15:17 +0100, Petr Mladek wrote:
> > On Fri 2015-12-04 15:45:29, Torsten Duwe wrote:
> > > Changes since v4:
> > >   * change comment style in entry_64.S to C89
> > >     (nobody is using assembler syntax comments there).
> > >   * the bool function restore_r2 shouldn't return 2,
> > >     that's a little confusing.
> > >   * Test whether the compiler supports -mprofile-kernel
> > >     and only then define CC_USING_MPROFILE_KERNEL
> > >   * also make the return value of klp_check_compiler_support
> > >     depend on that.
> >
> > Note that there is still needed the extra patch from
> > http://thread.gmane.org/gmane.linux.kernel/2093867/focus=2099603
> > to get the livepatching working.
>
> Sorry which extra patch?

It was in an older reply and can be found at
http://thread.gmane.org/gmane.linux.kernel/2093867/focus=2099603

> > Both ftrace with regs and live patching works for me with this patch
> > set and the extra patch. So, for the whole patchset:
> >
> > Tested-by: Petr Mladek
>
> Can you give me some more info on how you're testing it? What config options,
> toolchain etc.?

You need to fulfill all dependencies for CONFIG_LIVEPATCH, see
kernel/livepatch/Kconfig. Please find attached the config that I used.

I did the testing on PPC64LE with a kernel based on 4.4.0-rc8 using the
attached config. I used the following stuff:

$> gcc --version
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$> rpm -q binutils
binutils-2.25.0-13.1.ppc64le

I tested it the following way:

# booted the compiled kernel and printed the default cmdline
$> cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.4.0-rc3-11-default+ root=UUID=...
# tried function_graph tracer to check ftrace with regs
echo function_graph >/sys/kernel/debug/tracing/current_tracer ; \
echo 1 >/sys/kernel/debug/tracing/tracing_on ; \
sleep 1 ; \
/usr/bin/ls /proc ; \
echo 0 >/sys/kernel/debug/tracing/tracing_on ; \
less /sys/kernel/debug/tracing/trace

# loaded the patch and printed the patched cmdline
$> modprobe livepatch-sample
$> cat /proc/cmdline
this has been live patched

# tried to disable and enable the patch
$> echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
$> cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.4.0-rc3-11-default+ root=UUID=...

$> echo 1 > /sys/kernel/livepatch/livepatch_sample/enabled
$> cat /proc/cmdline
this has been live patched

# also checked messages
$> dmesg | tail -n 4
[   33.673057] livepatch: tainting kernel with TAINT_LIVEPATCH
[   33.673068] livepatch: enabling patch 'livepatch_sample'
[ 1997.098257] livepatch: disabling patch 'livepatch_sample'
[ 2079.696277] livepatch: enabling patch 'livepatch_sample'

> For me the series doesn't even boot, even with livepatching disabled.

I wonder if you have enabled CONFIG_FTRACE_STARTUP_TEST and if the
ftrace with regs fails on your setup.

Best Regards,
Petr
Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets
On Wed, Jan 20, 2016 at 11:04:24AM +0100, Ard Biesheuvel wrote:
> On 20 January 2016 at 10:43, Heiko Carstens wrote:
> > On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
> >> This enables the newly introduced text-relative kallsyms support when
> >> building 64-bit targets. This cuts the size of the kallsyms address
> >> table in half, reducing the memory footprint of the kernel .rodata
> >> section by about 250 KB for a defconfig build.
> >>
> >> Signed-off-by: Ard Biesheuvel
> >> ---
> >>
> >> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> >> index dbeeb3a049f2..588160fd1db0 100644
> >> --- a/arch/s390/Kconfig
> >> +++ b/arch/s390/Kconfig
> >> @@ -149,6 +149,7 @@ config S390
> >> 	select HAVE_REGS_AND_STACK_ACCESS_API
> >> 	select HAVE_SYSCALL_TRACEPOINTS
> >> 	select HAVE_VIRT_CPU_ACCOUNTING
> >> +	select KALLSYMS_TEXT_RELATIVE if 64BIT
> >
> > Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
> > Tested on s390 and everything seems still to work ;)
> >
> > Acked-by: Heiko Carstens
>
> Thanks! Did you take a look at /proc/kallsyms, by any chance? It
> should look identical with and without these patches.

Close to identical, since the generated code and offsets change a bit with
your new config option enabled and disabled. But only those parts that are
linked behind kernel/kallsyms.c.

However I did run a couple of ftrace, kprobes tests and enforced call
backtraces. Everything still works.

So it looks all good.
Re: [PATCH v10 3/4] tools/perf: Map the ID values with register names
On Mon, 2016-01-11 at 15:58 +0530, Anju T wrote:
> diff --git a/tools/perf/arch/powerpc/include/perf_regs.h
> b/tools/perf/arch/powerpc/include/perf_regs.h
> new file mode 100644
> index 000..93080f5
> --- /dev/null
> +++ b/tools/perf/arch/powerpc/include/perf_regs.h
> @@ -0,0 +1,64 @@
> +#ifndef ARCH_PERF_REGS_H
> +#define ARCH_PERF_REGS_H
> +
> +#include
> +#include
> +#include
> +
> +#define PERF_REGS_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1)
> +#define PERF_REGS_MAX PERF_REG_POWERPC_MAX
> +#define PERF_SAMPLE_REGS_ABI PERF_SAMPLE_REGS_ABI_64

That looks wrong if perf is built 32-bit?

> +#define PERF_REG_IP PERF_REG_POWERPC_NIP
> +#define PERF_REG_SP PERF_REG_POWERPC_GPR1
> +
> +static const char *reg_names[] = {
> +	[PERF_REG_POWERPC_GPR0] = "gpr0",

Can you instead call them "r0" etc. That is much more common on powerpc
than "gpr0".

> +	[PERF_REG_POWERPC_GPR1] = "gpr1",
> +	[PERF_REG_POWERPC_GPR2] = "gpr2",
> +	[PERF_REG_POWERPC_GPR3] = "gpr3",
> +	[PERF_REG_POWERPC_GPR4] = "gpr4",
> +	[PERF_REG_POWERPC_GPR5] = "gpr5",
> +	[PERF_REG_POWERPC_GPR6] = "gpr6",
> +	[PERF_REG_POWERPC_GPR7] = "gpr7",
> +	[PERF_REG_POWERPC_GPR8] = "gpr8",
> +	[PERF_REG_POWERPC_GPR9] = "gpr9",
> +	[PERF_REG_POWERPC_GPR10] = "gpr10",
> +	[PERF_REG_POWERPC_GPR11] = "gpr11",
> +	[PERF_REG_POWERPC_GPR12] = "gpr12",
> +	[PERF_REG_POWERPC_GPR13] = "gpr13",
> +	[PERF_REG_POWERPC_GPR14] = "gpr14",
> +	[PERF_REG_POWERPC_GPR15] = "gpr15",
> +	[PERF_REG_POWERPC_GPR16] = "gpr16",
> +	[PERF_REG_POWERPC_GPR17] = "gpr17",
> +	[PERF_REG_POWERPC_GPR18] = "gpr18",
> +	[PERF_REG_POWERPC_GPR19] = "gpr19",
> +	[PERF_REG_POWERPC_GPR20] = "gpr20",
> +	[PERF_REG_POWERPC_GPR21] = "gpr21",
> +	[PERF_REG_POWERPC_GPR22] = "gpr22",
> +	[PERF_REG_POWERPC_GPR23] = "gpr23",
> +	[PERF_REG_POWERPC_GPR24] = "gpr24",
> +	[PERF_REG_POWERPC_GPR25] = "gpr25",
> +	[PERF_REG_POWERPC_GPR26] = "gpr26",
> +	[PERF_REG_POWERPC_GPR27] = "gpr27",
> +	[PERF_REG_POWERPC_GPR28] = "gpr28",
> +	[PERF_REG_POWERPC_GPR29] = "gpr29",
> +	[PERF_REG_POWERPC_GPR30] = "gpr30",
> +	[PERF_REG_POWERPC_GPR31] = "gpr31",
> +	[PERF_REG_POWERPC_NIP] = "nip",
> +	[PERF_REG_POWERPC_MSR] = "msr",
> +	[PERF_REG_POWERPC_ORIG_R3] = "orig_r3",
> +	[PERF_REG_POWERPC_CTR] = "ctr",
> +	[PERF_REG_POWERPC_LNK] = "link",
> +	[PERF_REG_POWERPC_XER] = "xer",
> +	[PERF_REG_POWERPC_CCR] = "ccr",
> +	[PERF_REG_POWERPC_TRAP] = "trap",
> +	[PERF_REG_POWERPC_DAR] = "dar",
> +	[PERF_REG_POWERPC_DSISR] = "dsisr"
> +};
> +
> +static inline const char *perf_reg_name(int id)
> +{
> +	return reg_names[id];
> +}
> +#endif /* ARCH_PERF_REGS_H */
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..62a2f2d 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -23,6 +23,11 @@ $(call detected_var,ARCH)
>
>  NO_PERF_REGS := 1
>
> +# Additional ARCH settings for ppc64
> +ifeq ($(ARCH),powerpc)

powerpc also includes ppc, ie. 32-bit, so the comment is wrong.

> +	NO_PERF_REGS := 0
> +endif
> +
>  # Additional ARCH settings for x86
>  ifeq ($(ARCH),x86)
>    $(call detected,CONFIG_X86)
Re: [PATCH 0/4] support for text-relative kallsyms table
* Ard Biesheuvel wrote:

> This implements text-relative kallsyms address tables. This was developed as
> part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but I think
> it may be beneficial to other architectures as well, so I am presenting it as a
> separate series.
>
> The idea is that on 64-bit builds, it is rather wasteful to use absolute
> addressing for kernel symbols since they are all within a couple of MBs of each
> other. On top of that, the absolute addressing implies that, when the kernel is
> relocated at runtime, each address in the table needs to be fixed up
> individually.
>
> Since all section-relative addresses are already emitted relative to _text, it
> is quite straight-forward to record only the offset, and add the absolute
> address of _text at runtime when referring to the address table.
>
> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter case,
> the reduction in uncompressed size is primarily __init data)

So since kallsyms is in unswappable kernel RAM, the uncompressed size
reduction is what we care about mostly. How much bootloader load times are
impacted is a third order concern.

IOW a nice change!

Thanks,

	Ingo
Re: [PATCH V10 1/4] perf/powerpc: assign an id to each powerpc register
Hi Anju,

On Mon, 2016-01-11 at 15:58 +0530, Anju T wrote:
> The enum definition assigns an 'id' to each register in "struct pt_regs"
> of arch/powerpc. The order of these values in the enum definition are
> based on the corresponding macros in arch/powerpc/include/uapi/asm/ptrace.h.

Sorry, one thing ...

> diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h
> b/arch/powerpc/include/uapi/asm/perf_regs.h
> new file mode 100644
> index 000..cfbd068
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/perf_regs.h
> @@ -0,0 +1,49 @@
> +#ifndef _ASM_POWERPC_PERF_REGS_H
> +#define _ASM_POWERPC_PERF_REGS_H
> +
> +enum perf_event_powerpc_regs {
> +	PERF_REG_POWERPC_GPR0,
> +	PERF_REG_POWERPC_GPR1,
> +	PERF_REG_POWERPC_GPR2,
> +	PERF_REG_POWERPC_GPR3,
> +	PERF_REG_POWERPC_GPR4,
> +	PERF_REG_POWERPC_GPR5,
> +	PERF_REG_POWERPC_GPR6,
> +	PERF_REG_POWERPC_GPR7,
> +	PERF_REG_POWERPC_GPR8,
> +	PERF_REG_POWERPC_GPR9,
> +	PERF_REG_POWERPC_GPR10,
> +	PERF_REG_POWERPC_GPR11,
> +	PERF_REG_POWERPC_GPR12,
> +	PERF_REG_POWERPC_GPR13,
> +	PERF_REG_POWERPC_GPR14,
> +	PERF_REG_POWERPC_GPR15,
> +	PERF_REG_POWERPC_GPR16,
> +	PERF_REG_POWERPC_GPR17,
> +	PERF_REG_POWERPC_GPR18,
> +	PERF_REG_POWERPC_GPR19,
> +	PERF_REG_POWERPC_GPR20,
> +	PERF_REG_POWERPC_GPR21,
> +	PERF_REG_POWERPC_GPR22,
> +	PERF_REG_POWERPC_GPR23,
> +	PERF_REG_POWERPC_GPR24,
> +	PERF_REG_POWERPC_GPR25,
> +	PERF_REG_POWERPC_GPR26,
> +	PERF_REG_POWERPC_GPR27,
> +	PERF_REG_POWERPC_GPR28,
> +	PERF_REG_POWERPC_GPR29,
> +	PERF_REG_POWERPC_GPR30,
> +	PERF_REG_POWERPC_GPR31,
> +	PERF_REG_POWERPC_NIP,
> +	PERF_REG_POWERPC_MSR,
> +	PERF_REG_POWERPC_ORIG_R3,
> +	PERF_REG_POWERPC_CTR,
> +	PERF_REG_POWERPC_LNK,
> +	PERF_REG_POWERPC_XER,
> +	PERF_REG_POWERPC_CCR,

You skipped SOFTE here at my suggestion, because it's called MQ on 32-bit.
But I've changed my mind, I think we *should* define SOFTE, and ignore MQ,
because MQ is unused.

So just add:

+	PERF_REG_POWERPC_SOFTE,

> +	PERF_REG_POWERPC_TRAP,
> +	PERF_REG_POWERPC_DAR,
> +	PERF_REG_POWERPC_DSISR,
> +	PERF_REG_POWERPC_MAX,
> +};
> +#endif /* _ASM_POWERPC_PERF_REGS_H */

cheers
Re: [PATCH 0/4] support for text-relative kallsyms table
On Wednesday 20 January 2016 11:33:25 Ingo Molnar wrote:
> > The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
> > compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter case,
> > the reduction in uncompressed size is primarily __init data)
>
> So since kallsyms is in unswappable kernel RAM, the uncompressed size
> reduction is what we care about mostly. How much bootloader load times are
> impacted is a third order concern.
>
> IOW a nice change!

I think some people care a lot about the compressed size as well:

http://git.openwrt.org/?p=openwrt.git;a=blob;f=target/linux/generic/patches-4.4/203-kallsyms_uncompressed.patch;h=cf8a447bbcd5b1621d4edc36a69fe0ad384fe53f;hb=HEAD

This has been in openwrt.git for ages, because a lot of the target devices
are much more limited on flash memory size (4MB typically) than they are
on RAM size (at least 32MB).

	Arnd
Re: [PATCH v9 2/6] Documentation, dt, arm64/arm: dt bindings for numa.
On Mon, Jan 18, 2016 at 10:06:01PM +0530, Ganapatrao Kulkarni wrote:
> DT bindings for numa mapping of memory, cores and IOs.
>
> Reviewed-by: Robert Richter
> Signed-off-by: Ganapatrao Kulkarni
> ---
>  Documentation/devicetree/bindings/arm/numa.txt | 272 +
>  1 file changed, 272 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt

This looks okay to me, but some cosmetic things on the example.

> +==
> +4 - Example dts
> +==
> +
> +2 sockets system consists of 2 boards connected through ccn bus and
> +each board having one socket/soc of 8 cpus, memory and pci bus.
> +
> +	memory@00c0 {

Drop the leading 0s on unit addresses.

> +		device_type = "memory";
> +		reg = <0x0 0x00c0 0x0 0x8000>;
> +		/* node 0 */
> +		numa-node-id = <0>;
> +	};
> +
> +	memory@100 {
> +		device_type = "memory";
> +		reg = <0x100 0x 0x0 0x8000>;
> +		/* node 1 */
> +		numa-node-id = <1>;
> +	};
> +
> +	cpus {
> +		#address-cells = <2>;
> +		#size-cells = <0>;
> +
> +		cpu@000 {

Same here (leaving one of course).

> +			device_type = "cpu";
> +			compatible = "arm,armv8";
> +			reg = <0x0 0x000>;
> +			enable-method = "psci";
> +			/* node 0 */
> +			numa-node-id = <0>;
> +		};
> +		cpu@001 {

and so on...

> +			device_type = "cpu";
> +			compatible = "arm,armv8";
> +			reg = <0x0 0x001>;

Either all leading 0s or none.

> +			reg = <0x0 0x008>;
> +			enable-method = "psci";
> +			/* node 1 */

Kind of a pointless comment.

Wouldn't each cluster of cpus for a given numa node be in a different cpu
affinity? Certainly not required by the architecture, but the common case
at least.

> +			numa-node-id = <1>;
> +		};

[...]

> +	pcie0: pcie0@0x8480, {

Drop the 0x and the comma.

> +		compatible = "arm,armv8";
> +		device_type = "pci";
> +		bus-range = <0 255>;
> +		#size-cells = <2>;
> +		#address-cells = <3>;
> +		reg = <0x8480 0x 0 0x1000>; /* Configuration space */
> +		ranges = <0x0300 0x8010 0x 0x8010 0x 0x70 0x>;
> +		/* node 0 */
> +		numa-node-id = <0>;
> +	};
> +
> +	pcie1: pcie1@0x9480, {

ditto

> +		compatible = "arm,armv8";
> +		device_type = "pci";
> +		bus-range = <0 255>;
> +		#size-cells = <2>;
> +		#address-cells = <3>;
> +		reg = <0x9480 0x 0 0x1000>; /* Configuration space */
> +		ranges = <0x0300 0x9010 0x 0x9010 0x 0x70 0x>;
> +		/* node 1 */
> +		numa-node-id = <1>;
> +	};
> +
> +	distance-map {
> +		compatible = "numa-distance-map-v1";
> +		distance-matrix = <0 0 10>,
> +				  <0 1 20>,
> +				  <1 1 10>;
> +	};
> --
> 1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
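Taken together, Rob's cosmetic comments would reshape the example nodes roughly as below. This is an illustrative sketch only — properties whose cell values are abbreviated in the quoted patch are omitted rather than guessed:

```dts
memory@c0 {			/* leading zeros dropped from the unit address */
	device_type = "memory";
	numa-node-id = <0>;
};

cpus {
	#address-cells = <2>;
	#size-cells = <0>;

	cpu@0 {			/* cpu@000 becomes cpu@0 */
		device_type = "cpu";
		compatible = "arm,armv8";
		enable-method = "psci";
		numa-node-id = <0>;
	};
};

pcie0: pcie0@8480 {		/* "0x" prefix and trailing comma dropped */
	compatible = "arm,armv8";
	device_type = "pci";
	bus-range = <0 255>;
	numa-node-id = <0>;
};

distance-map {
	compatible = "numa-distance-map-v1";
	distance-matrix = <0 0 10>,
			  <0 1 20>,
			  <1 1 10>;
};
```

The unit-address rules being applied here come from devicetree naming conventions: the unit address matches the first `reg` entry, written in hex without a `0x` prefix or leading zeros.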