Re: powerpc: Simplify module TOC handling

2016-01-20 Thread Michael Ellerman
On Mon, 2016-18-01 at 00:44:27 UTC, Michael Ellerman wrote:
> From: Alan Modra 
> 
> PowerPC64 uses the symbol .TOC. much as other targets use
> _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in
> powerpc parlance, the TOC pointer). Global offset tables are generally
> local to an executable or shared library, or in the kernel, module. Thus
> it does not make sense for a module to resolve a relocation against
> .TOC. to the kernel's .TOC. value. A module has its own .TOC., and
> indeed the powerpc64 module relocation processing ignores the kernel
> value of .TOC. and instead calculates a module-local value.
> 
> This patch removes code involved in exporting the kernel .TOC., tweaks
> modpost to ignore an undefined .TOC., and the module loader to twiddle
> the section symbol so that .TOC. isn't seen as undefined.
> 
> Note that if the kernel was compiled with -msingle-pic-base then ELFv2
> would not have function global entry code setting up r2. In that case
> the module call stubs would need to be modified to set up r2 using the
> kernel .TOC. value, requiring some of this code to be reinstated.
> 
> mpe: Furthermore a change in binutils master (not yet released) causes
> the current way we handle the TOC to no longer work when building with
> MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be
> loaded due to there being no version found for TOC.
> 
> Cc: sta...@vger.kernel.org # 3.16+
> Signed-off-by: Alan Modra 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/c153693d7eb9eeb28478aa2dea

cheers
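
For readers less familiar with the TOC convention discussed above, here is a
minimal sketch (not the actual arch/powerpc/kernel/module_64.c code) of how a
module can derive its own TOC pointer purely from its own .toc section, which
is why the kernel's .TOC. value is never needed:

/*
 * Hedged sketch: per the ppc64 ABI the TOC pointer (r2) points 0x8000
 * bytes past the start of the TOC so that the surrounding 64KB can be
 * reached with signed 16-bit offsets. TOC_BIAS and the function name
 * are illustrative, not kernel identifiers.
 */
#define TOC_BIAS 0x8000

static unsigned long module_toc_pointer(unsigned long toc_section_addr)
{
	return toc_section_addr + TOC_BIAS;	/* module-local r2 value */
}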

Re: [PATCH 0/4] support for text-relative kallsyms table

2016-01-20 Thread Ard Biesheuvel
On 21 January 2016 at 06:10, Rusty Russell  wrote:
> Ard Biesheuvel  writes:
>> This implements text-relative kallsyms address tables. This was developed
>> as part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but
>> I think it may be beneficial to other architectures as well, so I am
>> presenting it as a separate series.
>
> Nice work!
>

Thanks

> AFAICT this should work for every arch, as long as they start with _text
> (esp: data and init must be > _text).  In addition, it's not harmful on
> 32 bit archs.
>
> IOW, I'd like to turn it on for everyone and discard some code.  But
> it's easier to roll in like you've done first.
>
> Should we enable it by default for every arch for now, and see what
> happens?
>

As you say, this only works if every symbol is >= _text, which is
obviously not the case, per the conditional in scripts/kallsyms.c that
emits either _text + n or _text - n depending on whether the symbol
follows or precedes _text (sketched below). The git log tells me for
which arch this was originally implemented, but it does not tell me
which other archs have come to rely on it in the meantime.
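
A simplified sketch of that conditional (illustrative only, not the literal
scripts/kallsyms.c source):

#include <stdio.h>

/*
 * Sketch of the symbol-emission choice described above: symbols at or
 * above _text are emitted as a positive offset, everything else as a
 * negative one, so the table is only truly text-relative when all
 * symbols follow _text.
 */
static void emit_symbol(unsigned long long addr, unsigned long long text_base)
{
	if (addr >= text_base)
		printf("\tPTR\t_text + %#llx\n", addr - text_base);
	else
		printf("\tPTR\t_text - %#llx\n", text_base - addr);
}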

On top of that, ia64 fails to build with this option, since it has
some whitelisted absolute symbols that look suspiciously like they
could be emitted as _text relative (and it does not even matter in the
absence of CONFIG_RELOCATABLE on ia64, afaict) but I don't know
whether we can just override their types as T, since it would also
change the type in the contents of /proc/kallsyms. So some guidance
would be appreciated here.

So I agree that it would be preferable to have a single code path, but
I would need some help validating it on architectures I don't have
access to.

Thanks,
Ard.


>> The idea is that on 64-bit builds, it is rather wasteful to use absolute
>> addressing for kernel symbols since they are all within a couple of MBs
>> of each other. On top of that, the absolute addressing implies that, when
>> the kernel is relocated at runtime, each address in the table needs to be
>> fixed up individually.
>>
>> Since all section-relative addresses are already emitted relative to _text,
>> it is quite straight-forward to record only the offset, and add the absolute
>> address of _text at runtime when referring to the address table.
>>
>> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
>> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter case,
>> the reduction in uncompressed size is primarily __init data)
>>
>> Kees Cook was so kind to test these against x86_64, and confirmed that KASLR
>> still operates as expected.
>>
>> Ard Biesheuvel (4):
>>   kallsyms: add support for relative offsets in kallsyms address table
>>   powerpc: enable text relative kallsyms for ppc64
>>   s390: enable text relative kallsyms for 64-bit targets
>>   x86_64: enable text relative kallsyms for 64-bit targets
>>
>>  arch/powerpc/Kconfig|  1 +
>>  arch/s390/Kconfig   |  1 +
>>  arch/x86/Kconfig|  1 +
>>  init/Kconfig| 14 
>>  kernel/kallsyms.c   | 35 +-
>>  scripts/kallsyms.c  | 38 +---
>>  scripts/link-vmlinux.sh |  4 +++
>>  scripts/namespace.pl|  1 +
>>  8 files changed, 82 insertions(+), 13 deletions(-)
>>
>> --
>> 2.5.0

[PATCH v3 2/9] selftests/powerpc: Test preservation of FPU and VMX regs across preemption

2016-01-20 Thread Cyril Bur
Loop in assembly checking the registers with many threads.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/math/.gitignore|   2 +
 tools/testing/selftests/powerpc/math/Makefile  |   5 +-
 tools/testing/selftests/powerpc/math/fpu_asm.S |  34 +++
 tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 +
 tools/testing/selftests/powerpc/math/vmx_asm.S |  44 +++-
 tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 +
 6 files changed, 306 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c

diff --git a/tools/testing/selftests/powerpc/math/.gitignore 
b/tools/testing/selftests/powerpc/math/.gitignore
index b19b269..1a6f09e 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -1,2 +1,4 @@
 fpu_syscall
 vmx_syscall
+fpu_preempt
+vmx_preempt
diff --git a/tools/testing/selftests/powerpc/math/Makefile 
b/tools/testing/selftests/powerpc/math/Makefile
index 418bef1..b6f4158 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall vmx_syscall
+TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
 
 all: $(TEST_PROGS)
 
@@ -6,7 +6,10 @@ $(TEST_PROGS): ../harness.c
 $(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
 
 fpu_syscall: fpu_asm.S
+fpu_preempt: fpu_asm.S
+
 vmx_syscall: vmx_asm.S
+vmx_preempt: vmx_asm.S
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S 
b/tools/testing/selftests/powerpc/math/fpu_asm.S
index 8733874..46bbe99 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -159,3 +159,37 @@ FUNC_START(test_fpu)
POP_BASIC_STACK(256)
blr
 FUNC_END(test_fpu)
+
+#int preempt_fpu(double *darray, int *threads_running, int *running)
+#On starting will (atomically) decrement not_ready as a signal that the FPU
+#has been loaded with darray. Will proceed to check the validity of the FPU
+#registers while running is not zero.
+FUNC_START(preempt_fpu)
+   PUSH_BASIC_STACK(256)
+   std r3,32(sp) #double *darray
+   std r4,40(sp) #volatile int *not_ready
+   std r5,48(sp) #int *running
+   PUSH_FPU(56)
+
+   bl load_fpu
+
+   #Atomic DEC
+   ld r3,40(sp)
+1: lwarx r4,0,r3
+   addi r4,r4,-1
+   stwcx. r4,0,r3
+   bne- 1b
+
+2: ld r3, 32(sp)
+   bl check_fpu
+   cmpdi r3,0
+   bne 3f
+   ld r4, 48(sp)
+   ld r5, 0(r4)
+   cmpwi r5,0
+   bne 2b
+
+3: POP_FPU(56)
+   POP_BASIC_STACK(256)
+   blr
+FUNC_END(preempt_fpu)
diff --git a/tools/testing/selftests/powerpc/math/fpu_preempt.c 
b/tools/testing/selftests/powerpc/math/fpu_preempt.c
new file mode 100644
index 000..0f85b79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_preempt.c
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers change across preemption.
+ * Two things should be noted here a) The check_fpu function in asm only checks
+ * the non volatile registers as it is reused from the syscall test b) There is
+ * no way to be sure preemption happened so this test just uses many threads
+ * and a long wait. As such, a successful test doesn't mean much but a failure
+ * is bad.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+2.1};
+
+int threads_starting;
+int running;
+
+extern void preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void *preempt_fpu_c(void *p)
+{
+   int i;
+   srand(pthread_self());
+   for (i = 0; i < 21; i++)
+   darray[i] = rand();
+
+   /* Test failed if it ever returns */
+   preempt_fpu(darray, _starting, );
+
+   return p;
+}
+
+int test_preempt_fpu(void)
+{
+   int i, rc, threads;
+   pthread_t *tids;
+
+   threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+   tids = malloc((threads) * sizeof(pthread_t));
+   FAIL_IF(!tids);
+
+   running = true;
+   threads_starting = 

[PATCH] qe_ic: fix a buffer overflow error and add check elsewhere

2016-01-20 Thread Zhao Qiang
127 is the theoretical upper bound on the QEIC interrupt number, but
in fact there are only 44 entries in qe_ic_info at present.
Add overflow checks against the size of qe_ic_info.

Signed-off-by: Zhao Qiang 
---
 drivers/soc/fsl/qe/qe_ic.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
index 5419527..90c00b7 100644
--- a/drivers/soc/fsl/qe/qe_ic.c
+++ b/drivers/soc/fsl/qe/qe_ic.c
@@ -261,6 +261,11 @@ static int qe_ic_host_map(struct irq_domain *h, unsigned 
int virq,
struct qe_ic *qe_ic = h->host_data;
struct irq_chip *chip;
 
+   if (hw >= ARRAY_SIZE(qe_ic_info)) {
+   pr_err("%s: Invalid hw irq number for QEIC\n", __func__);
+   return -EINVAL;
+   }
+
if (qe_ic_info[hw].mask == 0) {
printk(KERN_ERR "Can't map reserved IRQ\n");
return -EINVAL;
@@ -409,7 +414,8 @@ int qe_ic_set_priority(unsigned int virq, unsigned int 
priority)
 
if (priority > 8 || priority == 0)
return -EINVAL;
-   if (src > 127)
+   if (WARN_ONCE(src >= ARRAY_SIZE(qe_ic_info),
+ "%s: Invalid hw irq number for QEIC\n", __func__))
return -EINVAL;
if (qe_ic_info[src].pri_reg == 0)
return -EINVAL;
@@ -438,6 +444,9 @@ int qe_ic_set_high_priority(unsigned int virq, unsigned int 
priority, int high)
 
if (priority > 2 || priority == 0)
return -EINVAL;
+   if (WARN_ONCE(src >= ARRAY_SIZE(qe_ic_info),
+ "%s: Invalid hw irq number for QEIC\n", __func__))
+   return -EINVAL;
 
switch (qe_ic_info[src].pri_reg) {
case QEIC_CIPZCC:
-- 
2.1.0.27.g96db324


Re: powerpc: Wire up copy_file_range() syscall

2016-01-20 Thread Michael Ellerman
On Wed, 2016-13-01 at 16:50:22 UTC, Chandan Rajendra wrote:
> Test runs on a ppc64 BE guest succeeded.
> 
> Signed-off-by: Chandan Rajendra 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d7f9ee60a6ebc263861a1d8c06

cheers

Re: [PATCH] powerpc/eeh: Fix PE location code

2016-01-20 Thread Stewart Smith
Sam Mendoza-Jonas  writes:
> On Wed, Jan 20, 2016 at 02:56:13PM +1100, Russell Currey wrote:
>> On Wed, 2015-12-02 at 16:25 +1100, Gavin Shan wrote:
>> > In eeh_pe_loc_get(), the PE location code is retrieved from the
>> > "ibm,loc-code" property of the device node for the bridge of the
>> > PE's primary bus. It's not correct because the property indicates
>> > the parent PE's location code.
>> > 
>> > This reads the correct PE location code from "ibm,io-base-loc-code"
>> > or "ibm,slot-location-code" property of PE parent bus's device node.
>> > 
>> > Signed-off-by: Gavin Shan 
>> > ---
>> 
>> Tested-by: Russell Currey 
>
> Thanks Russell!
>
> W.R.T including this in stable, I don't believe anything actively breaks
> without the patch, but in the event of an EEH freeze the wrong slot for
> the device will be identified, making troubleshooting more difficult.

As someone who's likely going to have to deal with the bug reports for
such things, I like the idea of this going to stable as *maybe* I'll get
fewer of them that I have to close pointing to this commit...

-- 
Stewart Smith
OPAL Architect, IBM.


Re: [PATCH 2/4] powerpc: enable text relative kallsyms for ppc64

2016-01-20 Thread Michael Ellerman
On Wed, 2016-01-20 at 10:05 +0100, Ard Biesheuvel wrote:

> This enables the newly introduced text-relative kallsyms support when
> building 64-bit targets. This cuts the size of the kallsyms address
> table in half, and drastically reduces the size of the PIE dynamic
> relocation section when building with CONFIG_RELOCATABLE=y (by about
> 3 MB for ppc64_defconfig)
> 
> Signed-off-by: Ard Biesheuvel 
> ---
> 
> Results for ppc64_defconfig:
> 
> BEFORE:
> ===
> $ size vmlinux
>text  data bss dec hex filename
> 19827996  2008456  849612 2268606415a2970 vmlinux
> 
> $ readelf -S .tmp_kallsyms2.o
> There are 9 section headers, starting at offset 0x4513f8:
> 
> Section Headers:
>   [Nr] Name  Type Address   Offset
>Size  EntSize  Flags  Link  Info  Align
>   ...
>   [ 4] .rodata   PROGBITS   0100
>001fcf00     A   0 0 256
>   [ 5] .rela.rodata  RELA   001fd1d8
>00254220  0018   I   7 4 8
>   [ 6] .shstrtab STRTAB     001fd000
>0039     0 0 1
>   ...
> 
> $ ls -l arch/powerpc/boot/zImage
> -rwxrwxr-x 2 ard ard 7533160 Jan 20 08:43 arch/powerpc/boot/zImage
> 
> AFTER:
> ==
> $ size vmlinux
>text  data bss dec hex filename
> 16979516  2009992  849612 1983912012eb890 vmlinux
> 
> $ readelf -S .tmp_kallsyms2.o
> There are 8 section headers, starting at offset 0x199bb0:
> 
> Section Headers:
>   [Nr] Name  Type Address   Offset
>Size  EntSize  Flags  Link  Info  Align
>   ...
>   [ 4] .rodata   PROGBITS   0100
>00199900     A   0 0 256
>   [ 5] .shstrtab STRTAB     00199a00
>0034     0 0 1
>   ...
> 
> $ ls -l arch/powerpc/boot/zImage
> -rwxrwxr-x 2 ard ard 6985672 Jan 20 08:45 arch/powerpc/boot/zImage


Nice space saving, thanks very much.

I've booted this on a bunch of machines and it seems to be working fine.

Tested-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH] powerpc: remove newly added extra definition of pmd_dirty

2016-01-20 Thread Michael Ellerman
On Thu, 2016-01-21 at 13:05 +1100, Stephen Rothwell wrote:

> Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
> add pmd_[dirty|mkclean] for THP") added a new identical definition
> of pmd_dirty.  Remove it again.
> 
> Cc: Minchan Kim 
> Cc: Andrew Morton 
> Signed-off-by: Stephen Rothwell 

Thanks.

I have a couple of other things to send to Linus so I'll pick this up.

cheers


[PATCH] powerpc: remove newly added extra definition of pmd_dirty

2016-01-20 Thread Stephen Rothwell
Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
add pmd_[dirty|mkclean] for THP") added a new identical definition
of pmd_dirty.  Remove it again.

Cc: Minchan Kim 
Cc: Andrew Morton 
Signed-off-by: Stephen Rothwell 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8204b0c393aa..8d1c41d28318 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -223,7 +223,6 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_pfn(pmd)   pte_pfn(pmd_pte(pmd))
 #define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd) pte_young(pmd_pte(pmd))
-#define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
 #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)   pte_pmd(pte_mkdirty(pmd_pte(pmd)))
-- 
2.7.0.rc3

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

Re: [PATCH] powerpc: remove newly added extra definition of pmd_dirty

2016-01-20 Thread Minchan Kim
On Thu, Jan 21, 2016 at 01:05:20PM +1100, Stephen Rothwell wrote:
> Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
> add pmd_[dirty|mkclean] for THP") added a new identical definition
> of pmd_dirty.  Remove it again.
> 
> Cc: Minchan Kim 
> Cc: Andrew Morton 
> Signed-off-by: Stephen Rothwell 

Thanks for the fix!

Re: [PATCH 0/4] support for text-relative kallsyms table

2016-01-20 Thread Rusty Russell
Ard Biesheuvel  writes:
> This implements text-relative kallsyms address tables. This was developed
> as part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but
> I think it may be beneficial to other architectures as well, so I am
> presenting it as a separate series.

Nice work!

AFAICT this should work for every arch, as long as they start with _text
(esp: data and init must be > _text).  In addition, it's not harmful on
32 bit archs.

IOW, I'd like to turn it on for everyone and discard some code.  But
it's easier to roll in like you've done first.

Should we enable it by default for every arch for now, and see what
happens?

Thanks!
Rusty.

> The idea is that on 64-bit builds, it is rather wasteful to use absolute
> addressing for kernel symbols since they are all within a couple of MBs
> of each other. On top of that, the absolute addressing implies that, when
> the kernel is relocated at runtime, each address in the table needs to be
> fixed up individually.
>
> Since all section-relative addresses are already emitted relative to _text,
> it is quite straight-forward to record only the offset, and add the absolute
> address of _text at runtime when referring to the address table.
>
> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter case,
> the reduction in uncompressed size is primarily __init data)
>
> Kees Cook was so kind to test these against x86_64, and confirmed that KASLR
> still operates as expected.
>
> Ard Biesheuvel (4):
>   kallsyms: add support for relative offsets in kallsyms address table
>   powerpc: enable text relative kallsyms for ppc64
>   s390: enable text relative kallsyms for 64-bit targets
>   x86_64: enable text relative kallsyms for 64-bit targets
>
>  arch/powerpc/Kconfig|  1 +
>  arch/s390/Kconfig   |  1 +
>  arch/x86/Kconfig|  1 +
>  init/Kconfig| 14 
>  kernel/kallsyms.c   | 35 +-
>  scripts/kallsyms.c  | 38 +---
>  scripts/link-vmlinux.sh |  4 +++
>  scripts/namespace.pl|  1 +
>  8 files changed, 82 insertions(+), 13 deletions(-)
>
> -- 
> 2.5.0
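
At runtime the quoted approach boils down to storing offsets and adding the
load address of _text on lookup. A minimal sketch (names are illustrative,
not the exact kernel symbols):

/*
 * Text-relative kallsyms lookup, sketched: the table stores 32-bit
 * offsets from _text instead of 64-bit absolute addresses, so it is
 * half the size and needs no relocation when the kernel is moved.
 */
extern const unsigned int kallsyms_offsets[];	/* emitted by scripts/kallsyms */
extern const char _text[];			/* start of kernel text */

static unsigned long kallsyms_sym_address(int idx)
{
	return (unsigned long)_text + kallsyms_offsets[idx];
}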

Re: powerpc: remove newly added extra definition of pmd_dirty

2016-01-20 Thread Michael Ellerman
On Thu, 2016-21-01 at 02:05:20 UTC, Stephen Rothwell wrote:
> Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h:
> add pmd_[dirty|mkclean] for THP") added a new identical definition
> of pmd_dirty.  Remove it again.
> 
> Cc: Minchan Kim 
> Cc: Andrew Morton 
> Signed-off-by: Stephen Rothwell 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/0e2bce7411542fa336ef490414

cheers

[PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls

2016-01-20 Thread Alexey Kardashevskiy
This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI.  These calls allow adding multiple entries
(up to 512) into the TCE table in one call which saves time on
transition between kernel and user space.

This implements the KVM_CAP_PPC_MULTITCE capability. When present,
the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
If they can not be handled by the kernel, they are passed on to
the user space. The user space still has to have an implementation
for these.

Both HV and PR-style KVM are supported.
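
As a hedged user-space illustration (error handling omitted), a VMM such as
QEMU could probe the capability before advertising "hcall-multi-tce" to the
guest roughly like this:

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int multitce_supported(int vm_fd)
{
	/* > 0 means the kernel handles H_PUT_TCE_INDIRECT/H_STUFF_TCE itself */
	return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_MULTITCE) > 0;
}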

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
* s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
---
 Documentation/virtual/kvm/api.txt   |  25 ++
 arch/powerpc/include/asm/kvm_ppc.h  |  12 +++
 arch/powerpc/kvm/book3s_64_vio.c| 110 +++-
 arch/powerpc/kvm/book3s_64_vio_hv.c | 145 ++--
 arch/powerpc/kvm/book3s_hv.c|  26 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c   |  35 
 arch/powerpc/kvm/powerpc.c  |   3 +
 8 files changed, 349 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 07e4cdf..da39435 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
 
 Queues an SMI on the thread's vcpu.
 
+4.97 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel is capable of handling hypercalls
+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
+space. This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered LIOBN
+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel based fast path. If they can not be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in kernel acceleration.
+
+This capability is always enabled.
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 9513911..4cadee5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
+   struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
unsigned long ioba, unsigned long npages);
 extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
unsigned long tce);
+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+   unsigned long *ua, unsigned long **prmap);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+   unsigned long idx, unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+   unsigned long liobn, unsigned long ioba,
+   unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+   unsigned long liobn, unsigned long ioba,
+   unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba);
 extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 975f0ab..987f406 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -14,6 +14,7 @@
  *
  * Copyright 2010 Paul Mackerras, IBM Corp. 
  * Copyright 2011 David Gibson, IBM Corporation 
+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation 
  */
 
 #include 
@@ -37,8 +38,7 @@
 #include 
 #include 
 #include 
-
-#define TCES_PER_PAGE  

[PATCH kernel v2 0/6] KVM: PPC: Add in-kernel multitce handling

2016-01-20 Thread Alexey Kardashevskiy
These patches enable in-kernel acceleration for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls, which allow updating multiple (up to 512) TCE entries
in a single call, saving time on context switching. QEMU already
supports these hypercalls, so this is just an optimization.

Both HV and PR KVM modes are supported.

This does not affect VFIO, this support is coming next.

This depends on "powerpc: Make vmalloc_to_phys() public".

Please comment. Thanks.


Alexey Kardashevskiy (6):
  KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  KVM: PPC: Use RCU for arch.spapr_tce_tables
  KVM: PPC: Account TCE-containing pages in locked_vm
  KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
  KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  KVM: PPC: Add support for multiple-TCE hcalls

 Documentation/virtual/kvm/api.txt|  25 +++
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 -
 arch/powerpc/include/asm/kvm_host.h  |   1 +
 arch/powerpc/include/asm/kvm_ppc.h   |  16 ++
 arch/powerpc/kvm/book3s.c|   2 +-
 arch/powerpc/kvm/book3s_64_vio.c | 188 --
 arch/powerpc/kvm/book3s_64_vio_hv.c  | 318 ++-
 arch/powerpc/kvm/book3s_hv.c |  26 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c|  35 
 arch/powerpc/kvm/powerpc.c   |   3 +
 11 files changed, 557 insertions(+), 65 deletions(-)

-- 
2.5.0.rc3


[PATCH kernel v2 4/6] KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K

2016-01-20 Thread Alexey Kardashevskiy
SPAPR_TCE_SHIFT is used in only a few places, and since IOMMU_PAGE_SHIFT_4K
can easily be used instead, remove SPAPR_TCE_SHIFT.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 2 --
 arch/powerpc/kvm/book3s_64_vio.c | 3 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c  | 4 ++--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2aa79c8..7529aab 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -33,8 +33,6 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu 
*svcpu)
 }
 #endif
 
-#define SPAPR_TCE_SHIFT12
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #define KVM_DEFAULT_HPT_ORDER  24  /* 16MB HPT by default */
 #endif
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index ea498b4..975f0ab 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -36,12 +36,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #define TCES_PER_PAGE  (PAGE_SIZE / sizeof(u64))
 
 static unsigned long kvmppc_stt_npages(unsigned long window_size)
 {
-   return ALIGN((window_size >> SPAPR_TCE_SHIFT)
+   return ALIGN((window_size >> IOMMU_PAGE_SHIFT_4K)
 * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 862f9a2..e142171 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -99,7 +99,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long 
liobn,
if (ret != H_SUCCESS)
return ret;
 
-   idx = ioba >> SPAPR_TCE_SHIFT;
+   idx = ioba >> IOMMU_PAGE_SHIFT_4K;
page = stt->pages[idx / TCES_PER_PAGE];
tbl = (u64 *)page_address(page);
 
@@ -127,7 +127,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long 
liobn,
if (ret != H_SUCCESS)
return ret;
 
-   idx = ioba >> SPAPR_TCE_SHIFT;
+   idx = ioba >> IOMMU_PAGE_SHIFT_4K;
page = stt->pages[idx / TCES_PER_PAGE];
tbl = (u64 *)page_address(page);
 
-- 
2.5.0.rc3


[PATCH kernel v2 3/6] KVM: PPC: Account TCE-containing pages in locked_vm

2016-01-20 Thread Alexey Kardashevskiy
At the moment pages used for TCE tables (in addition to pages addressed
by TCEs) are not counted in the locked_vm counter, so a malicious userspace
tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE
allows and lock a lot of memory.

This adds counting for pages used for TCE tables.

This counts the number of pages required for a table plus pages for
the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.

This changes release_spapr_tce_table() to store @npages on stack to
avoid calling kvmppc_stt_npages() in the loop (tiny optimization,
probably).

This does not change the amount of (de)allocated memory.
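
A rough illustration of the arithmetic being charged to locked_vm (a
user-space sketch mirroring the logic; the 128-byte descriptor size is an
assumption for the example, not the real struct size):

#include <stdio.h>

static unsigned long pages_to_account(unsigned long window_size,
				      unsigned long page_size)
{
	unsigned long entries = window_size / 4096;	/* 4K IOMMU pages */
	unsigned long table_bytes = entries * sizeof(unsigned long long);
	unsigned long table_pages = (table_bytes + page_size - 1) / page_size;
	unsigned long desc_bytes = 128 + table_pages * sizeof(void *);
	unsigned long desc_pages = (desc_bytes + page_size - 1) / page_size;

	return table_pages + desc_pages;
}

int main(void)
{
	/* Example: 1GB DMA window with 64K kernel pages -> 32 + 1 pages */
	printf("%lu\n", pages_to_account(1UL << 30, 64 * 1024));
	return 0;
}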

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* switched from long to unsigned long types
* added WARN_ON_ONCE() in locked_vm decrement case
---
 arch/powerpc/kvm/book3s_64_vio.c | 55 +---
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 9526c34..ea498b4 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -39,19 +39,62 @@
 
 #define TCES_PER_PAGE  (PAGE_SIZE / sizeof(u64))
 
-static long kvmppc_stt_npages(unsigned long window_size)
+static unsigned long kvmppc_stt_npages(unsigned long window_size)
 {
return ALIGN((window_size >> SPAPR_TCE_SHIFT)
 * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
+static long kvmppc_account_memlimit(unsigned long npages, bool inc)
+{
+   long ret = 0;
+   const unsigned long bytes = sizeof(struct kvmppc_spapr_tce_table) +
+   (npages * sizeof(struct page *));
+   const unsigned long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;
+
+   if (!current || !current->mm)
+   return ret; /* process exited */
+
+   npages += stt_pages;
+
+   down_write(&current->mm->mmap_sem);
+
+   if (inc) {
+   unsigned long locked, lock_limit;
+
+   locked = current->mm->locked_vm + npages;
+   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+   ret = -ENOMEM;
+   else
+   current->mm->locked_vm += npages;
+   } else {
+   if (WARN_ON_ONCE(npages > current->mm->locked_vm))
+   npages = current->mm->locked_vm;
+
+   current->mm->locked_vm -= npages;
+   }
+
+   pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
+   inc ? '+' : '-',
+   npages << PAGE_SHIFT,
+   current->mm->locked_vm << PAGE_SHIFT,
+   rlimit(RLIMIT_MEMLOCK),
+   ret ? " - exceeded" : "");
+
+   up_write(&current->mm->mmap_sem);
+
+   return ret;
+}
+
 static void release_spapr_tce_table(struct rcu_head *head)
 {
struct kvmppc_spapr_tce_table *stt = container_of(head,
struct kvmppc_spapr_tce_table, rcu);
int i;
+   unsigned long npages = kvmppc_stt_npages(stt->window_size);
 
-   for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+   for (i = 0; i < npages; i++)
__free_page(stt->pages[i]);
 
kfree(stt);
@@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct 
file *filp)
 
kvm_put_kvm(stt->kvm);
 
+   kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
+   call_rcu(&stt->rcu, release_spapr_tce_table);
 
return 0;
@@ -103,7 +147,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
   struct kvm_create_spapr_tce *args)
 {
struct kvmppc_spapr_tce_table *stt = NULL;
-   long npages;
+   unsigned long npages;
int ret = -ENOMEM;
int i;
 
@@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
}
 
npages = kvmppc_stt_npages(args->window_size);
+   ret = kvmppc_account_memlimit(npages, true);
+   if (ret) {
+   stt = NULL;
+   goto fail;
+   }
 
stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
  GFP_KERNEL);
-- 
2.5.0.rc3


[PATCH kernel v2 1/6] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers

2016-01-20 Thread Alexey Kardashevskiy
This reworks the existing H_PUT_TCE/H_GET_TCE handlers so that the
following patches apply more cleanly.

This moves the ioba boundaries check to a helper and adds a check that
the least significant bits are zero.

The patch is pretty mechanical (only the check for the least significant
ioba bits is added) so no change in behaviour is expected.

Signed-off-by: Alexey Kardashevskiy 
---
Changelog:
v2:
* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
* made error reporting cleaner
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 111 +++-
 1 file changed, 72 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 89e96b3..862f9a2 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -35,71 +35,104 @@
 #include 
 #include 
 #include 
+#include 
 
 #define TCES_PER_PAGE  (PAGE_SIZE / sizeof(u64))
 
+/*
+ * Finds a TCE table descriptor by LIOBN.
+ *
+ * WARNING: This will be called in real or virtual mode on HV KVM and virtual
+ *  mode on PR KVM
+ */
+static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
+   unsigned long liobn)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvmppc_spapr_tce_table *stt;
+
+   list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
+   if (stt->liobn == liobn)
+   return stt;
+
+   return NULL;
+}
+
+/*
+ * Validates IO address.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *  mode on PR KVM
+ */
+static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+   unsigned long ioba, unsigned long npages)
+{
+   unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
+   unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
+   unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
+
+   if ((ioba & mask) || (idx + npages > size))
+   return H_PARAMETER;
+
+   return H_SUCCESS;
+}
+
 /* WARNING: This will be called in real-mode on HV KVM and virtual
  *  mode on PR KVM
  */
 long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
  unsigned long ioba, unsigned long tce)
 {
-   struct kvm *kvm = vcpu->kvm;
-   struct kvmppc_spapr_tce_table *stt;
+   struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+   long ret;
+   unsigned long idx;
+   struct page *page;
+   u64 *tbl;
 
/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
/*  liobn, ioba, tce); */
 
-   list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
-   if (stt->liobn == liobn) {
-   unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
-   struct page *page;
-   u64 *tbl;
+   if (!stt)
+   return H_TOO_HARD;
 
-   /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  
window_size=0x%x\n", */
-   /*  liobn, stt, stt->window_size); */
-   if (ioba >= stt->window_size)
-   return H_PARAMETER;
+   ret = kvmppc_ioba_validate(stt, ioba, 1);
+   if (ret != H_SUCCESS)
+   return ret;
 
-   page = stt->pages[idx / TCES_PER_PAGE];
-   tbl = (u64 *)page_address(page);
+   idx = ioba >> SPAPR_TCE_SHIFT;
+   page = stt->pages[idx / TCES_PER_PAGE];
+   tbl = (u64 *)page_address(page);
 
-   /* FIXME: Need to validate the TCE itself */
-   /* udbg_printf("tce @ %p\n", [idx % 
TCES_PER_PAGE]); */
-   tbl[idx % TCES_PER_PAGE] = tce;
-   return H_SUCCESS;
-   }
-   }
+   /* FIXME: Need to validate the TCE itself */
+   /* udbg_printf("tce @ %p\n", [idx % TCES_PER_PAGE]); */
+   tbl[idx % TCES_PER_PAGE] = tce;
 
-   /* Didn't find the liobn, punt it to userspace */
-   return H_TOO_HARD;
+   return H_SUCCESS;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
 
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
- unsigned long ioba)
+   unsigned long ioba)
 {
-   struct kvm *kvm = vcpu->kvm;
-   struct kvmppc_spapr_tce_table *stt;
+   struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+   long ret;
+   unsigned long idx;
+   struct page *page;
+   u64 *tbl;
 
-   list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
-   if (stt->liobn == liobn) {
-   unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
-   struct page *page;
-   u64 *tbl;
+   if (!stt)
+   return H_TOO_HARD;
 
-   if (ioba >= stt->window_size)
-   

[PATCH kernel v2 2/6] KVM: PPC: Use RCU for arch.spapr_tce_tables

2016-01-20 Thread Alexey Kardashevskiy
At the moment spapr_tce_tables is not protected against races. This makes
use of RCU-variants of list helpers. As some bits are executed in real
mode, this makes use of just introduced list_for_each_entry_rcu_notrace().

This converts release_spapr_tce_table() to a RCU scheduled handler.
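
For reference, the generic pattern being applied here looks roughly like the
sketch below (kernel RCU list API; the struct and function names are
illustrative, not taken from the patch):

#include <linux/rculist.h>
#include <linux/slab.h>

struct foo {
	int liobn;
	struct list_head list;
	struct rcu_head rcu;
};

static void foo_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct foo, rcu));
}

/* Writer side: callers hold the mutex protecting the list. */
static void foo_del(struct foo *f)
{
	list_del_rcu(&f->list);
	call_rcu(&f->rcu, foo_free_rcu);	/* freed after a grace period */
}

/* Reader side: safe without the mutex, inside an RCU read-side section. */
static struct foo *foo_find(struct list_head *head, int liobn)
{
	struct foo *f;

	list_for_each_entry_rcu(f, head, list)
		if (f->liobn == liobn)
			return f;
	return NULL;
}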

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s.c   |  2 +-
 arch/powerpc/kvm/book3s_64_vio.c| 20 +++-
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 271fefb..c7ee696 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -184,6 +184,7 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+   struct rcu_head rcu;
struct page *pages[0];
 };
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 638c6d9..b34220d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -807,7 +807,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 {
 
 #ifdef CONFIG_PPC64
-   INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+   INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
INIT_LIST_HEAD(>arch.rtas_tokens);
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 54cf9bc..9526c34 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,19 +45,16 @@ static long kvmppc_stt_npages(unsigned long window_size)
 * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
-static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
+static void release_spapr_tce_table(struct rcu_head *head)
 {
-   struct kvm *kvm = stt->kvm;
+   struct kvmppc_spapr_tce_table *stt = container_of(head,
+   struct kvmppc_spapr_tce_table, rcu);
int i;
 
-   mutex_lock(&kvm->lock);
-   list_del(&stt->list);
for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
__free_page(stt->pages[i]);
+
kfree(stt);
-   mutex_unlock(&kvm->lock);
-
-   kvm_put_kvm(kvm);
 }
 
 static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault 
*vmf)
@@ -88,7 +85,12 @@ static int kvm_spapr_tce_release(struct inode *inode, struct 
file *filp)
 {
struct kvmppc_spapr_tce_table *stt = filp->private_data;
 
-   release_spapr_tce_table(stt);
+   list_del_rcu(&stt->list);
+
+   kvm_put_kvm(stt->kvm);
+
+   call_rcu(&stt->rcu, release_spapr_tce_table);
+
return 0;
 }
 
@@ -131,7 +133,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
kvm_get_kvm(kvm);
 
mutex_lock(&kvm->lock);
-   list_add(&stt->list, &kvm->arch.spapr_tce_tables);
+   list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
 
mutex_unlock(&kvm->lock);
 
-- 
2.5.0.rc3


Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls

2016-01-20 Thread kbuild test robot
Hi Alexey,

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.4 next-20160121]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/KVM-PPC-Add-in-kernel-multitce-handling/20160121-154336
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: powerpc-allyesconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   arch/powerpc/kvm/book3s_64_vio_hv.c: In function 'kvmppc_find_table':
   arch/powerpc/kvm/book3s_64_vio_hv.c:58:2: error: implicit declaration of 
function 'list_for_each_entry_lockless' [-Werror=implicit-function-declaration]
 list_for_each_entry_lockless(stt, >arch.spapr_tce_tables, list)
 ^
   arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: error: 'list' undeclared (first 
use in this function)
 list_for_each_entry_lockless(stt, >arch.spapr_tce_tables, list)
^
   arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: note: each undeclared identifier 
is reported only once for each function it appears in
   arch/powerpc/kvm/book3s_64_vio_hv.c:59:3: error: expected ';' before 'if'
  if (stt->liobn == liobn)
  ^
   arch/powerpc/kvm/book3s_64_vio_hv.c: In function 
'kvmppc_rm_h_put_tce_indirect':
>> arch/powerpc/kvm/book3s_64_vio_hv.c:263:18: error: implicit declaration of 
>> function 'vmalloc_to_phys' [-Werror=implicit-function-declaration]
 rmap = (void *) vmalloc_to_phys(rmap);
 ^
>> arch/powerpc/kvm/book3s_64_vio_hv.c:263:9: warning: cast to pointer from 
>> integer of different size [-Wint-to-pointer-cast]
 rmap = (void *) vmalloc_to_phys(rmap);
^
   cc1: some warnings being treated as errors

vim +/vmalloc_to_phys +263 arch/powerpc/kvm/book3s_64_vio_hv.c

   257  if (ret != H_SUCCESS)
   258  return ret;
   259  
   260  if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , ))
   261  return H_TOO_HARD;
   262  
 > 263  rmap = (void *) vmalloc_to_phys(rmap);
   264  
   265  lock_rmap(rmap);
   266  if (kvmppc_rm_ua_to_hpa(vcpu, ua, )) {

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[PATCH kernel] powerpc: Make vmalloc_to_phys() public

2016-01-20 Thread Alexey Kardashevskiy
This makes vmalloc_to_phys() public as there will be another user
(in-kernel VFIO acceleration) for it soon.

As a part of future little optimization, this changes the helper to call
vmalloc_to_pfn() instead of vmalloc_to_page() as the size of the
struct page may not be power-of-two aligned which will make gcc use
multiply instructions instead of shifts.
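
A hedged example of the kind of caller this export is meant for (the caller
itself is illustrative; only vmalloc_to_phys() comes from this patch):

#include <linux/vmalloc.h>
#include <linux/errno.h>
#include <asm/pgtable.h>

static int example_caller(void)
{
	void *buf = vmalloc(PAGE_SIZE);
	unsigned long phys;

	if (!buf)
		return -ENOMEM;

	phys = vmalloc_to_phys(buf);	/* physical address of the first page */
	/* ... hand phys to firmware or the hypervisor ... */
	vfree(buf);
	return 0;
}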

Signed-off-by: Alexey Kardashevskiy 
---

A couple of notes:

1. real_vmalloc_addr() will be reworked later by Paul separately;

2. the optimization note is not valid at the moment as
vmalloc_to_pfn() calls vmalloc_to_page(), which does the actual
search; these helpers' functionality will be swapped later
(also by Paul).

---
 arch/powerpc/include/asm/pgtable.h | 3 +++
 arch/powerpc/mm/pgtable.c  | 8 
 arch/powerpc/perf/hv-24x7.c| 8 
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index ac9fb11..47897a3 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -78,6 +78,9 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, 
unsigned long ea,
}
return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift);
 }
+
+unsigned long vmalloc_to_phys(void *vmalloc_addr);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 83dfd79..de37ff4 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -243,3 +243,11 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long 
addr)
 }
 #endif /* CONFIG_DEBUG_VM */
 
+unsigned long vmalloc_to_phys(void *va)
+{
+   unsigned long pfn = vmalloc_to_pfn(va);
+
+   BUG_ON(!pfn);
+   return __pa(pfn_to_kaddr(pfn)) + offset_in_page(va);
+}
+EXPORT_SYMBOL_GPL(vmalloc_to_phys);
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9f9dfda..3b09ecf 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -493,14 +493,6 @@ static size_t event_to_attr_ct(struct hv_24x7_event_data 
*event)
}
 }
 
-static unsigned long vmalloc_to_phys(void *v)
-{
-   struct page *p = vmalloc_to_page(v);
-
-   BUG_ON(!p);
-   return page_to_phys(p) + offset_in_page(v);
-}
-
 /* */
 struct event_uniq {
struct rb_node node;
-- 
2.5.0.rc3


[PATCH kernel v2 5/6] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers

2016-01-20 Thread Alexey Kardashevskiy
Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
will validate TCE (not to have unexpected bits) and IO address
(to be within the DMA window boundaries).

This introduces helpers to validate TCE and IO address. The helpers are
exported as they compile into vmlinux (to work in realmode) and will be
used later by KVM kernel module in virtual mode.
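
A hedged illustration of the two checks (the constants here stand in for
IOMMU_PAGE_MASK_4K / TCE_PCI_READ / TCE_PCI_WRITE and are not the kernel
definitions):

#define PAGE_MASK_4K	(~0xfffULL)
#define PERM_READ	0x1ULL
#define PERM_WRITE	0x2ULL

/* A TCE may only carry a 4K-aligned address plus the permission bits. */
static int tce_ok(unsigned long long tce)
{
	return (tce & ~(PAGE_MASK_4K | PERM_READ | PERM_WRITE)) == 0;
}

/* An ioba must be 4K-aligned and the range must fit inside the window. */
static int ioba_ok(unsigned long long ioba, unsigned long long npages,
		   unsigned long long window_size)
{
	unsigned long long idx = ioba >> 12;

	return (ioba & 0xfff) == 0 && (idx + npages) <= (window_size >> 12);
}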

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* added note to the commit log about why new helpers are exported
* did not add a note that xxx_validate() validate TCEs for KVM (not for
host kernel DMA) as the helper names and file location tell what are
they for
---
 arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 92 -
 2 files changed, 84 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2241d53..9513911 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+   unsigned long ioba, unsigned long npages);
+extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
+   unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba, unsigned long tce);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index e142171..8cd3a95 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define TCES_PER_PAGE  (PAGE_SIZE / sizeof(u64))
 
@@ -64,18 +65,90 @@ static struct kvmppc_spapr_tce_table 
*kvmppc_find_table(struct kvm_vcpu *vcpu,
  * WARNING: This will be called in real-mode on HV KVM and virtual
  *  mode on PR KVM
  */
-static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
unsigned long ioba, unsigned long npages)
 {
-   unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
+   unsigned long mask = IOMMU_PAGE_MASK_4K;
unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
 
-   if ((ioba & mask) || (idx + npages > size))
+   if ((ioba & ~mask) || (idx + npages > size))
return H_PARAMETER;
 
return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
+
+/*
+ * Validates TCE address.
+ * At the moment flags and page mask are validated.
+ * As the host kernel does not access those addresses (just puts them
+ * to the table and user space is supposed to process them), we can skip
+ * checking other things (such as TCE is a guest RAM address or the page
+ * was actually allocated).
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *  mode on PR KVM
+ */
+long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+{
+   unsigned long mask = IOMMU_PAGE_MASK_4K | TCE_PCI_WRITE | TCE_PCI_READ;
+
+   if (tce & ~mask)
+   return H_PARAMETER;
+
+   return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
+
+/* Note on the use of page_address() in real mode,
+ *
+ * It is safe to use page_address() in real mode on ppc64 because
+ * page_address() is always defined as lowmem_page_address()
+ * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
+ * operation and does not access page struct.
+ *
+ * Theoretically page_address() could be defined different
+ * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
+ * should be enabled.
+ * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
+ * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
+ * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
+ * is not expected to be enabled on ppc32, page_address()
+ * is safe for ppc32 as well.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *  mode on PR KVM
+ */
+static u64 *kvmppc_page_address(struct page *page)
+{
+#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
+#error TODO: fix to avoid page_address() here
+#endif
+   return (u64 *) page_address(page);
+}
+
+/*
+ * Handles TCE requests for emulated devices.
+ * Puts guest TCE values to the table and expects user space to convert them.
+ * Called in both real and virtual modes.
+ * Cannot fail so kvmppc_tce_validate must be called before it.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *  mode on PR KVM
+ */

[PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread

2016-01-20 Thread Cyril Bur
Currently when threads get scheduled off they always give up the FPU,
Altivec (VMX) and Vector (VSX) units if they were using them. When they are
scheduled back on, a fault is then taken to enable each facility and load
the registers. As a result, explicitly disabling FPU/VMX/VSX has not been
necessary.

Future changes and optimisations remove this mandatory giveup and fault
which could cause calls such as clone() and fork() to copy threads and run
them later with FPU/VMX/VSX enabled but no registers loaded.

This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that the
registers must be loaded. This allows for a smarter return to userspace.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/kernel/process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dccc87e..e0c3d2d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long 
usp,
 
f = ret_from_fork;
}
+   childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
sp -= STACK_FRAME_OVERHEAD;
 
/*
-- 
2.7.0


[PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall

2016-01-20 Thread Cyril Bur
Test that the non-volatile floating point and Altivec registers get
correctly preserved across the fork() syscall.

fork() works nicely for this purpose; the registers should be the same for
both parent and child.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/Makefile   |   3 +-
 tools/testing/selftests/powerpc/basic_asm.h|  26 +++
 tools/testing/selftests/powerpc/math/.gitignore|   2 +
 tools/testing/selftests/powerpc/math/Makefile  |  14 ++
 tools/testing/selftests/powerpc/math/fpu_asm.S | 161 +
 tools/testing/selftests/powerpc/math/fpu_syscall.c |  90 ++
 tools/testing/selftests/powerpc/math/vmx_asm.S | 193 +
 tools/testing/selftests/powerpc/math/vmx_syscall.c |  92 ++
 8 files changed, 580 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/math/Makefile
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 0c2706b..19e8191 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -22,7 +22,8 @@ SUB_DIRS = benchmarks \
   switch_endian\
   syscalls \
   tm   \
-  vphn
+  vphn \
+  math
 
 endif
 
diff --git a/tools/testing/selftests/powerpc/basic_asm.h 
b/tools/testing/selftests/powerpc/basic_asm.h
new file mode 100644
index 000..27aca79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/basic_asm.h
@@ -0,0 +1,26 @@
+#include 
+#include 
+
+#define LOAD_REG_IMMEDIATE(reg,expr) \
+   lis reg,(expr)@highest; \
+   ori reg,reg,(expr)@higher;  \
+   rldicr  reg,reg,32,31;  \
+   orisreg,reg,(expr)@high;\
+   ori reg,reg,(expr)@l;
+
+#define PUSH_BASIC_STACK(size) \
+   std 2,24(sp); \
+   mflr r0; \
+   std r0,16(sp); \
+   mfcr r0; \
+   stw r0,8(sp); \
+   stdu sp,-size(sp);
+
+#define POP_BASIC_STACK(size) \
+   addi sp,sp,size; \
+   ld  2,24(sp); \
+   ld  r0,16(sp); \
+   mtlr r0; \
+   lwz r0,8(sp); \
+   mtcr r0; \
+
diff --git a/tools/testing/selftests/powerpc/math/.gitignore 
b/tools/testing/selftests/powerpc/math/.gitignore
new file mode 100644
index 000..b19b269
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -0,0 +1,2 @@
+fpu_syscall
+vmx_syscall
diff --git a/tools/testing/selftests/powerpc/math/Makefile 
b/tools/testing/selftests/powerpc/math/Makefile
new file mode 100644
index 000..418bef1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -0,0 +1,14 @@
+TEST_PROGS := fpu_syscall vmx_syscall
+
+all: $(TEST_PROGS)
+
+$(TEST_PROGS): ../harness.c
+$(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
+
+fpu_syscall: fpu_asm.S
+vmx_syscall: vmx_asm.S
+
+include ../../lib.mk
+
+clean:
+   rm -f $(TEST_PROGS) *.o
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S 
b/tools/testing/selftests/powerpc/math/fpu_asm.S
new file mode 100644
index 000..8733874
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -0,0 +1,161 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+
+#define PUSH_FPU(pos) \
+   stfd f14,pos(sp); \
+   stfd f15,pos+8(sp); \
+   stfd f16,pos+16(sp); \
+   stfd f17,pos+24(sp); \
+   stfd f18,pos+32(sp); \
+   stfd f19,pos+40(sp); \
+   stfd f20,pos+48(sp); \
+   stfd f21,pos+56(sp); \
+   stfd f22,pos+64(sp); \
+   stfd f23,pos+72(sp); \
+   stfd f24,pos+80(sp); \
+   stfd f25,pos+88(sp); \
+   stfd f26,pos+96(sp); \
+   stfd f27,pos+104(sp); \
+   stfd f28,pos+112(sp); \
+   stfd f29,pos+120(sp); \
+   stfd f30,pos+128(sp); \
+   stfd f31,pos+136(sp);
+
+#define POP_FPU(pos) \
+   lfd f14,pos(sp); \
+   lfd f15,pos+8(sp); \
+   lfd f16,pos+16(sp); \
+   lfd f17,pos+24(sp); \
+   lfd f18,pos+32(sp); \
+   lfd f19,pos+40(sp); \
+   lfd f20,pos+48(sp); \
+   lfd f21,pos+56(sp); \
+   lfd f22,pos+64(sp); \
+   lfd f23,pos+72(sp); 

[PATCH v3 5/9] powerpc: Restore FPU/VEC/VSX if previously used

2016-01-20 Thread Cyril Bur
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a
problem unless a process is using these facilities.

Modern versions of GCC are very good at automatically vectorising code, new
and modernised workloads make use of floating point and vector facilities,
and even the kernel makes use of vectorised memcpy.

All this combined greatly increases the cost of a syscall, since the kernel
sometimes uses these facilities even in the syscall fast path, making it
increasingly common for a thread to take an *_unavailable exception soon
after a syscall, not to mention potentially taking all three.

The obvious overcompensation to this problem is to simply always load all
the facilities on every exit to userspace. Loading up all FPU, VEC and VSX
registers every time can be expensive and if a workload does avoid using
them, it should not be forced to incur this penalty.

An 8-bit counter is used to detect whether the registers have been used in
the past, and the registers are always loaded until the value wraps back to
zero.

Several versions of the assembly in entry_64.S were tried: 1. always calling
C, 2. performing a common case check and then calling C, and 3. a complex
check in asm. After some benchmarking it was determined that avoiding C in
the common case is a performance benefit. The full check in asm greatly
complicated that code path for a negligible performance gain and the
trade-off was deemed not worth it.

Signed-off-by: Cyril Bur 
---
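As a rough illustration of the counter policy described above, here is a
stand-alone C model. It is a hedged sketch only, not kernel code: the field
name load_fp comes from this patch, while the structure and helpers around
it are invented purely for the example.

#include <stdint.h>
#include <stdbool.h>

/* Simplified model of the 8-bit "facility recently used" counter. */
struct thread_model {
	uint8_t load_fp;	/* 0 = lazy loading, non-zero = keep FP hot */
};

/* Roughly what happens when a thread takes an fp_unavailable exception. */
static void fp_unavailable(struct thread_model *t)
{
	t->load_fp = 1;		/* start restoring FP state eagerly */
}

/* Roughly the decision made on each exit to userspace (restore_math()). */
static bool should_restore_fp(struct thread_model *t)
{
	if (!t->load_fp)
		return false;	/* FP not used recently, stay lazy */
	t->load_fp++;		/* u8 arithmetic: wraps back to 0 after 255
				 * consecutive restores, reverting to lazy mode */
	return true;
}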
 arch/powerpc/include/asm/processor.h |  2 +
 arch/powerpc/kernel/asm-offsets.c|  2 +
 arch/powerpc/kernel/entry_64.S   | 21 +++--
 arch/powerpc/kernel/fpu.S|  4 ++
 arch/powerpc/kernel/process.c| 84 +++-
 arch/powerpc/kernel/vector.S |  4 ++
 6 files changed, 103 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ac23308..dcab21f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -236,11 +236,13 @@ struct thread_struct {
 #endif
struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
unsigned long   trap_nr;/* last trap # on this thread */
+   u8 load_fp;
 #ifdef CONFIG_ALTIVEC
struct thread_vr_state vr_state;
struct thread_vr_state *vr_save_area;
unsigned long   vrsave;
int used_vr;/* set if process has used altivec */
+   u8 load_vec;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
/* VSR status */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 07cebc3..10d5eab 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -95,12 +95,14 @@ int main(void)
DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct, fp_state));
DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct, fp_save_area));
DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state, fpscr));
+   DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct, load_fp));
 #ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct, vr_state));
DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct, vr_save_area));
DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state, vscr));
+   DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct, load_vec));
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0d525ce..038e0a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -210,7 +210,20 @@ system_call:   /* label this so stack 
traces look sane */
li  r11,-MAX_ERRNO
	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
	bne-	syscall_exit_work
-   cmpld   r3,r11
+
+   andi.   r0,r8,MSR_FP
+   beq 2f
+#ifdef CONFIG_ALTIVEC
+   andis.  r0,r8,MSR_VEC@h
+   bne 3f
+#endif
+2: addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  restore_math
+   ld  r8,_MSR(r1)
+   ld  r3,RESULT(r1)
+   li  r11,-MAX_ERRNO
+
+3: cmpld   r3,r11
ld  r5,_CCR(r1)
bge-syscall_error
 .Lsyscall_error_cont:
@@ -602,8 +615,8 @@ _GLOBAL(ret_from_except_lite)
 
/* Check current_thread_info()->flags */
andi.   r0,r4,_TIF_USER_WORK_MASK
-#ifdef CONFIG_PPC_BOOK3E
bne 1f
+#ifdef CONFIG_PPC_BOOK3E
/*
 * Check to see if the dbcr0 register is set up to debug.
 * Use the internal debug mode bit to do this.
@@ -618,7 +631,9 @@ _GLOBAL(ret_from_except_lite)
mtspr   

[PATCH v3 9/9] powerpc: Add the ability to save VSX without giving it up

2016-01-20 Thread Cyril Bur
This patch adds the ability to save the VSX registers to the thread struct
without giving them up, that is, without disabling the facility for the next
time the process returns to userspace.

This patch builds on a previous optimisation for the FPU and VEC registers
in the thread copy path to avoid a possibly pointless reload of VSX state.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/switch_to.h |  4 
 arch/powerpc/kernel/ppc_ksyms.c  |  4 
 arch/powerpc/kernel/process.c| 42 +---
 arch/powerpc/kernel/vector.S | 17 ---
 4 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 9028822..17c8380 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -56,14 +56,10 @@ static inline void __giveup_altivec(struct task_struct *t) 
{ }
 #ifdef CONFIG_VSX
 extern void enable_kernel_vsx(void);
 extern void flush_vsx_to_thread(struct task_struct *);
-extern void giveup_vsx(struct task_struct *);
-extern void __giveup_vsx(struct task_struct *);
 static inline void disable_kernel_vsx(void)
 {
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
 }
-#else
-static inline void __giveup_vsx(struct task_struct *t) { }
 #endif
 
 #ifdef CONFIG_SPE
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 41e1607..ef7024da 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state);
 EXPORT_SYMBOL(store_vr_state);
 #endif
 
-#ifdef CONFIG_VSX
-EXPORT_SYMBOL_GPL(__giveup_vsx);
-#endif
-
 #ifdef CONFIG_EPAPR_PARAVIRT
 EXPORT_SYMBOL(epapr_hypercall_start);
 #endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fef5b7d..7c3dd30 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -280,19 +280,31 @@ static inline int restore_altivec(struct task_struct 
*tsk) { return 0; }
 #endif /* CONFIG_ALTIVEC */
 
 #ifdef CONFIG_VSX
-void giveup_vsx(struct task_struct *tsk)
+static void __giveup_vsx(struct task_struct *tsk)
 {
-   check_if_tm_restore_required(tsk);
-
-   msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
if (tsk->thread.regs->msr & MSR_FP)
__giveup_fpu(tsk);
if (tsk->thread.regs->msr & MSR_VEC)
__giveup_altivec(tsk);
+   tsk->thread.regs->msr &= ~MSR_VSX;
+}
+
+static void giveup_vsx(struct task_struct *tsk)
+{
+   check_if_tm_restore_required(tsk);
+
+   msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
__giveup_vsx(tsk);
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
 }
-EXPORT_SYMBOL(giveup_vsx);
+
+static void save_vsx(struct task_struct *tsk)
+{
+   if (tsk->thread.regs->msr & MSR_FP)
+   save_fpu(tsk);
+   if (tsk->thread.regs->msr & MSR_VEC)
+   save_altivec(tsk);
+}
 
 void enable_kernel_vsx(void)
 {
@@ -331,6 +343,7 @@ static int restore_vsx(struct task_struct *tsk)
 }
 #else
 static inline int restore_vsx(struct task_struct *tsk) { return 0; }
+static inline void save_vsx(struct task_struct *tsk) { }
 #endif /* CONFIG_VSX */
 
 #ifdef CONFIG_SPE
@@ -482,14 +495,19 @@ void save_all(struct task_struct *tsk)
 
msr_check_and_set(msr_all_available);
 
-   if (usermsr & MSR_FP)
-   save_fpu(tsk);
-
-   if (usermsr & MSR_VEC)
-   save_altivec(tsk);
+   /*
+* Saving the way the register space is in hardware, save_vsx boils
+* down to a save_fpu() and save_altivec()
+*/
+   if (usermsr & MSR_VSX) {
+   save_vsx(tsk);
+   } else {
+   if (usermsr & MSR_FP)
+   save_fpu(tsk);
 
-   if (usermsr & MSR_VSX)
-   __giveup_vsx(tsk);
+   if (usermsr & MSR_VEC)
+   save_altivec(tsk);
+   }
 
if (usermsr & MSR_SPE)
__giveup_spe(tsk);
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 51b0c17..1c2e7a3 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx)
std r12,_MSR(r1)
b   fast_exception_return
 
-/*
- * __giveup_vsx(tsk)
- * Disable VSX for the task given as the argument.
- * Does NOT save vsx registers.
- */
-_GLOBAL(__giveup_vsx)
-   addi    r3,r3,THREAD    /* want THREAD of task */
-   ld  r5,PT_REGS(r3)
-   cmpdi   0,r5,0
-   beq 1f
-   ld  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-   lis r3,MSR_VSX@h
-   andc    r4,r4,r3    /* disable VSX for previous task */
-   std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
-   blr
-
 #endif /* CONFIG_VSX */
 
 
-- 
2.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org

[PATCH v3 8/9] powerpc: Add the ability to save Altivec without giving it up

2016-01-20 Thread Cyril Bur
This patch adds the ability to save the VEC registers to the thread struct
without giving them up, that is, without disabling the facility for the next
time the process returns to userspace.

This patch builds on a previous optimisation for the FPU registers in the
thread copy path to avoid a possibly pointless reload of VEC state.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/switch_to.h |  3 ++-
 arch/powerpc/kernel/process.c| 12 +++-
 arch/powerpc/kernel/vector.S | 24 
 3 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 6a201e8..9028822 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -43,12 +43,13 @@ static inline void flush_fp_to_thread(struct task_struct 
*t) { }
 extern void enable_kernel_altivec(void);
 extern void flush_altivec_to_thread(struct task_struct *);
 extern void giveup_altivec(struct task_struct *);
-extern void __giveup_altivec(struct task_struct *);
+extern void save_altivec(struct task_struct *);
 static inline void disable_kernel_altivec(void)
 {
msr_check_and_clear(MSR_VEC);
 }
 #else
+static inline void save_altivec(struct task_struct *t) { }
 static inline void __giveup_altivec(struct task_struct *t) { }
 #endif
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a3411ce..fef5b7d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -213,6 +213,16 @@ static int restore_fp(struct task_struct *tsk) { return 0; 
}
 #ifdef CONFIG_ALTIVEC
 #define loadvec(thr) ((thr).load_vec)
 
+static void __giveup_altivec(struct task_struct *tsk)
+{
+   save_altivec(tsk);
+   tsk->thread.regs->msr &= ~MSR_VEC;
+#ifdef CONFIG_VSX
+   if (cpu_has_feature(CPU_FTR_VSX))
+   tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
 void giveup_altivec(struct task_struct *tsk)
 {
check_if_tm_restore_required(tsk);
@@ -476,7 +486,7 @@ void save_all(struct task_struct *tsk)
save_fpu(tsk);
 
if (usermsr & MSR_VEC)
-   __giveup_altivec(tsk);
+   save_altivec(tsk);
 
if (usermsr & MSR_VSX)
__giveup_vsx(tsk);
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 038cff8..51b0c17 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -106,36 +106,20 @@ _GLOBAL(load_up_altivec)
blr
 
 /*
- * __giveup_altivec(tsk)
- * Disable VMX for the task given as the argument,
- * and save the vector registers in its thread_struct.
+ * save_altivec(tsk)
+ * Save the vector registers to its thread_struct
  */
-_GLOBAL(__giveup_altivec)
+_GLOBAL(save_altivec)
addi    r3,r3,THREAD    /* want THREAD of task */
PPC_LL  r7,THREAD_VRSAVEAREA(r3)
PPC_LL  r5,PT_REGS(r3)
PPC_LCMPI   0,r7,0
bne 2f
addi    r7,r3,THREAD_VRSTATE
-2: PPC_LCMPI   0,r5,0
-   SAVE_32VRS(0,r4,r7)
+2: SAVE_32VRS(0,r4,r7)
mfvscr  v0
li  r4,VRSTATE_VSCR
stvx    v0,r4,r7
-   beq 1f
-   PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-   lis r3,(MSR_VEC|MSR_VSX)@h
-FTR_SECTION_ELSE
-   lis r3,MSR_VEC@h
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_VSX)
-#else
-   lis r3,MSR_VEC@h
-#endif
-   andc    r4,r4,r3    /* disable FP for previous task */
-   PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
 
 #ifdef CONFIG_VSX
-- 
2.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 7/9] powerpc: Add the ability to save FPU without giving it up

2016-01-20 Thread Cyril Bur
This patch adds the ability to save the FPU registers to the thread struct
without giving them up, that is, without disabling the facility for the next
time the process returns to userspace.

This patch optimises the thread copy path (as a result of a fork() or
clone()) so that the parent thread can return to userspace with hot
registers, avoiding a possibly pointless reload of FPU register state.

Signed-off-by: Cyril Bur 
---
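A hedged sketch of the thread copy optimisation described above. This is not
the actual kernel code: MSR_FP, save_fpu() and the thread_struct layout are
taken from this series, but the function below is simplified for illustration
only.

static void copy_fp_state_for_fork(struct task_struct *parent)
{
	/*
	 * The parent's live FP state must be written back to its
	 * thread_struct so that it can be duplicated into the child...
	 */
	if (parent->thread.regs->msr & MSR_FP)
		save_fpu(parent);	/* write FPRs + FPSCR, keep MSR_FP set */

	/*
	 * ...but there is no need to clear MSR_FP for the parent, so it can
	 * return to userspace with its registers still hot instead of
	 * faulting them back in.
	 */
}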
 arch/powerpc/include/asm/switch_to.h |  3 ++-
 arch/powerpc/kernel/fpu.S| 21 -
 arch/powerpc/kernel/process.c| 12 +++-
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 3690041..6a201e8 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -28,13 +28,14 @@ extern void giveup_all(struct task_struct *);
 extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
 extern void giveup_fpu(struct task_struct *);
-extern void __giveup_fpu(struct task_struct *);
+extern void save_fpu(struct task_struct *);
 static inline void disable_kernel_fp(void)
 {
msr_check_and_clear(MSR_FP);
 }
 #else
 static inline void __giveup_fpu(struct task_struct *t) { }
+static inline void save_fpu(struct task_struct *t) { }
 static inline void flush_fp_to_thread(struct task_struct *t) { }
 #endif
 
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index b063524..15da2b5 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
blr
 
 /*
- * __giveup_fpu(tsk)
- * Disable FP for the task given as the argument,
- * and save the floating-point registers in its thread_struct.
+ * save_fpu(tsk)
+ * Save the floating-point registers in its thread_struct.
  * Enables the FPU for use in the kernel on return.
  */
-_GLOBAL(__giveup_fpu)
+_GLOBAL(save_fpu)
addi    r3,r3,THREAD    /* want THREAD of task */
PPC_LL  r6,THREAD_FPSAVEAREA(r3)
PPC_LL  r5,PT_REGS(r3)
PPC_LCMPI   0,r6,0
bne 2f
addi    r6,r3,THREAD_FPSTATE
-2: PPC_LCMPI   0,r5,0
-   SAVE_32FPVSRS(0, R4, R6)
+2: SAVE_32FPVSRS(0, R4, R6)
mffs    fr0
stfd    fr0,FPSTATE_FPSCR(r6)
-   beq 1f
-   PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-   li  r3,MSR_FP|MSR_FE0|MSR_FE1
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-   oris    r3,r3,MSR_VSX@h
-END_FTR_SECTION_IFSET(CPU_FTR_VSX)
-#endif
-   andc    r4,r4,r3    /* disable FP for previous task */
-   PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
 
 /*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 45e37c0..a3411ce 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
 EXPORT_SYMBOL(__msr_check_and_clear);
 
 #ifdef CONFIG_PPC_FPU
+void __giveup_fpu(struct task_struct *tsk)
+{
+   save_fpu(tsk);
+   tsk->thread.regs->msr &= ~MSR_FP;
+#ifdef CONFIG_VSX
+   if (cpu_has_feature(CPU_FTR_VSX))
+   tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
 void giveup_fpu(struct task_struct *tsk)
 {
check_if_tm_restore_required(tsk);
@@ -463,7 +473,7 @@ void save_all(struct task_struct *tsk)
msr_check_and_set(msr_all_available);
 
if (usermsr & MSR_FP)
-   __giveup_fpu(tsk);
+   save_fpu(tsk);
 
if (usermsr & MSR_VEC)
__giveup_altivec(tsk);
-- 
2.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two

2016-01-20 Thread Cyril Bur
This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
marking {fpu,altivec,vsx} as being unused by a thread.

Currently giveup_{fpu,altivec,vsx}() does both; however, optimisations to
task switching can be made if these two operations are decoupled.
save_all() will permit saving the registers to the thread struct while
leaving the thread's MSR bits enabled.

This patch introduces no functional change.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/switch_to.h |  7 +++
 arch/powerpc/kernel/process.c| 39 +++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 5b268b6..3690041 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -34,6 +34,7 @@ static inline void disable_kernel_fp(void)
msr_check_and_clear(MSR_FP);
 }
 #else
+static inline void __giveup_fpu(struct task_struct *t) { }
 static inline void flush_fp_to_thread(struct task_struct *t) { }
 #endif
 
@@ -46,6 +47,8 @@ static inline void disable_kernel_altivec(void)
 {
msr_check_and_clear(MSR_VEC);
 }
+#else
+static inline void __giveup_altivec(struct task_struct *t) { }
 #endif
 
 #ifdef CONFIG_VSX
@@ -57,6 +60,8 @@ static inline void disable_kernel_vsx(void)
 {
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
 }
+#else
+static inline void __giveup_vsx(struct task_struct *t) { }
 #endif
 
 #ifdef CONFIG_SPE
@@ -68,6 +73,8 @@ static inline void disable_kernel_spe(void)
 {
msr_check_and_clear(MSR_SPE);
 }
+#else
+static inline void __giveup_spe(struct task_struct *t) { }
 #endif
 
 static inline void clear_task_ebb(struct task_struct *t)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 0955b7c..45e37c0 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -349,6 +349,14 @@ void flush_spe_to_thread(struct task_struct *tsk)
preempt_enable();
}
 }
+#else
+/*
+ * save_all() is going to test MSR_SPE; rather than pull in all the
+ * booke definitions all the time on a book3s kernel, just ensure it
+ * exists but acts as a nop.
+ */
+
+#define MSR_SPE 0
 #endif /* CONFIG_SPE */
 
 static unsigned long msr_all_available;
@@ -440,12 +448,41 @@ void restore_math(struct pt_regs *regs)
regs->msr = msr;
 }
 
+void save_all(struct task_struct *tsk)
+{
+   unsigned long usermsr;
+
+   if (!tsk->thread.regs)
+   return;
+
+   usermsr = tsk->thread.regs->msr;
+
+   if ((usermsr & msr_all_available) == 0)
+   return;
+
+   msr_check_and_set(msr_all_available);
+
+   if (usermsr & MSR_FP)
+   __giveup_fpu(tsk);
+
+   if (usermsr & MSR_VEC)
+   __giveup_altivec(tsk);
+
+   if (usermsr & MSR_VSX)
+   __giveup_vsx(tsk);
+
+   if (usermsr & MSR_SPE)
+   __giveup_spe(tsk);
+
+   msr_check_and_clear(msr_all_available);
+}
+
 void flush_all_to_thread(struct task_struct *tsk)
 {
if (tsk->thread.regs) {
preempt_disable();
BUG_ON(tsk != current);
-   giveup_all(tsk);
+   save_all(tsk);
 
 #ifdef CONFIG_SPE
if (tsk->thread.regs->msr & MSR_SPE)
-- 
2.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 3/9] selftests/powerpc: Test FPU and VMX regs in signal ucontext

2016-01-20 Thread Cyril Bur
Load up the non-volatile FPU and VMX regs and ensure that they are the
expected value in a signal handler.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/math/.gitignore   |   2 +
 tools/testing/selftests/powerpc/math/Makefile |   4 +-
 tools/testing/selftests/powerpc/math/fpu_signal.c | 135 +
 tools/testing/selftests/powerpc/math/vmx_signal.c | 138 ++
 4 files changed, 278 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c

diff --git a/tools/testing/selftests/powerpc/math/.gitignore 
b/tools/testing/selftests/powerpc/math/.gitignore
index 1a6f09e..4fe13a4 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -2,3 +2,5 @@ fpu_syscall
 vmx_syscall
 fpu_preempt
 vmx_preempt
+fpu_signal
+vmx_signal
diff --git a/tools/testing/selftests/powerpc/math/Makefile 
b/tools/testing/selftests/powerpc/math/Makefile
index b6f4158..5b88875 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
+TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt 
vmx_signal
 
 all: $(TEST_PROGS)
 
@@ -7,9 +7,11 @@ $(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
 
 fpu_syscall: fpu_asm.S
 fpu_preempt: fpu_asm.S
+fpu_signal:  fpu_asm.S
 
 vmx_syscall: vmx_asm.S
 vmx_preempt: vmx_asm.S
+vmx_signal: vmx_asm.S
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c 
b/tools/testing/selftests/powerpc/math/fpu_signal.c
new file mode 100644
index 000..888aa51
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_signal.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers are correctly reported in a
+ * signal context. Each worker just spins checking its FPU registers; at some
+ * point a signal will interrupt it and C code will check the signal context,
+ * ensuring it is also the same.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+2.1};
+
+bool bad_context;
+int threads_starting;
+int running;
+
+extern long preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void signal_fpu_sig(int sig, siginfo_t *info, void *context)
+{
+   int i;
+   ucontext_t *uc = context;
+   mcontext_t *mc = &uc->uc_mcontext;
+
+   /* Only the non volatiles were loaded up */
+   for (i = 14; i < 32; i++) {
+   if (mc->fp_regs[i] != darray[i - 14]) {
+   bad_context = true;
+   break;
+   }
+   }
+}
+
+void *signal_fpu_c(void *p)
+{
+   int i;
+   long rc;
+   struct sigaction act;
+   act.sa_sigaction = signal_fpu_sig;
+   act.sa_flags = SA_SIGINFO;
+   rc = sigaction(SIGUSR1, &act, NULL);
+   if (rc)
+   return p;
+
+   srand(pthread_self());
+   for (i = 0; i < 21; i++)
+   darray[i] = rand();
+
+   rc = preempt_fpu(darray, &threads_starting, &running);
+
+   return (void *) rc;
+}
+
+int test_signal_fpu(void)
+{
+   int i, j, rc, threads;
+   void *rc_p;
+   pthread_t *tids;
+
+   threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+   tids = malloc(threads * sizeof(pthread_t));
+   FAIL_IF(!tids);
+
+   running = true;
+   threads_starting = threads;
+   for (i = 0; i < threads; i++) {
+   rc = pthread_create(&tids[i], NULL, signal_fpu_c, NULL);
+   FAIL_IF(rc);
+   }
+
+   setbuf(stdout, NULL);
+   printf("\tWaiting for all workers to start...");
+   while (threads_starting)
+   asm volatile("": : :"memory");
+   printf("done\n");
+
+   printf("\tSending signals to all threads %d times...", ITERATIONS);
+   for (i = 0; i < ITERATIONS; i++) {
+   for (j = 0; j < threads; j++) {
+   pthread_kill(tids[j], SIGUSR1);
+   }
+   sleep(1);
+   }
+   printf("done\n");
+
+   printf("\tStopping workers...");
+   running = 0;
+  

[PATCH v3 0/9] FP/VEC/VSX switching optimisations

2016-01-20 Thread Cyril Bur
Cover-letter for V1 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136350.html

Cover-letter for V2 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/138054.html

Changes in V3:
Addressed review comments from Michael Neuling
 - Made commit message in 4/9 better reflect the patch
 - Removed overuse of #ifdef blocks and redundant condition in 5/9
 - Split 6/8 in two to better prepare for 7,8,9
 - Removed #ifdefs in 6/9


Cyril Bur (9):
  selftests/powerpc: Test the preservation of FPU and VMX regs across
syscall
  selftests/powerpc: Test preservation of FPU and VMX regs across
preemption
  selftests/powerpc: Test FPU and VMX regs in signal ucontext
  powerpc: Explicitly disable math features when copying thread
  powerpc: Restore FPU/VEC/VSX if previously used
  powerpc: Prepare for splitting giveup_{fpu,altivec,vsx} in two
  powerpc: Add the ability to save FPU without giving it up
  powerpc: Add the ability to save Altivec without giving it up
  powerpc: Add the ability to save VSX without giving it up

 arch/powerpc/include/asm/processor.h   |   2 +
 arch/powerpc/include/asm/switch_to.h   |  13 +-
 arch/powerpc/kernel/asm-offsets.c  |   2 +
 arch/powerpc/kernel/entry_64.S |  21 +-
 arch/powerpc/kernel/fpu.S  |  25 +--
 arch/powerpc/kernel/ppc_ksyms.c|   4 -
 arch/powerpc/kernel/process.c  | 172 ++--
 arch/powerpc/kernel/vector.S   |  45 +---
 tools/testing/selftests/powerpc/Makefile   |   3 +-
 tools/testing/selftests/powerpc/basic_asm.h|  26 +++
 tools/testing/selftests/powerpc/math/.gitignore|   6 +
 tools/testing/selftests/powerpc/math/Makefile  |  19 ++
 tools/testing/selftests/powerpc/math/fpu_asm.S | 195 ++
 tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 ++
 tools/testing/selftests/powerpc/math/fpu_signal.c  | 135 
 tools/testing/selftests/powerpc/math/fpu_syscall.c |  90 
 tools/testing/selftests/powerpc/math/vmx_asm.S | 229 +
 tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 ++
 tools/testing/selftests/powerpc/math/vmx_signal.c  | 138 +
 tools/testing/selftests/powerpc/math/vmx_syscall.c |  92 +
 20 files changed, 1360 insertions(+), 83 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/math/Makefile
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c

-- 
2.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Different SIGSEGV codes (x86 and ppc64le)

2016-01-20 Thread Linus Torvalds
On Tue, Jan 19, 2016 at 9:34 PM, Michael Ellerman  wrote:
>
> The kernel describes those error codes as:
>
>   #define SEGV_MAPERR   (__SI_FAULT|1)  /* address not mapped to object */
>   #define SEGV_ACCERR   (__SI_FAULT|2)  /* invalid permissions for mapped 
> object */
>
> Which one is correct in this case isn't entirely clear. There is a stack
> mapping, but you're not allowed to use it because of the stack ulimit, so
> arguably ACCERR is more accurate.
>
> However that's only true because of the stack guard page, which is supposed to
> be somewhat invisible to userspace. If I disable the stack guard page logic,
> userspace sees SEGV_MAPERR, so it seems that historically that's what is
> expected.

I think MAPERR is likely the right thing for a guard page access.

That said, I'd also warn people against caring too much about the
details of si_code. We've not traditionally been very good at filling
it in. So any program that uses it for any actual semantic behavior is
likely just broken. Print it in debuggers by all means, but relying
on it in any other way is just crazy. It's just not historically
reliable enough.

So I wouldn't worry about it excessively.

  Linus
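
For anyone who wants to see where these values surface, here is a minimal
user-space example (purely illustrative, not taken from this thread) that
reports si_code when a fault is taken:

#include <signal.h>
#include <string.h>
#include <unistd.h>

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	const char *msg;

	(void)sig;
	(void)ctx;

	if (info->si_code == SEGV_MAPERR)
		msg = "SIGSEGV: SEGV_MAPERR (address not mapped)\n";
	else if (info->si_code == SEGV_ACCERR)
		msg = "SIGSEGV: SEGV_ACCERR (invalid permissions)\n";
	else
		msg = "SIGSEGV: other si_code\n";

	/* write() is async-signal-safe, unlike printf() */
	write(STDERR_FILENO, msg, strlen(msg));
	_exit(1);
}

int main(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = segv_handler;
	act.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &act, NULL);

	*(volatile int *)NULL = 0;	/* typically reported as SEGV_MAPERR */
	return 0;
}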
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/4] kallsyms: add support for relative offsets in kallsyms address table

2016-01-20 Thread Kees Cook
On Wed, Jan 20, 2016 at 1:05 AM, Ard Biesheuvel
 wrote:
> Similar to how relative extables are implemented, it is possible to emit
> the kallsyms table in such a way that it contains offsets relative to some
> anchor point in the kernel image rather than absolute addresses. The benefit
> is that such table entries are no longer subject to dynamic relocation when
> the build time and runtime offsets of the kernel image are different. Also,
> on 64-bit architectures, it essentially cuts the size of the address table
> in half since offsets can typically be expressed in 32 bits.
>
> Since it is useful for some architectures (like x86) to retain the ability
> to emit absolute values as well, this patch adds support for both, by
> emitting absolute addresses as positive 32-bit values, and addresses
> relative to _text as negative values, which are subtracted from the runtime
> address of _text to produce the actual address. Positive values are used as
> they are found in the table.
>
> Support for the above is enabled by setting CONFIG_KALLSYMS_TEXT_RELATIVE.
>
> Signed-off-by: Ard Biesheuvel 

Reviewed-by: Kees Cook 

A nice space-saver! :)

-Kees

> ---
>  init/Kconfig| 14 
>  kernel/kallsyms.c   | 35 +-
>  scripts/kallsyms.c  | 38 +---
>  scripts/link-vmlinux.sh |  4 +++
>  scripts/namespace.pl|  1 +
>  5 files changed, 79 insertions(+), 13 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 5b86082fa238..73e00b040572 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1427,6 +1427,20 @@ config KALLSYMS_ALL
>
>Say N unless you really need all symbols.
>
> +config KALLSYMS_TEXT_RELATIVE
> +   bool
> +   help
> + Instead of emitting them as absolute values in the native word size,
> + emit the symbol references in the kallsyms table as 32-bit entries,
> + each containing either an absolute value in the range [0, S32_MAX] 
> or
> + a text relative value in the range [_text, _text + S32_MAX], encoded
> + as negative values.
> +
> + On 64-bit builds, this reduces the size of the address table by 50%,
> + but more importantly, it results in entries whose values are build
> + time constants, and no relocation pass is required at runtime to fix
> + up the entries based on the runtime load address of the kernel.
> +
>  config PRINTK
> default y
> bool "Enable support for printk" if EXPERT
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 5c5987f10819..e612f7f9e71b 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -38,6 +38,7 @@
>   * during the second link stage.
>   */
>  extern const unsigned long kallsyms_addresses[] __weak;
> +extern const int kallsyms_offsets[] __weak;
>  extern const u8 kallsyms_names[] __weak;
>
>  /*
> @@ -176,6 +177,19 @@ static unsigned int get_symbol_offset(unsigned long pos)
> return name - kallsyms_names;
>  }
>
> +static unsigned long kallsyms_sym_address(int idx)
> +{
> +   if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE))
> +   return kallsyms_addresses[idx];
> +
> +   /* positive offsets are absolute values */
> +   if (kallsyms_offsets[idx] >= 0)
> +   return kallsyms_offsets[idx];
> +
> +   /* negative offsets are relative to _text - 1 */
> +   return (unsigned long)_text - 1 - kallsyms_offsets[idx];
> +}
> +
>  /* Lookup the address for this symbol. Returns 0 if not found. */
>  unsigned long kallsyms_lookup_name(const char *name)
>  {
> @@ -187,7 +201,7 @@ unsigned long kallsyms_lookup_name(const char *name)
> off = kallsyms_expand_symbol(off, namebuf, 
> ARRAY_SIZE(namebuf));
>
> if (strcmp(namebuf, name) == 0)
> -   return kallsyms_addresses[i];
> +   return kallsyms_sym_address(i);
> }
> return module_kallsyms_lookup_name(name);
>  }
> @@ -204,7 +218,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char 
> *, struct module *,
>
> for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
> off = kallsyms_expand_symbol(off, namebuf, 
> ARRAY_SIZE(namebuf));
> -   ret = fn(data, namebuf, NULL, kallsyms_addresses[i]);
> +   ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
> if (ret != 0)
> return ret;
> }
> @@ -220,7 +234,10 @@ static unsigned long get_symbol_pos(unsigned long addr,
> unsigned long i, low, high, mid;
>
> /* This kernel should never had been booted. */
> -   BUG_ON(!kallsyms_addresses);
> +   if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE))
> +   BUG_ON(!kallsyms_addresses);
> +   else
> +   BUG_ON(!kallsyms_offsets);
>
> /* Do a binary search on the sorted kallsyms_addresses array. 

Re: [PATCH 4/4] x86_64: enable text relative kallsyms for 64-bit targets

2016-01-20 Thread Kees Cook
On Wed, Jan 20, 2016 at 1:05 AM, Ard Biesheuvel
 wrote:
> This enables the newly introduced text-relative kallsyms support when
> building 64-bit targets. This cuts the size of the kallsyms address
> table in half, reducing the memory footprint of the kernel .rodata
> section by about 400 KB for a KALLSYMS_ALL build, and about 100 KB
> reduction in compressed size. (with CONFIG_RELOCATABLE=y)
>
> Signed-off-by: Ard Biesheuvel 

Tested-by: Kees Cook 

-Kees

> ---
> I tested this with my Ubuntu Wily box's config-4.2.0-23-generic, and
> got the following results:
>
> BEFORE:
> ===
> $ size vmlinux
>textdata bss dec hex filename
> 129729492213240 1482752 16668941 fe590d vmlinux
>
> $ readelf -S .tmp_kallsyms2.o |less
> There are 9 section headers, starting at offset 0x3e0788:
>
> Section Headers:
>   [Nr] Name  Type Address   Offset
>Size  EntSize  Flags  Link  Info  Align
>   ...
>   [ 4] .rodata   PROGBITS   0040
>001c7738     A   0 0 8
>   [ 5] .rela.rodata  RELA   001c7950
>00218e38  0018   I   7 4 8
>   [ 6] .shstrtab STRTAB     001c7778
>0039     0 0 1
>
> $ ls -l arch/x86/boot/bzImage
> -rw-rw-r-- 1 ard ard 6893168 Jan 20 09:36 arch/x86/boot/bzImage
>
> AFTER:
> ==
> $ size vmlinux
>textdata bss dec hex filename
> 126045012213240 1482752 16300493 f8b9cd vmlinux
>
> $ readelf -S .tmp_kallsyms2.o |less
> There are 8 section headers, starting at offset 0x16dd10:
>
> Section Headers:
>   [Nr] Name  Type Address   Offset
>Size  EntSize  Flags  Link  Info  Align
>   ...
>   [ 4] .rodata   PROGBITS   0040
>0016db20     A   0 0 8
>   [ 5] .shstrtab STRTAB     0016db60
>0034     0 0 1
>   ...
>
> $ ls -l arch/x86/boot/bzImage
> -rw-rw-r-- 1 ard ard 6790224 Jan 19 22:24 arch/x86/boot/bzImage
> ---
>  arch/x86/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4a10ba9e95da..180a94bda8d4 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -142,6 +142,7 @@ config X86
> select HAVE_UNSTABLE_SCHED_CLOCK
> select HAVE_USER_RETURN_NOTIFIER
> select IRQ_FORCED_THREADING
> +   select KALLSYMS_TEXT_RELATIVE   if X86_64
> select MODULES_USE_ELF_RELA if X86_64
> select MODULES_USE_ELF_REL  if X86_32
> select OLD_SIGACTIONif X86_32
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/eeh: Fix PE location code

2016-01-20 Thread Sam Mendoza-Jonas
On Wed, Jan 20, 2016 at 02:56:13PM +1100, Russell Currey wrote:
> On Wed, 2015-12-02 at 16:25 +1100, Gavin Shan wrote:
> > In eeh_pe_loc_get(), the PE location code is retrieved from the
> > "ibm,loc-code" property of the device node for the bridge of the
> > PE's primary bus. It's not correct because the property indicates
> > the parent PE's location code.
> > 
> > This reads the correct PE location code from "ibm,io-base-loc-code"
> > or "ibm,slot-location-code" property of PE parent bus's device node.
> > 
> > Signed-off-by: Gavin Shan 
> > ---
> 
> Tested-by: Russell Currey 

Thanks Russell!

W.R.T including this in stable, I don't believe anything actively breaks
without the patch, but in the event of an EEH freeze the wrong slot for
the device will be identified, making troubleshooting more difficult.
> >  arch/powerpc/kernel/eeh_pe.c | 33 +++--
> >  1 file changed, 15 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> > index 8654cb1..ca9e537 100644
> > --- a/arch/powerpc/kernel/eeh_pe.c
> > +++ b/arch/powerpc/kernel/eeh_pe.c
> > @@ -883,32 +883,29 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
> >  const char *eeh_pe_loc_get(struct eeh_pe *pe)
> >  {
> >     struct pci_bus *bus = eeh_pe_bus_get(pe);
> > -   struct device_node *dn = pci_bus_to_OF_node(bus);
> > +   struct device_node *dn;
> >     const char *loc = NULL;
> >  
> > -   if (!dn)
> > -   goto out;
> > +   while (bus) {
> > +   dn = pci_bus_to_OF_node(bus);
> > +   if (!dn) {
> > +   bus = bus->parent;
> > +   continue;
> > +   }
> >  
> > -   /* PHB PE or root PE ? */
> > -   if (pci_is_root_bus(bus)) {
> > -   loc = of_get_property(dn, "ibm,loc-code", NULL);
> > -   if (!loc)
> > +   if (pci_is_root_bus(bus))
> >     loc = of_get_property(dn, "ibm,io-base-loc-
> > code", NULL);
> > +   else
> > +   loc = of_get_property(dn, "ibm,slot-location-
> > code",
> > +     NULL);
> > +
> >     if (loc)
> > -   goto out;
> > +   return loc;
> >  
> > -   /* Check the root port */
> > -   dn = dn->child;
> > -   if (!dn)
> > -   goto out;
> > +   bus = bus->parent;
> >     }
> >  
> > -   loc = of_get_property(dn, "ibm,loc-code", NULL);
> > -   if (!loc)
> > -   loc = of_get_property(dn, "ibm,slot-location-code",
> > NULL);
> > -
> > -out:
> > -   return loc ? loc : "N/A";
> > +   return "N/A";
> >  }
> >  
> >  /**
> 
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: Enable VMX copy on PPC970 (G5)

2016-01-20 Thread Michael Ellerman
There's no reason I'm aware of that the VMX copy loop shouldn't work on
PPC970. And in fact it seems to boot and generally be happy.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/cputable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index b118072670fb..f412666fafe2 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -411,7 +411,7 @@ enum {
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_201 | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_CAN_NAP | CPU_FTR_MMCRA | \
CPU_FTR_CP_USE_DCBTZ | CPU_FTR_STCX_CHECKS_ADDRESS | \
-   CPU_FTR_HVMODE | CPU_FTR_DABRX)
+   CPU_FTR_HVMODE | CPU_FTR_DABRX | CPU_FTR_VMX_COPY)
 #define CPU_FTRS_POWER5(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/4] powerpc: enable text relative kallsyms for ppc64

2016-01-20 Thread Ard Biesheuvel
This enables the newly introduced text-relative kallsyms support when
building 64-bit targets. This cuts the size of the kallsyms address
table in half, and drastically reduces the size of the PIE dynamic
relocation section when building with CONFIG_RELOCATABLE=y (by about
3 MB for ppc64_defconfig)

Signed-off-by: Ard Biesheuvel 
---

Results for ppc64_defconfig:

BEFORE:
===
$ size vmlinux
   textdata bss dec hex filename
198279962008456  849612 2268606415a2970 vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 9 section headers, starting at offset 0x4513f8:

Section Headers:
  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata   PROGBITS   0100
   001fcf00     A   0 0 256
  [ 5] .rela.rodata  RELA   001fd1d8
   00254220  0018   I   7 4 8
  [ 6] .shstrtab STRTAB     001fd000
   0039     0 0 1
  ...

$ ls -l arch/powerpc/boot/zImage
-rwxrwxr-x 2 ard ard 7533160 Jan 20 08:43 arch/powerpc/boot/zImage

AFTER:
==
$ size vmlinux
   textdata bss dec hex filename
169795162009992  849612 1983912012eb890 vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 8 section headers, starting at offset 0x199bb0:

Section Headers:
  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata   PROGBITS   0100
   00199900     A   0 0 256
  [ 5] .shstrtab STRTAB     00199a00
   0034     0 0 1
  ...

$ ls -l arch/powerpc/boot/zImage
-rwxrwxr-x 2 ard ard 6985672 Jan 20 08:45 arch/powerpc/boot/zImage
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 94f6c5089e0c..d1c26749632b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -158,6 +158,7 @@ config PPC
select ARCH_HAS_DMA_SET_COHERENT_MASK
select ARCH_HAS_DEVMEM_IS_ALLOWED
select HAVE_ARCH_SECCOMP_FILTER
+   select KALLSYMS_TEXT_RELATIVE if PPC64
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/4] support for text-relative kallsyms table

2016-01-20 Thread Ard Biesheuvel
This implements text-relative kallsyms address tables. This was developed
as part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but
I think it may be beneficial to other architectures as well, so I am
presenting it as a separate series.

The idea is that on 64-bit builds, it is rather wasteful to use absolute
addressing for kernel symbols since they are all within a couple of MBs
of each other. On top of that, the absolute addressing implies that, when
the kernel is relocated at runtime, each address in the table needs to be
fixed up individually.

Since all section-relative addresses are already emitted relative to _text,
it is quite straight-forward to record only the offset, and add the absolute
address of _text at runtime when referring to the address table.

The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter case,
the reduction in uncompressed size is primarily __init data)

Kees Cook was so kind to test these against x86_64, and confirmed that KASLR
still operates as expected.

Ard Biesheuvel (4):
  kallsyms: add support for relative offsets in kallsyms address table
  powerpc: enable text relative kallsyms for ppc64
  s390: enable text relative kallsyms for 64-bit targets
  x86_64: enable text relative kallsyms for 64-bit targets

 arch/powerpc/Kconfig|  1 +
 arch/s390/Kconfig   |  1 +
 arch/x86/Kconfig|  1 +
 init/Kconfig| 14 
 kernel/kallsyms.c   | 35 +-
 scripts/kallsyms.c  | 38 +---
 scripts/link-vmlinux.sh |  4 +++
 scripts/namespace.pl|  1 +
 8 files changed, 82 insertions(+), 13 deletions(-)

-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets

2016-01-20 Thread Ard Biesheuvel
This enables the newly introduced text-relative kallsyms support when
building 64-bit targets. This cuts the size of the kallsyms address
table in half, reducing the memory footprint of the kernel .rodata
section by about 250 KB for a defconfig build.

Signed-off-by: Ard Biesheuvel 
---

BEFORE:
===
$ size vmlinux
   textdata bss dec hex filename
123295863107008 14727792301643861cc45a2 vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 9 section headers, starting at offset 0x125b50:

Section Headers:
  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata   PROGBITS   0040
   00125ad0     A   0 0 8
  [ 5] .rela.rodata  RELA   00125f28
   0015ead8  0018   7 4 8
  [ 6] .shstrtab STRTAB     00125b10
   0039     0 0 1
  ...

$ ls -l arch/s390/boot/bzImage
-rwxrwxr-x 1 ard ard 5234224 Jan 20 08:22 arch/s390/boot/bzImage

AFTER:
==
$ size vmlinux
   textdata bss dec hex filename
120881143102912 14727792299188181c88662 vmlinux

$ readelf -S .tmp_kallsyms2.o
There are 8 section headers, starting at offset 0xeb428:

Section Headers:
  [Nr] Name  Type Address   Offset
  ...
  [ 4] .rodata   PROGBITS   0040
   000eb3b0     A   0 0 8
  [ 5] .shstrtab STRTAB     000eb3f0
   0034     0 0 1
  ...

$ ls -l arch/s390/boot/bzImage
-rwxrwxr-x 1 ard ard 5224256 Jan 20 08:23 arch/s390/boot/bzImage
---
 arch/s390/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index dbeeb3a049f2..588160fd1db0 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -149,6 +149,7 @@ config S390
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
+   select KALLSYMS_TEXT_RELATIVE if 64BIT
select MODULES_USE_ELF_RELA
select NO_BOOTMEM
select OLD_SIGACTION
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/4] x86_64: enable text relative kallsyms for 64-bit targets

2016-01-20 Thread Ard Biesheuvel
This enables the newly introduced text-relative kallsyms support when
building 64-bit targets. This cuts the size of the kallsyms address
table in half, reducing the memory footprint of the kernel .rodata
section by about 400 KB for a KALLSYMS_ALL build, and about 100 KB
reduction in compressed size. (with CONFIG_RELOCATABLE=y)

Signed-off-by: Ard Biesheuvel 
---
I tested this with my Ubuntu Wily box's config-4.2.0-23-generic, and
got the following results:

BEFORE:
===
$ size vmlinux
   textdata bss dec hex filename
129729492213240 1482752 16668941 fe590d vmlinux

$ readelf -S .tmp_kallsyms2.o |less
There are 9 section headers, starting at offset 0x3e0788:

Section Headers:
  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata   PROGBITS   0040
   001c7738     A   0 0 8
  [ 5] .rela.rodata  RELA   001c7950
   00218e38  0018   I   7 4 8
  [ 6] .shstrtab STRTAB     001c7778
   0039     0 0 1

$ ls -l arch/x86/boot/bzImage
-rw-rw-r-- 1 ard ard 6893168 Jan 20 09:36 arch/x86/boot/bzImage

AFTER:
==
$ size vmlinux
   textdata bss dec hex filename
126045012213240 1482752 16300493 f8b9cd vmlinux

$ readelf -S .tmp_kallsyms2.o |less
There are 8 section headers, starting at offset 0x16dd10:

Section Headers:
  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  ...
  [ 4] .rodata   PROGBITS   0040
   0016db20     A   0 0 8
  [ 5] .shstrtab STRTAB     0016db60
   0034     0 0 1
  ...

$ ls -l arch/x86/boot/bzImage
-rw-rw-r-- 1 ard ard 6790224 Jan 19 22:24 arch/x86/boot/bzImage
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4a10ba9e95da..180a94bda8d4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -142,6 +142,7 @@ config X86
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_USER_RETURN_NOTIFIER
select IRQ_FORCED_THREADING
+   select KALLSYMS_TEXT_RELATIVE   if X86_64
select MODULES_USE_ELF_RELA if X86_64
select MODULES_USE_ELF_REL  if X86_32
select OLD_SIGACTIONif X86_32
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/4] kallsyms: add support for relative offsets in kallsyms address table

2016-01-20 Thread Ard Biesheuvel
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to some
anchor point in the kernel image rather than absolute addresses. The benefit
is that such table entries are no longer subject to dynamic relocation when
the build time and runtime offsets of the kernel image are different. Also,
on 64-bit architectures, it essentially cuts the size of the address table
in half since offsets can typically be expressed in 32 bits.

Since it is useful for some architectures (like x86) to retain the ability
to emit absolute values as well, this patch adds support for both, by
emitting absolute addresses as positive 32-bit values, and addresses
relative to _text as negative values, which are subtracted from the runtime
address of _text to produce the actual address. Positive values are used as
they are found in the table.

Support for the above is enabled by setting CONFIG_KALLSYMS_TEXT_RELATIVE.

Signed-off-by: Ard Biesheuvel 
---
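A stand-alone model of the encoding rule described above, for illustration
only; the real encoder lives in scripts/kallsyms.c and the decoder is
kallsyms_sym_address() in the diff below.

#include <stdint.h>

static int32_t encode_entry(uint64_t addr, uint64_t text, int absolute)
{
	if (absolute)
		return (int32_t)addr;			/* 0 .. S32_MAX */
	/* text-relative symbols are stored as negative values */
	return (int32_t)(-(int64_t)(addr - text) - 1);
}

static uint64_t decode_entry(int32_t entry, uint64_t text)
{
	if (entry >= 0)
		return (uint64_t)entry;			/* absolute value */
	return text - 1 - entry;			/* relative to _text - 1 */
}

/*
 * Example: a symbol at _text + 0x1234 is stored as -0x1235, and
 * decode_entry(-0x1235, text) gives back _text + 0x1234.
 */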
 init/Kconfig| 14 
 kernel/kallsyms.c   | 35 +-
 scripts/kallsyms.c  | 38 +---
 scripts/link-vmlinux.sh |  4 +++
 scripts/namespace.pl|  1 +
 5 files changed, 79 insertions(+), 13 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 5b86082fa238..73e00b040572 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1427,6 +1427,20 @@ config KALLSYMS_ALL
 
   Say N unless you really need all symbols.
 
+config KALLSYMS_TEXT_RELATIVE
+   bool
+   help
+ Instead of emitting them as absolute values in the native word size,
+ emit the symbol references in the kallsyms table as 32-bit entries,
+ each containing either an absolute value in the range [0, S32_MAX] or
+ a text relative value in the range [_text, _text + S32_MAX], encoded
+ as negative values.
+
+ On 64-bit builds, this reduces the size of the address table by 50%,
+ but more importantly, it results in entries whose values are build
+ time constants, and no relocation pass is required at runtime to fix
+ up the entries based on the runtime load address of the kernel.
+
 config PRINTK
default y
bool "Enable support for printk" if EXPERT
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 5c5987f10819..e612f7f9e71b 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -38,6 +38,7 @@
  * during the second link stage.
  */
 extern const unsigned long kallsyms_addresses[] __weak;
+extern const int kallsyms_offsets[] __weak;
 extern const u8 kallsyms_names[] __weak;
 
 /*
@@ -176,6 +177,19 @@ static unsigned int get_symbol_offset(unsigned long pos)
return name - kallsyms_names;
 }
 
+static unsigned long kallsyms_sym_address(int idx)
+{
+   if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE))
+   return kallsyms_addresses[idx];
+
+   /* positive offsets are absolute values */
+   if (kallsyms_offsets[idx] >= 0)
+   return kallsyms_offsets[idx];
+
+   /* negative offsets are relative to _text - 1 */
+   return (unsigned long)_text - 1 - kallsyms_offsets[idx];
+}
+
 /* Lookup the address for this symbol. Returns 0 if not found. */
 unsigned long kallsyms_lookup_name(const char *name)
 {
@@ -187,7 +201,7 @@ unsigned long kallsyms_lookup_name(const char *name)
off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
 
if (strcmp(namebuf, name) == 0)
-   return kallsyms_addresses[i];
+   return kallsyms_sym_address(i);
}
return module_kallsyms_lookup_name(name);
 }
@@ -204,7 +218,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, 
struct module *,
 
for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
-   ret = fn(data, namebuf, NULL, kallsyms_addresses[i]);
+   ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
if (ret != 0)
return ret;
}
@@ -220,7 +234,10 @@ static unsigned long get_symbol_pos(unsigned long addr,
unsigned long i, low, high, mid;
 
/* This kernel should never had been booted. */
-   BUG_ON(!kallsyms_addresses);
+   if (!IS_ENABLED(CONFIG_KALLSYMS_TEXT_RELATIVE))
+   BUG_ON(!kallsyms_addresses);
+   else
+   BUG_ON(!kallsyms_offsets);
 
/* Do a binary search on the sorted kallsyms_addresses array. */
low = 0;
@@ -228,7 +245,7 @@ static unsigned long get_symbol_pos(unsigned long addr,
 
while (high - low > 1) {
mid = low + (high - low) / 2;
-   if (kallsyms_addresses[mid] <= addr)
+   if (kallsyms_sym_address(mid) <= addr)
low = mid;
else
high = mid;

Re: [PATCH v5 0/9] ftrace with regs + live patching for ppc64 LE (ABI v2)

2016-01-20 Thread Torsten Duwe
On Wed, Jan 20, 2016 at 05:03:23PM +1100, Michael Ellerman wrote:
> On Wed, 2016-01-06 at 15:17 +0100, Petr Mladek wrote:
> > On Fri 2015-12-04 15:45:29, Torsten Duwe wrote:
> > > Changes since v4:
> > >   * change comment style in entry_64.S to C89
> > > (nobody is using assembler syntax comments there).
> > >   * the bool function restore_r2 shouldn't return 2,
> > > that's a little confusing.
> > >   * Test whether the compiler supports -mprofile-kernel
> > > and only then define CC_USING_MPROFILE_KERNEL
> > >   * also make the return value of klp_check_compiler_support
> > > depend on that.
> >
> > Note that there is still needed the extra patch from
> > http://thread.gmane.org/gmane.linux.kernel/2093867/focus=2099603
> > to get the livepatching working.
> 
> Sorry which extra patch?
Message-ID: <20151203160004.ge8...@pathway.suse.cz>
By Petr Mladek, "Re: [PATCH v4 0/9] ftrace with regs + live patching..."
2015-12-03. It is further up in the function call hierarchy and basically
tells the arch-independent KLP to call the normal entry point on ppc64le, and
that the _mcount call site is 16 bytes further.


> > Both ftrace with regs and live patching works for me with this patch
> > set and the extra patch. So. for the whole patchset:
> >
> > Tested-by: Petr Mladek 
> 
> Can you give me some more info on how you're testing it? What config options,
> toolchain etc.?
> 
> For me the series doesn't even boot, even with livepatching disabled.

May indeed be a toolchain issue. I had to fix gcc-4.8.5 to get "notrace" working
for -mprofile-kernel. That's a gcc bug.

What are you using?

The config in the v5 patch series should be waterproof; especially with KLP
disabled, ftrace with regs must work (all self-tests succeeded). If you send
me your config (via PM, I suggest, to spare the lists) I can verify it with
the toolchain here.
Petr made a suggestion to reshuffle the config options to have it cleaner;
I suggest to patch that separately.

Torsten

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported

2016-01-20 Thread Yongji Xie

On 2016/1/16 1:24, David Laight wrote:

From: Yongji Xie

Sent: 15 January 2016 07:06

MSI-X tables are not allowed to be mmapped in the vfio-pci
driver in case the user gets to touch them directly.
This will cause some performance issues when PCI
adapters have critical registers in the same page as
the MSI-X table.

...
If the driver wants to generate an incorrect MSI-X interrupt
it can do so by requesting the device do a normal memory transfer
to the target address area that raises MSI-X interrupts.


IOMMUs supporting interrupt remapping can prevent this case.


So disabling writes to the MSI-X table (and pending bit array)
areas only raises the bar very slightly.
A device may also give the driver write access to the MSI-X
table through other addresses.

This seems to make disallowing the mapping of the MSI-X table
rather pointless.


If we allow the mapping of the MSI-X table, it seems the guest
kernels of some architectures can write invalid data to the MSI-X table
when device drivers initialize MSI-X interrupts.

Regards,
Yongji Xie


I've also dumped out the MSI-X table (during development) to
check that the values are being written there correctly.

David



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets

2016-01-20 Thread Heiko Carstens
On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
> This enables the newly introduced text-relative kallsyms support when
> building 64-bit targets. This cuts the size of the kallsyms address
> table in half, reducing the memory footprint of the kernel .rodata
> section by about 250 KB for a defconfig build.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index dbeeb3a049f2..588160fd1db0 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -149,6 +149,7 @@ config S390
>   select HAVE_REGS_AND_STACK_ACCESS_API
>   select HAVE_SYSCALL_TRACEPOINTS
>   select HAVE_VIRT_CPU_ACCOUNTING
> + select KALLSYMS_TEXT_RELATIVE if 64BIT

Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
Tested on s390 and everything seems still to work ;)

Acked-by: Heiko Carstens 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets

2016-01-20 Thread Ard Biesheuvel
On 20 January 2016 at 10:43, Heiko Carstens  wrote:
> On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
>> This enables the newly introduced text-relative kallsyms support when
>> building 64-bit targets. This cuts the size of the kallsyms address
>> table in half, reducing the memory footprint of the kernel .rodata
>> section by about 250 KB for a defconfig build.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>
>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> index dbeeb3a049f2..588160fd1db0 100644
>> --- a/arch/s390/Kconfig
>> +++ b/arch/s390/Kconfig
>> @@ -149,6 +149,7 @@ config S390
>>   select HAVE_REGS_AND_STACK_ACCESS_API
>>   select HAVE_SYSCALL_TRACEPOINTS
>>   select HAVE_VIRT_CPU_ACCOUNTING
>> + select KALLSYMS_TEXT_RELATIVE if 64BIT
>
> Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
> Tested on s390 and everything seems still to work ;)
>
> Acked-by: Heiko Carstens 
>

Thanks! Did you take a look at /proc/kallsyms, by any chance? It
should look identical with and without these patches

Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets

2016-01-20 Thread Ard Biesheuvel
On 20 January 2016 at 11:17, Heiko Carstens  wrote:
> On Wed, Jan 20, 2016 at 11:04:24AM +0100, Ard Biesheuvel wrote:
>> On 20 January 2016 at 10:43, Heiko Carstens  
>> wrote:
>> > On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
>> >> This enables the newly introduced text-relative kallsyms support when
>> >> building 64-bit targets. This cuts the size of the kallsyms address
>> >> table in half, reducing the memory footprint of the kernel .rodata
>> >> section by about 250 KB for a defconfig build.
>> >>
>> >> Signed-off-by: Ard Biesheuvel 
>> >> ---
>> >>
>> >> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> >> index dbeeb3a049f2..588160fd1db0 100644
>> >> --- a/arch/s390/Kconfig
>> >> +++ b/arch/s390/Kconfig
>> >> @@ -149,6 +149,7 @@ config S390
>> >>   select HAVE_REGS_AND_STACK_ACCESS_API
>> >>   select HAVE_SYSCALL_TRACEPOINTS
>> >>   select HAVE_VIRT_CPU_ACCOUNTING
>> >> + select KALLSYMS_TEXT_RELATIVE if 64BIT
>> >
>> > Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
>> > Tested on s390 and everything seems still to work ;)
>> >
>> > Acked-by: Heiko Carstens 
>> >
>>
>> Thanks! Did you take a look at /proc/kallsyms, by any chance? It
>> should look identical with and without these patches
>
> Close to identical, since the generated code and offsets change a bit with
> your new config option enabled and disabled. But only those parts that are
> linked behind kernel/kallsyms.c.
>
> However I did run a couple of ftrace, kprobes tests and enforced call
> backtraces. Everything still works.
>
> So it looks all good.
>

Thanks a lot!

Re: [PATCH V10 2/4] perf/powerpc: add support for sampling intr machine state

2016-01-20 Thread Michael Ellerman
On Mon, 2016-01-11 at 15:58 +0530, Anju T wrote:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9a7057e..c4ce60d 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -119,6 +119,7 @@ config PPC
>   select GENERIC_ATOMIC64 if PPC32
>   select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>   select HAVE_PERF_EVENTS
> + select HAVE_PERF_REGS
>   select HAVE_REGS_AND_STACK_ACCESS_API
>   select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
>   select ARCH_WANT_IPC_PARSE_VERSION
> diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
> new file mode 100644
> index 000..d32581763
> --- /dev/null
> +++ b/arch/powerpc/perf/perf_regs.c
...
> +
> +u64 perf_reg_abi(struct task_struct *task)
> +{
> + return PERF_SAMPLE_REGS_ABI_64;

What is this value used for exactly?

It seems like on 32-bit kernels we should be returning PERF_SAMPLE_REGS_ABI_32.
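
Something along these lines is what I had in mind (untested sketch; the
TIF_32BIT / test_tsk_thread_flag() check is my assumption, so please verify
it covers all configs):

u64 perf_reg_abi(struct task_struct *task)
{
#ifdef CONFIG_PPC64
	/* 64-bit kernel: only 32-bit (compat) tasks get the 32-bit ABI */
	if (!test_tsk_thread_flag(task, TIF_32BIT))
		return PERF_SAMPLE_REGS_ABI_64;
#endif
	/* 32-bit kernel, or a 32-bit task on a 64-bit kernel */
	return PERF_SAMPLE_REGS_ABI_32;
}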

> +}
> +
> +void perf_get_regs_user(struct perf_regs *regs_user,
> + struct pt_regs *regs,
> + struct pt_regs *regs_user_copy)
> +{
> + regs_user->regs = task_pt_regs(current);
> + regs_user->abi  = perf_reg_abi(current);
> +}

cheers


Re: [PATCH v5 0/9] ftrace with regs + live patching for ppc64 LE (ABI v2)

2016-01-20 Thread Petr Mladek
On Wed 2016-01-20 17:03:23, Michael Ellerman wrote:
> On Wed, 2016-01-06 at 15:17 +0100, Petr Mladek wrote:
> > On Fri 2015-12-04 15:45:29, Torsten Duwe wrote:
> > > Changes since v4:
> > >   * change comment style in entry_64.S to C89
> > > (nobody is using assembler syntax comments there).
> > >   * the bool function restore_r2 shouldn't return 2,
> > > that's a little confusing.
> > >   * Test whether the compiler supports -mprofile-kernel
> > > and only then define CC_USING_MPROFILE_KERNEL
> > >   * also make the return value of klp_check_compiler_support
> > > depend on that.
> >
> > Note that there is still needed the extra patch from
> > http://thread.gmane.org/gmane.linux.kernel/2093867/focus=2099603
> > to get the livepatching working.
> 
> Sorry which extra patch?

It was in an older reply and can be found at
http://thread.gmane.org/gmane.linux.kernel/2093867/focus=2099603


> > Both ftrace with regs and live patching works for me with this patch
> > set and the extra patch. So. for the whole patchset:
> >
> > Tested-by: Petr Mladek 
> 
> Can you give me some more info on how you're testing it? What config options,
> toolchain etc.?

You need to fulfill all the dependencies for CONFIG_LIVEPATCH; see
kernel/livepatch/Kconfig. Please find attached the config that I used.

I did the testing on PPC64LE with a kernel based on 4.4.0-rc8
using the attached config. I used the following stuff:

$> gcc --version
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$> rpm -q binutils
binutils-2.25.0-13.1.ppc64le


I tested it the following way:

# booted the compiled kernel and printed the default cmdline
$> cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.4.0-rc3-11-default+ root=UUID=...

# tried function_graph tracer to check ftrace with regs
echo function_graph >/sys/kernel/debug/tracing/current_tracer ; \
echo 1 >/sys/kernel/debug/tracing/tracing_on ; \
sleep 1 ; \
/usr/bin/ls /proc ; \
echo 0 >/sys/kernel/debug/tracing/tracing_on ; \
less /sys/kernel/debug/tracing/trace

# loaded the patch and printed the patch cmdline
$> modprobe livepatch-sample
$> cat /proc/cmdline
this has been live patched

# tried to disable and enable the patch
$> echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
$> cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.4.0-rc3-11-default+ root=UUID=...
$> echo 1 > /sys/kernel/livepatch/livepatch_sample/enabled
$> cat /proc/cmdline
this has been live patched

# also checked messages
$> dmesg | tail -n 4
[   33.673057] livepatch: tainting kernel with TAINT_LIVEPATCH
[   33.673068] livepatch: enabling patch 'livepatch_sample'
[ 1997.098257] livepatch: disabling patch 'livepatch_sample'
[ 2079.696277] livepatch: enabling patch 'livepatch_sample'


> For me the series doesn't even boot, even with livepatching disabled.

I wonder if you have enabled CONFIG_FTRACE_STARTUP_TEST and whether ftrace
with regs fails on your setup.

Best Regards,
Petr

Re: [PATCH 3/4] s390: enable text relative kallsyms for 64-bit targets

2016-01-20 Thread Heiko Carstens
On Wed, Jan 20, 2016 at 11:04:24AM +0100, Ard Biesheuvel wrote:
> On 20 January 2016 at 10:43, Heiko Carstens  wrote:
> > On Wed, Jan 20, 2016 at 10:05:37AM +0100, Ard Biesheuvel wrote:
> >> This enables the newly introduced text-relative kallsyms support when
> >> building 64-bit targets. This cuts the size of the kallsyms address
> >> table in half, reducing the memory footprint of the kernel .rodata
> >> section by about 250 KB for a defconfig build.
> >>
> >> Signed-off-by: Ard Biesheuvel 
> >> ---
> >>
> >> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> >> index dbeeb3a049f2..588160fd1db0 100644
> >> --- a/arch/s390/Kconfig
> >> +++ b/arch/s390/Kconfig
> >> @@ -149,6 +149,7 @@ config S390
> >>   select HAVE_REGS_AND_STACK_ACCESS_API
> >>   select HAVE_SYSCALL_TRACEPOINTS
> >>   select HAVE_VIRT_CPU_ACCOUNTING
> >> + select KALLSYMS_TEXT_RELATIVE if 64BIT
> >
> > Please remove the "if 64BIT" since s390 is always 64BIT in the meantime.
> > Tested on s390 and everything seems still to work ;)
> >
> > Acked-by: Heiko Carstens 
> >
> 
> Thanks! Did you take a look at /proc/kallsyms, by any chance? It
> should look identical with and without these patches

Close to identical, since the generated code and offsets change a bit with
your new config option enabled and disabled. But only those parts that are
linked behind kernel/kallsyms.c.

However I did run a couple of ftrace, kprobes tests and enforced call
backtraces. Everything still works.

So it looks all good.


Re: [PATCH v10 3/4] tools/perf: Map the ID values with register names

2016-01-20 Thread Michael Ellerman
On Mon, 2016-01-11 at 15:58 +0530, Anju T wrote:
> diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h
> new file mode 100644
> index 000..93080f5
> --- /dev/null
> +++ b/tools/perf/arch/powerpc/include/perf_regs.h
> @@ -0,0 +1,64 @@
> +#ifndef ARCH_PERF_REGS_H
> +#define ARCH_PERF_REGS_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define PERF_REGS_MASK  ((1ULL << PERF_REG_POWERPC_MAX) - 1)
> +#define PERF_REGS_MAX   PERF_REG_POWERPC_MAX
> +#define PERF_SAMPLE_REGS_ABI   PERF_SAMPLE_REGS_ABI_64

That looks wrong if perf is built 32-bit ?
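
Presumably this needs to follow how perf itself is built, e.g. (untested
sketch, assuming the compiler predefines __powerpc64__ for 64-bit builds):

#ifdef __powerpc64__
#define PERF_SAMPLE_REGS_ABI	PERF_SAMPLE_REGS_ABI_64
#else
#define PERF_SAMPLE_REGS_ABI	PERF_SAMPLE_REGS_ABI_32
#endif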

> +#define PERF_REG_IP PERF_REG_POWERPC_NIP
> +#define PERF_REG_SP PERF_REG_POWERPC_GPR1
> +
> +static const char *reg_names[] = {
> + [PERF_REG_POWERPC_GPR0] = "gpr0",

Can you instead call them "r0" etc.?

That is much more common on powerpc than "gpr0".

> + [PERF_REG_POWERPC_GPR1] = "gpr1",
> + [PERF_REG_POWERPC_GPR2] = "gpr2",
> + [PERF_REG_POWERPC_GPR3] = "gpr3",
> + [PERF_REG_POWERPC_GPR4] = "gpr4",
> + [PERF_REG_POWERPC_GPR5] = "gpr5",
> + [PERF_REG_POWERPC_GPR6] = "gpr6",
> + [PERF_REG_POWERPC_GPR7] = "gpr7",
> + [PERF_REG_POWERPC_GPR8] = "gpr8",
> + [PERF_REG_POWERPC_GPR9] = "gpr9",
> + [PERF_REG_POWERPC_GPR10] = "gpr10",
> + [PERF_REG_POWERPC_GPR11] = "gpr11",
> + [PERF_REG_POWERPC_GPR12] = "gpr12",
> + [PERF_REG_POWERPC_GPR13] = "gpr13",
> + [PERF_REG_POWERPC_GPR14] = "gpr14",
> + [PERF_REG_POWERPC_GPR15] = "gpr15",
> + [PERF_REG_POWERPC_GPR16] = "gpr16",
> + [PERF_REG_POWERPC_GPR17] = "gpr17",
> + [PERF_REG_POWERPC_GPR18] = "gpr18",
> + [PERF_REG_POWERPC_GPR19] = "gpr19",
> + [PERF_REG_POWERPC_GPR20] = "gpr20",
> + [PERF_REG_POWERPC_GPR21] = "gpr21",
> + [PERF_REG_POWERPC_GPR22] = "gpr22",
> + [PERF_REG_POWERPC_GPR23] = "gpr23",
> + [PERF_REG_POWERPC_GPR24] = "gpr24",
> + [PERF_REG_POWERPC_GPR25] = "gpr25",
> + [PERF_REG_POWERPC_GPR26] = "gpr26",
> + [PERF_REG_POWERPC_GPR27] = "gpr27",
> + [PERF_REG_POWERPC_GPR28] = "gpr28",
> + [PERF_REG_POWERPC_GPR29] = "gpr29",
> + [PERF_REG_POWERPC_GPR30] = "gpr30",
> + [PERF_REG_POWERPC_GPR31] = "gpr31",
> + [PERF_REG_POWERPC_NIP] = "nip",
> + [PERF_REG_POWERPC_MSR] = "msr",
> + [PERF_REG_POWERPC_ORIG_R3] = "orig_r3",
> + [PERF_REG_POWERPC_CTR] = "ctr",
> + [PERF_REG_POWERPC_LNK] = "link",
> + [PERF_REG_POWERPC_XER] = "xer",
> + [PERF_REG_POWERPC_CCR] = "ccr",
> + [PERF_REG_POWERPC_TRAP] = "trap",
> + [PERF_REG_POWERPC_DAR] = "dar",
> + [PERF_REG_POWERPC_DSISR] = "dsisr"
> +};
> +
> +static inline const char *perf_reg_name(int id)
> +{
> + return reg_names[id];
> +}
> +#endif /* ARCH_PERF_REGS_H */
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..62a2f2d 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -23,6 +23,11 @@ $(call detected_var,ARCH)
>  
>  NO_PERF_REGS := 1
>  
> +# Additional ARCH settings for ppc64
> +ifeq ($(ARCH),powerpc)

powerpc also includes ppc, i.e. 32-bit, so the comment is wrong.

> +  NO_PERF_REGS := 0
> +endif
> +
>  # Additional ARCH settings for x86
>  ifeq ($(ARCH),x86)
>$(call detected,CONFIG_X86)


Re: [PATCH 0/4] support for text-relative kallsyms table

2016-01-20 Thread Ingo Molnar

* Ard Biesheuvel  wrote:

> This implements text-relative kallsyms address tables. This was developed as 
> part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but I 
> think 
> it may be beneficial to other architectures as well, so I am presenting it as 
> a 
> separate series.
> 
> The idea is that on 64-bit builds, it is rather wasteful to use absolute
> addressing for kernel symbols since they are all within a couple of MBs of
> each other. On top of that, the absolute addressing implies that, when the
> kernel is relocated at runtime, each address in the table needs to be fixed
> up individually.
> 
> Since all section-relative addresses are already emitted relative to _text,
> it is quite straight-forward to record only the offset, and add the absolute
> address of _text at runtime when referring to the address table.
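
(For anyone skimming the thread: the runtime side of the scheme then reduces
to roughly the sketch below. The array and function names are made up for
illustration and this is not the actual patch.)

/*
 * Sketch only: the build-time table stores 32-bit offsets from _text
 * instead of full 64-bit absolute addresses, so relocating the kernel
 * moves _text and no per-entry fixup of the table is needed.
 */
extern char _text[];			/* start of the kernel image */
extern const u32 kallsyms_offsets[];	/* assumed name: offsets from _text */

static unsigned long sketch_sym_address(unsigned int idx)
{
	return (unsigned long)_text + kallsyms_offsets[idx];
}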
> 
> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter
> case, the reduction in uncompressed size is primarily __init data)

So since kallsyms is in unswappable kernel RAM, the uncompressed size
reduction is what we care about mostly. How much bootloader load times are
impacted is a third order concern.

IOW a nice change!

Thanks,

Ingo

Re: [PATCH V10 1/4] perf/powerpc: assign an id to each powerpc register

2016-01-20 Thread Michael Ellerman
Hi Anju,

On Mon, 2016-01-11 at 15:58 +0530, Anju T wrote:

> The enum definition assigns an 'id' to each register in "struct pt_regs"
> of arch/powerpc. The order of these values in the enum definition is
> based on the corresponding macros in arch/powerpc/include/uapi/asm/ptrace.h.

Sorry one thing ...

> diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
> new file mode 100644
> index 000..cfbd068
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/perf_regs.h
> @@ -0,0 +1,49 @@
> +#ifndef _ASM_POWERPC_PERF_REGS_H
> +#define _ASM_POWERPC_PERF_REGS_H
> +
> +enum perf_event_powerpc_regs {
> + PERF_REG_POWERPC_GPR0,
> + PERF_REG_POWERPC_GPR1,
> + PERF_REG_POWERPC_GPR2,
> + PERF_REG_POWERPC_GPR3,
> + PERF_REG_POWERPC_GPR4,
> + PERF_REG_POWERPC_GPR5,
> + PERF_REG_POWERPC_GPR6,
> + PERF_REG_POWERPC_GPR7,
> + PERF_REG_POWERPC_GPR8,
> + PERF_REG_POWERPC_GPR9,
> + PERF_REG_POWERPC_GPR10,
> + PERF_REG_POWERPC_GPR11,
> + PERF_REG_POWERPC_GPR12,
> + PERF_REG_POWERPC_GPR13,
> + PERF_REG_POWERPC_GPR14,
> + PERF_REG_POWERPC_GPR15,
> + PERF_REG_POWERPC_GPR16,
> + PERF_REG_POWERPC_GPR17,
> + PERF_REG_POWERPC_GPR18,
> + PERF_REG_POWERPC_GPR19,
> + PERF_REG_POWERPC_GPR20,
> + PERF_REG_POWERPC_GPR21,
> + PERF_REG_POWERPC_GPR22,
> + PERF_REG_POWERPC_GPR23,
> + PERF_REG_POWERPC_GPR24,
> + PERF_REG_POWERPC_GPR25,
> + PERF_REG_POWERPC_GPR26,
> + PERF_REG_POWERPC_GPR27,
> + PERF_REG_POWERPC_GPR28,
> + PERF_REG_POWERPC_GPR29,
> + PERF_REG_POWERPC_GPR30,
> + PERF_REG_POWERPC_GPR31,
> + PERF_REG_POWERPC_NIP,
> + PERF_REG_POWERPC_MSR,
> + PERF_REG_POWERPC_ORIG_R3,
> + PERF_REG_POWERPC_CTR,
> + PERF_REG_POWERPC_LNK,
> + PERF_REG_POWERPC_XER,
> + PERF_REG_POWERPC_CCR,

You skipped SOFTE here at my suggestion, because it's called MQ on 32-bit.

But I've changed my mind, I think we *should* define SOFTE, and ignore MQ,
because MQ is unused. So just add:

  + PERF_REG_POWERPC_SOFTE,


> + PERF_REG_POWERPC_TRAP,
> + PERF_REG_POWERPC_DAR,
> + PERF_REG_POWERPC_DSISR,
> + PERF_REG_POWERPC_MAX,
> +};
> +#endif /* _ASM_POWERPC_PERF_REGS_H */

cheers


Re: [PATCH 0/4] support for text-relative kallsyms table

2016-01-20 Thread Arnd Bergmann
On Wednesday 20 January 2016 11:33:25 Ingo Molnar wrote:
> > The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
> > compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter
> > case, the reduction in uncompressed size is primarily __init data)
> 
> So since kallsyms is in unswappable kernel RAM, the uncompressed size
> reduction is what we care about mostly. How much bootloader load times are
> impacted is a third order concern.
> 
> IOW a nice change!

I think some people care a lot about the compressed size as well:

http://git.openwrt.org/?p=openwrt.git;a=blob;f=target/linux/generic/patches-4.4/203-kallsyms_uncompressed.patch;h=cf8a447bbcd5b1621d4edc36a69fe0ad384fe53f;hb=HEAD

This has been in openwrt.git for ages, because a lot of the target devices
are much more limited on flash memory size (4MB typically) than they are
on RAM size (at least 32MB).

Arnd

Re: [PATCH v9 2/6] Documentation, dt, arm64/arm: dt bindings for numa.

2016-01-20 Thread Rob Herring
On Mon, Jan 18, 2016 at 10:06:01PM +0530, Ganapatrao Kulkarni wrote:
> DT bindings for numa mapping of memory, cores and IOs.
> 
> Reviewed-by: Robert Richter 
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  Documentation/devicetree/bindings/arm/numa.txt | 272 
> +
>  1 file changed, 272 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt

This looks okay to me, but there are some cosmetic things in the example.

> +==
> +4 - Example dts
> +==
> +
> +A 2-socket system consisting of 2 boards connected through a ccn bus,
> +each board having one socket/soc with 8 cpus, memory and a pci bus.
> +
> + memory@00c0 {

Drop the leading 0s on unit addresses.

> + device_type = "memory";
> + reg = <0x0 0x00c0 0x0 0x8000>;
> + /* node 0 */
> + numa-node-id = <0>;
> + };
> +
> + memory@100 {
> + device_type = "memory";
> + reg = <0x100 0x 0x0 0x8000>;
> + /* node 1 */
> + numa-node-id = <1>;
> + };
> +
> + cpus {
> + #address-cells = <2>;
> + #size-cells = <0>;
> +
> + cpu@000 {

Same here (leaving one of course).

> + device_type = "cpu";
> + compatible =  "arm,armv8";
> + reg = <0x0 0x000>;
> + enable-method = "psci";
> + /* node 0 */
> + numa-node-id = <0>;
> + };
> + cpu@001 {

and so on...

> + device_type = "cpu";
> + compatible =  "arm,armv8";
> + reg = <0x0 0x001>;

Either all leading 0s or none.

> + reg = <0x0 0x008>;
> + enable-method = "psci";
> + /* node 1 */

Kind of a pointless comment.

Wouldn't each cluster of cpus for a given numa node be in a different 
cpu affinity? Certainly not required by the architecture, but the common 
case at least.

> + numa-node-id = <1>;
> + };

[...]

> + pcie0: pcie0@0x8480, {

Drop the 0x and the comma.

> + compatible = "arm,armv8";
> + device_type = "pci";
> + bus-range = <0 255>;
> + #size-cells = <2>;
> + #address-cells = <3>;
> + reg = <0x8480 0x 0 0x1000>;  /* Configuration space */
> + ranges = <0x0300 0x8010 0x 0x8010 0x 0x70 0x>;
> + /* node 0 */
> + numa-node-id = <0>;
> +};
> +
> + pcie1: pcie1@0x9480, {

ditto

> + compatible = "arm,armv8";
> + device_type = "pci";
> + bus-range = <0 255>;
> + #size-cells = <2>;
> + #address-cells = <3>;
> + reg = <0x9480 0x 0 0x1000>;  /* Configuration space */
> + ranges = <0x0300 0x9010 0x 0x9010 0x 0x70 0x>;
> + /* node 1 */
> + numa-node-id = <1>;
> +};
> +
> + distance-map {
> + compatible = "numa-distance-map-v1";
> + distance-matrix = <0 0 10>,
> +   <0 1 20>,
> +   <1 1 10>;
> + };
> -- 
> 1.8.1.4
> 