Re: [PATCH 12/17] tty: New RISC-V SBI Console Driver

2017-06-23 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 00:58:04 PDT (-0700), Arnd Bergmann wrote:
> On Wed, Jun 7, 2017 at 9:15 AM, Geert Uytterhoeven  
> wrote:
>> CC (hypervisor) console folks
>>
>> On Wed, Jun 7, 2017 at 1:00 AM, Palmer Dabbelt  wrote:
>>> This patch adds a new driver for the console available via the RISC-V
>>> SBI.  This console is specified to be used for early boot messages, and
>>> is designed to be a very simple (albeit somewhat slow) console that is
>>> always available.  All RISC-V systems have an SBI console.
>>>
>>> The SBI console is made available for early printk messages and is also
>>> available as a regular console.
>>>
>>> Signed-off-by: Palmer Dabbelt 
>>> ---
>>>  drivers/tty/hvc/Kconfig   |  11 +
>>>  drivers/tty/hvc/Makefile  |   1 +
>>>  drivers/tty/hvc/hvc_sbi.c | 102 
>>> ++
>>>  3 files changed, 114 insertions(+)
>>>  create mode 100644 drivers/tty/hvc/hvc_sbi.c
>>>
>>> diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
>>> index 574da15fe618..f3774adab240 100644
>>> --- a/drivers/tty/hvc/Kconfig
>>> +++ b/drivers/tty/hvc/Kconfig
>>> @@ -114,4 +114,15 @@ config HVCS
>>>   which will also be compiled when this driver is built as a
>>>   module.
>>>
>>> +config HVC_SBI
>>> +   bool "SBI console support"
>>> +   depends on RISCV
>>> +   select HVC_DRIVER
>>> +   default y
>>> +   help
>>> + This enables support for console output via RISC-V SBI calls, which
>>> + is normally used only during boot to output printk.
>>> +
>>> + If you don't know what to do here, say Y.
>>> +
>>>  endif # TTY
>
> Please move this a little higher along with the other HVC_DRIVER
> implementations.

OK: 
https://github.com/riscv/riscv-linux/commit/1c769cad7931b7b08644d2d4a7b6985777a8e0be

>>> + * RISC-V SBI interface to hvc_console.c
>>> + *  based on drivers-tty/hvc/hvc_udbg.c
>>> + *
>>> + * Copyright (C) 2008 David Gibson, IBM Corporation
>>> + * Copyright (C) 2012 Regents of the University of California
>
> 2017?

https://github.com/riscv/riscv-linux/commit/dafa678d26886076a8a92486f1bbbfa44aa8
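
(For reference, the core of such an hvc_console backend is tiny. Below is a
sketch along the lines of the posted driver, not the driver itself, assuming
the legacy sbi_console_putchar()/sbi_console_getchar() helpers and the
standard hvc_alloc() API:

#include <linux/err.h>
#include <linux/init.h>
#include <linux/types.h>
#include <asm/sbi.h>		/* sbi_console_putchar()/getchar() */
#include "hvc_console.h"

static int hvc_sbi_tty_put(uint32_t vtermno, const char *buf, int count)
{
	int i;

	/* The SBI console transfers a single byte per call. */
	for (i = 0; i < count; i++)
		sbi_console_putchar(buf[i]);
	return count;
}

static int hvc_sbi_tty_get(uint32_t vtermno, char *buf, int count)
{
	int i, c;

	/* sbi_console_getchar() returns a negative value when empty. */
	for (i = 0; i < count; i++) {
		c = sbi_console_getchar();
		if (c < 0)
			break;
		buf[i] = (char)c;
	}
	return i;
}

static const struct hv_ops hvc_sbi_ops = {
	.get_chars = hvc_sbi_tty_get,
	.put_chars = hvc_sbi_tty_put,
};

static int __init hvc_sbi_init(void)
{
	return PTR_ERR_OR_ZERO(hvc_alloc(0, 0, &hvc_sbi_ops, 16));
}
device_initcall(hvc_sbi_init);

The polling hvc core calls get_chars/put_chars periodically, which is why a
console this simple still works without an interrupt source.)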


[GIT PULL] Please pull powerpc/linux.git powerpc-4.12-7 tag

2017-06-23 Thread Michael Ellerman
Hi Linus,

Please pull some more powerpc fixes for 4.12. Most of these actually came in
last week but got held up for some more testing.

The following changes since commit a093c92dc7f96a15de98ec8cfe38e6f7610a5969:

  powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path 
(2017-06-16 16:10:37 +1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.12-7

for you to fetch changes up to 34f19ff1b5a0d11e46df479623d6936460105c9f:

  powerpc/64: Initialise thread_info for emergency stacks (2017-06-23 13:25:38 
+1000)


powerpc fixes for 4.12 #7

 - three fixes for kprobes/ftrace/livepatch interactions.

 - properly handle data breakpoints when using the Radix MMU.

 - fix for perf sampling of registers during call_usermodehelper().

 - properly initialise the thread_info on our emergency stacks.

 - add an explicit flush when doing TLB invalidations for a process
   using NPU2.

Thanks to:
  Alistair Popple, Naveen N. Rao, Nicholas Piggin, Ravi Bangoria,
  Masami Hiramatsu.


Alistair Popple (1):
  powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD

Naveen N. Rao (4):
  powerpc/kprobes: Pause function_graph tracing during jprobes handling
  powerpc/ftrace: Pass the correct stack pointer for 
DYNAMIC_FTRACE_WITH_REGS
  powerpc/kprobes: Skip livepatch_handler() for jprobes
  powerpc/64s: Handle data breakpoints in Radix mode

Nicholas Piggin (1):
  powerpc/64: Initialise thread_info for emergency stacks

Ravi Bangoria (1):
  powerpc/perf: Fix oops when kthread execs user process

 arch/powerpc/include/asm/kprobes.h |  1 +
 arch/powerpc/kernel/exceptions-64s.S   | 11 +--
 arch/powerpc/kernel/kprobes.c  | 17 +
 arch/powerpc/kernel/setup_64.c | 31 -
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 59 
 arch/powerpc/perf/perf_regs.c  |  3 +-
 arch/powerpc/platforms/powernv/npu-dma.c   | 94 ++
 7 files changed, 166 insertions(+), 50 deletions(-)




[PATCH v2 1/5] binfmt_elf: Use ELF_ET_DYN_BASE only for PIE

2017-06-23 Thread Kees Cook
The ELF_ET_DYN_BASE position was originally intended to keep loaders
away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2
/bin/cat" might cause the subsequent load of /bin/cat into where the
loader had been loaded.) With the advent of PIE (ET_DYN binaries with
an INTERP Program Header), ELF_ET_DYN_BASE continued to be used since
the kernel was only looking at ET_DYN. However, since ELF_ET_DYN_BASE
is traditionally set at the top 1/3rd of the TASK_SIZE, a substantial
portion of the address space is unused.

For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs
are loaded below the mmap region. This means they can be made to collide
(CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with pathological
stack regions. Lowering ELF_ET_DYN_BASE solves both by moving programs
above the mmap region in all cases, and will now additionally avoid
programs falling back to the mmap region by enforcing MAP_FIXED for
program loads (i.e. if it would have collided with the stack, now it
will fail to load instead of falling back to the mmap region).

To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
are loaded into the mmap region, leaving space available for either an
ET_EXEC binary with a fixed location or PIE being loaded into mmap by the
loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE, which
means architectures can now safely lower their values without risk of
loaders colliding with their subsequently loaded programs.

For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to
use the entire 32-bit address space for 32-bit pointers. For 32-bit,
4MB is used as the traditional minimum load location, likely to avoid
historically requiring a 4MB page table entry when only a portion of the
first 4MB would be used (since the NULL address is avoided).

Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
suggestions on how to implement this solution.

Fixes: d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR")
Cc: sta...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Kees Cook 
Acked-by: Rik van Riel 
---
 arch/x86/include/asm/elf.h | 13 +-
 fs/binfmt_elf.c| 59 +++---
 2 files changed, 58 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index e8ab9a46bc68..1c18d83d3f09 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -245,12 +245,13 @@ extern int force_personality32;
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE  4096
 
-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  */
-
-#define ELF_ET_DYN_BASE(TASK_SIZE / 3 * 2)
+/*
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
+ */
+#define ELF_ET_DYN_BASE	(mmap_is_ia32() ? 0x000400000UL : \
+					  0x100000000UL)
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports.  This could be done in user space,
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index ef4fb234bb5b..879ff9c7ffd0 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -925,17 +925,60 @@ static int load_elf_binary(struct linux_binprm *bprm)
elf_flags = MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE;
 
vaddr = elf_ppnt->p_vaddr;
+   /*
+* If we are loading ET_EXEC or we have already performed
+* the ET_DYN load_addr calculations, proceed normally.
+*/
if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
elf_flags |= MAP_FIXED;
} else if (loc->elf_ex.e_type == ET_DYN) {
-   /* Try and get dynamic programs out of the way of the
-* default mmap base, as well as whatever program they
-* might try to exec.  This is because the brk will
-* follow the loader, and is not movable.  */
-   load_bias = ELF_ET_DYN_BASE - vaddr;
-   if (current->flags & PF_RANDOMIZE)
-   load_bias += arch_mmap_rnd();
-   load_bias = ELF_PAGESTART(load_bias);
+   /*
+* This logic is run once for the first LOAD Program
+* Header for ET_DYN binaries to calculate the
+* randomization (load_bias) for all the LOAD

[PATCH v2 2/5] arm: Move ELF_ET_DYN_BASE to 4MB

2017-06-23 Thread Kees Cook
Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the address
space to avoid possible collisions with mmap or stack regions.

4MB is chosen here mainly to have parity with x86, where this is the
traditional minimum load location, likely to avoid historically requiring
a 4MB page table entry when only a portion of the first 4MB would be used
(since the NULL address is avoided). For ARM the position could be 0x8000,
the standard ET_EXEC load address, but that is needlessly close to the
NULL address, and anyone running PIE on 32-bit ARM will have an MMU, so
the tight mapping is not needed.

Cc: sta...@vger.kernel.org
Cc: Russell King 
Signed-off-by: Kees Cook 
---
 arch/arm/include/asm/elf.h | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
index d2315ffd8f12..f13ae153fb24 100644
--- a/arch/arm/include/asm/elf.h
+++ b/arch/arm/include/asm/elf.h
@@ -112,12 +112,8 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t 
*elfregs);
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE  4096
 
-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  */
-
-#define ELF_ET_DYN_BASE(TASK_SIZE / 3 * 2)
+/* This is the base location for PIE (ET_DYN with INTERP) loads. */
+#define ELF_ET_DYN_BASE	0x00400000UL
 
 /* When the program starts, a1 contains a pointer to a function to be 
registered with atexit, as per the SVR4 ABI.  A value of 0 means we 
-- 
2.7.4



[PATCH v2 5/5] s390: Move ELF_ET_DYN_BASE to 4GB / 4MB

2017-06-23 Thread Kees Cook
Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the address
space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers. On 32-bit use 4MB, which is the
traditional x86 minimum load location, likely to avoid historically
requiring a 4MB page table entry when only a portion of the first 4MB
would be used (since the NULL address is avoided). For s390 the position
could be 0x10000, but that is needlessly close to the NULL address.

Cc: sta...@vger.kernel.org
Cc: Heiko Carstens 
Cc: Martin Schwidefsky 
Signed-off-by: Kees Cook 
---
 arch/s390/include/asm/elf.h | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index e8f623041769..7c58d599f91b 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -161,14 +161,13 @@ extern unsigned int vdso_enabled;
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE  4096
 
-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk. 64-bit
-   tasks are aligned to 4GB. */
-#define ELF_ET_DYN_BASE (is_compat_task() ? \
-   (STACK_TOP / 3 * 2) : \
-   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1))
+/*
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
+ */
+#define ELF_ET_DYN_BASE	(is_compat_task() ? 0x000400000UL : \
+					    0x100000000UL)
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports. */
-- 
2.7.4



[PATCH v2 4/5] powerpc: Move ELF_ET_DYN_BASE to 4GB / 4MB

2017-06-23 Thread Kees Cook
Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the address
space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers. On 32-bit use 4MB, which is the
traditional x86 minimum load location, likely to avoid historically
requiring a 4MB page table entry when only a portion of the first 4MB
would be used (since the NULL address is avoided).

Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook 
Acked-by: Michael Ellerman 
---
 arch/powerpc/include/asm/elf.h | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index 09bde6e34f5d..548d9a411a0d 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -23,12 +23,13 @@
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE  PAGE_SIZE
 
-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  */
-
-#define ELF_ET_DYN_BASE	0x20000000
+/*
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
+ */
+#define ELF_ET_DYN_BASE	(is_32bit_task() ? 0x000400000UL : \
+					   0x100000000UL)
 
 #define ELF_CORE_EFLAGS (is_elf2_task() ? 2 : 0)
 
-- 
2.7.4



[PATCH v2 3/5] arm64: Move ELF_ET_DYN_BASE to 4GB / 4MB

2017-06-23 Thread Kees Cook
Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the address
space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers. On 32-bit use 4MB, to match ARM. This
could be 0x8000, the standard ET_EXEC load address, but that is needlessly
close to the NULL address, and anyone running arm compat PIE will have an
MMU, so the tight mapping is not needed.

Cc: sta...@vger.kernel.org
Cc: Ard Biesheuvel 
Cc: Catalin Marinas 
Cc: Mark Rutland 
Signed-off-by: Kees Cook 
---
 arch/arm64/include/asm/elf.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 5d1700425efe..8790fb09f689 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -113,12 +113,11 @@
 #define ELF_EXEC_PAGESIZE  PAGE_SIZE
 
 /*
- * This is the location that an ET_DYN program is loaded if exec'ed.  Typical
- * use of this is to invoke "./ld.so someprog" to test out a new version of
- * the loader.  We need to make sure that it is out of the way of the program
- * that it will "exec", and that there is sufficient room for the brk.
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
  */
-#define ELF_ET_DYN_BASE(2 * TASK_SIZE_64 / 3)
+#define ELF_ET_DYN_BASE	0x100000000UL
 
 #ifndef __ASSEMBLY__
 
@@ -173,7 +172,8 @@ extern int arch_setup_additional_pages(struct linux_binprm 
*bprm,
 
 #ifdef CONFIG_COMPAT
 
-#define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3)
+/* PIE load location for compat arm. Must match ARM ELF_ET_DYN_BASE. */
+#define COMPAT_ELF_ET_DYN_BASE 0x000400000UL
 
 /* AArch32 registers. */
 #define COMPAT_ELF_NGREG   18
-- 
2.7.4



[PATCH v2 0/5] Use ELF_ET_DYN_BASE only for PIE

2017-06-23 Thread Kees Cook
This is v2 (to refresh the 5 patches in -mm) for moving ELF_ET_DYN_BASE
safely lower. Changes are clarifications in the commit logs (suggested
by mpe), a compat think-o fix for arm64 (thanks to Ard), and to add
Rik and mpe's Acks.

Quoting patch 1/5:

The ELF_ET_DYN_BASE position was originally intended to keep loaders
away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2
/bin/cat" might cause the subsequent load of /bin/cat into where the
loader had been loaded.) With the advent of PIE (ET_DYN binaries with
an INTERP Program Header), ELF_ET_DYN_BASE continued to be used since
the kernel was only looking at ET_DYN. However, since ELF_ET_DYN_BASE
is traditionally set at the top 1/3rd of the TASK_SIZE, a substantial
portion of the address space is unused.

For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs
are loaded below the mmap region. This means they can be made to collide
(CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with pathological
stack regions. Lowering ELF_ET_DYN_BASE solves both by moving programs
above the mmap region in all cases, and will now additionally avoid
programs falling back to the mmap region by enforcing MAP_FIXED for
program loads (i.e. if it would have collided with the stack, now it
will fail to load instead of falling back to the mmap region).

To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
are loaded into the mmap region, leaving space available for either an
ET_EXEC binary with a fixed location or PIE being loaded into mmap by the
loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE, which
means architectures can now safely lower their values without risk of
loaders colliding with their subsequently loaded programs.

For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to
use the entire 32-bit address space for 32-bit pointers. For 32-bit,
4MB is used as the traditional minimum load location, likely to avoid
historically requiring a 4MB page table entry when only a portion of the
first 4MB would be used (since the NULL address is avoided).

Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
suggestions on how to implement this solution.

-Kees



Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3

2017-06-23 Thread Al Viro
On Fri, Jun 23, 2017 at 01:49:16PM -0500, Larry Finger wrote:

> > BTW, could you try to check what happens if you kill the
> > if (__builtin_constant_p(n) && (n <= 8))
> > bits in raw_copy_{to,from}_user()?  The usefulness of those (in 
> > __copy_from_user()
> > originally) had always been dubious and the things are simpler without them.
> > If _that_ turns out to cure breakage, I would be very surprised, though.
> > 
> Sorry I was gone so long. Installing jessie on this box resulted in a crash
> on boot. Lubuntu 14.04 yielded a desktop with a functioning cursor, but
> nothing else. Finally, Ubuntu 12.04 resulted in a working system. I hate
> Unity, but I guess I'm stuck for now.

Ho-hum...  Jessie is 3.16, so whatever is crashing there, it's something
different...  Ubuntu 12.04 is what, 3.2?

> I know how easy it is to screw up a long bisection by booting the wrong
> kernel. To help that problem and to work around the yaconf/yboot nonsense on
> the MAC, my /etc/yaconf has always had generic kernel stanzas with only
> default, old, and original kernels mentioned. From there I use a local
> script to finish a kernel installation by moving the default links to the
> old ones and creating the new default links pointing to the current kernel.
> With those long-tested scripts, I'm sure that I am booting the one I want.
> 
> With the new installation, kernel 4.12-rc6 failed, as did 3448890c with the
> backported 46f401c4 added.
> 
> Replacing "if (__builtin_constant_p(n) && (n <= 8))" with "if (0)" had no 
> effect.

OK, that simplifies things a bit.  Just to make sure we are on the same page:

* f2ed8bebee69 + cherry-pick of 46f401c4 boots (Ubuntu 12.04 userland)
* 3448890c32c3 + cherry-pick of 46f401c4 fails (Ubuntu 12.04 userland), ditto
  with removal of constant-size bits in raw_copy_..._user().  Failure appears
  to be on udev getting EFAULT on some syscalls.
* straight Ubuntu 12.04 works
* jessie crashes on boot.

Could you post the boot logs of the first two?


Re: [PATCH 3/4] powerpc: Reduce ELF_ET_DYN_BASE

2017-06-23 Thread Kees Cook
On Fri, Jun 23, 2017 at 12:01 AM, Michael Ellerman  wrote:
> Kees Cook  writes:
>
>> Now that explicitly executed loaders are loaded in the mmap region,
>> position PIE binaries lower in the address space to avoid possible
>> collisions with mmap or stack regions. For 64-bit, align to 4GB to
>> allow runtimes to use the entire 32-bit address space for 32-bit
>> pointers.
>
> The change log and subject are a bit out of whack with the actual patch
> because previously we used 512MB.
>
> How about?
>
>   powerpc: Move ELF_ET_DYN_BASE to 4GB / 4MB
>
>   Now that explicitly executed loaders are loaded in the mmap region,
>   we have more freedom to decide where we position PIE binaries in the
>   address space to avoid possible collisions with mmap or stack regions.
>
>   For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
>   address space for 32-bit pointers. On 32-bit use 4MB.

Good idea, thanks. I'll resend the series with the commit logs updated.

> Is there any particular reasoning behind the 4MB value on 32-bit?

So, I've dug around a bit on this, and I *think* the rationale is to
avoid mapping a possible 4MB page table entry when the program won't be
using at least a portion near the lower end (the NULL address area being
blocked by mmap_min_addr). It seems to be mainly tradition, though.
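
(A quick back-of-the-envelope for that rationale, assuming 4KB pages and
1024-entry 32-bit page tables:

#include <stdio.h>

int main(void)
{
	/*
	 * One 32-bit page-table page, or equivalently one 4MB large-page
	 * PDE, spans 1024 entries * 4KB. Loading at 4MB therefore keeps
	 * the program off the table that also covers the NULL page.
	 */
	unsigned long span = 1024UL * 4096UL;

	printf("%lu MB\n", span >> 20);	/* prints: 4 MB */
	return 0;
}

So 4MB is exactly the first address not covered by the page-table page that
maps address zero.)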

> I gave this a quick spin and it booted OK on all my test boxes, which
> covers 64-bit/32-bit kernel and userspace. So seems to work!

Awesome, thanks for the testing!

> Acked-by: Michael Ellerman 
>
> cheers

-Kees

-- 
Kees Cook
Pixel Security


Re: 1M hugepage size being registered on Linux

2017-06-23 Thread victora

On 2017-06-22 00:59, Michael Ellerman wrote:

Hi Victor,

Someone refreshed my memory on this, coffee was involved ...

victora  writes:

Hi Alistair/Jeremy,

I am working on a bug related to 1M hugepage size being registered on
Linux (Power 8 Baremetal - Garrison).


On those machines the property in the device tree comes straight from
hostboot, and it includes 1M:

# lsprop ibm,segment-page-sizes
ibm,segment-page-sizes
 000c    0003  000c
 baseshift slbenclpnum shift
   0010  0007  0018
 penc  shift penc  shift
 0038  0010  0110  0002
 penc  baseshift slbenclpnum
 0010  0001  0018  0008
 shift penc  shift penc
 0014  0130  0001  0014 <--- 1MB = 2^0x14
 baseshift slbenclpnum shift
 0002  0018  0100  0001
 penc  baseshift slbenclpnum
 0018    0022  0120
 shift penc  baseshift slbenc
 0001  0022  0003
 lpnum shift penc



I was checking dmesg and it seems that 1M page size is coming from
firmware to Linux.

[    0.000000] base_shift=20: shift=20, sllp=0x0130, avpnm=0x00000000, 
tlbiel=0, penc=2
[1.528867] HugeTLB registered 1 MB page size, pre-allocated 0 
pages


Which is why you see that message.


Should Linux support this page size? As far as I know, this was an
unsupported page size in the past, isn't it? If this should be supported
now, is there any specific reason for that?


It's unsupported in Linux because it doesn't match the page table
geometry.

We merged a patch from Aneesh to filter it out in 4.12-rc1:

  a525108cf1cc ("powerpc/mm/hugetlb: Filter out hugepage size not
supported by page table layout")

I guess we should probably send that patch to stable et al.

cheers


Hi Michael,

Sorry for the delay. Thanks for merging that patch.
Was that patch also sent to stable et al.?

Thanks
Victor
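
(Conceptually, the filtering that a525108cf1cc performs looks like the
sketch below. The helper name supported_by_geometry() is made up here; the
real patch consults powerpc's mmu_psize_defs[] table.

#include <linux/hugetlb.h>
#include <linux/init.h>

/* Stand-in for the mmu_psize_defs[] lookup in the real patch. */
static bool __init supported_by_geometry(unsigned int shift)
{
	return shift != 20;	/* e.g. reject the 1MB (2^20) entry */
}

/*
 * Register firmware-advertised hugepage shifts, skipping any that the
 * current page-table geometry cannot back, such as the 1MB size that
 * hostboot advertises on these machines.
 */
static void __init register_fw_hugepages(const unsigned int *shifts, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!supported_by_geometry(shifts[i]))
			continue;
		hugetlb_add_hstate(shifts[i] - PAGE_SHIFT);
	}
}

With that in place the 1MB entry from the device tree is silently ignored
instead of being registered with hugetlbfs.)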



Re: [PATCH 7/7] crypto: caam: cleanup CONFIG_64BIT ifdefs when using io{read|write}64

2017-06-23 Thread Logan Gunthorpe
Thanks Horia.

I'm inclined to just use your patch verbatim. I can set you as author,
but no matter how I do it, I'll need your Signed-off-by.

Logan

On 23/06/17 12:51 AM, Horia Geantă wrote:
> On 6/22/2017 7:49 PM, Logan Gunthorpe wrote:
>> Now that ioread64 and iowrite64 are always available we don't
>> need the ugly ifdefs to change their implementation when they
>> are not.
>>
> Thanks Logan.
> 
> Note however this is not equivalent - it changes the behaviour, since
> CAAM engine on i.MX6S/SL/D/Q platforms is broken in terms of 64-bit
> register endianness - see CONFIG_CRYPTO_DEV_FSL_CAAM_IMX usage in code
> you are removing.
> 
> [Yes, current code has its problems, as it does not differentiate b/w
> i.MX platforms with and without the (unofficial) erratum, but this
> should be fixed separately.]
> 
> Below is the change that would keep current logic - still forcing i.MX
> to write CAAM 64-bit registers in BE even if the engine is LE (yes, diff
> is doing a poor job).
> 
> Horia
> 
> diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
> index 84d2f838a063..b893ebb24e65 100644
> --- a/drivers/crypto/caam/regs.h
> +++ b/drivers/crypto/caam/regs.h
> @@ -134,50 +134,25 @@ static inline void clrsetbits_32(void __iomem
> *reg, u32 clear, u32 set)
>   *base + 0x : least-significant 32 bits
>   *base + 0x0004 : most-significant 32 bits
>   */
> -#ifdef CONFIG_64BIT
>  static inline void wr_reg64(void __iomem *reg, u64 data)
>  {
> +#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
> if (caam_little_end)
> iowrite64(data, reg);
> else
> -   iowrite64be(data, reg);
> -}
> -
> -static inline u64 rd_reg64(void __iomem *reg)
> -{
> -   if (caam_little_end)
> -   return ioread64(reg);
> -   else
> -   return ioread64be(reg);
> -}
> -
> -#else /* CONFIG_64BIT */
> -static inline void wr_reg64(void __iomem *reg, u64 data)
> -{
> -#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
> -   if (caam_little_end) {
> -   wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
> -   wr_reg32((u32 __iomem *)(reg), data);
> -   } else
>  #endif
> -   {
> -   wr_reg32((u32 __iomem *)(reg), data >> 32);
> -   wr_reg32((u32 __iomem *)(reg) + 1, data);
> -   }
> +   iowrite64be(data, reg);
>  }
> 
>  static inline u64 rd_reg64(void __iomem *reg)
>  {
>  #ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
> if (caam_little_end)
> -   return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
> -   (u64)rd_reg32((u32 __iomem *)(reg)));
> +   return ioread64(reg);
> else
>  #endif
> -   return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
> -   (u64)rd_reg32((u32 __iomem *)(reg) + 1));
> +   return ioread64be(reg);
>  }
> -#endif /* CONFIG_64BIT  */
> 
>  #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>  #ifdef CONFIG_SOC_IMX7D
> 
> 
>> Signed-off-by: Logan Gunthorpe 
>> Cc: "Horia Geantă" 
>> Cc: Dan Douglass 
>> Cc: Herbert Xu 
>> Cc: "David S. Miller" 
>> ---
>>  drivers/crypto/caam/regs.h | 29 -
>>  1 file changed, 29 deletions(-)
>>
>> diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
>> index 84d2f838a063..26fc19dd0c39 100644
>> --- a/drivers/crypto/caam/regs.h
>> +++ b/drivers/crypto/caam/regs.h
>> @@ -134,7 +134,6 @@ static inline void clrsetbits_32(void __iomem *reg, u32 
>> clear, u32 set)
>>   *base + 0x : least-significant 32 bits
>>   *base + 0x0004 : most-significant 32 bits
>>   */
>> -#ifdef CONFIG_64BIT
>>  static inline void wr_reg64(void __iomem *reg, u64 data)
>>  {
>>  if (caam_little_end)
>> @@ -151,34 +150,6 @@ static inline u64 rd_reg64(void __iomem *reg)
>>  return ioread64be(reg);
>>  }
>>  
>> -#else /* CONFIG_64BIT */
>> -static inline void wr_reg64(void __iomem *reg, u64 data)
>> -{
>> -#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
>> -if (caam_little_end) {
>> -wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
>> -wr_reg32((u32 __iomem *)(reg), data);
>> -} else
>> -#endif
>> -{
>> -wr_reg32((u32 __iomem *)(reg), data >> 32);
>> -wr_reg32((u32 __iomem *)(reg) + 1, data);
>> -}
>> -}
>> -
>> -static inline u64 rd_reg64(void __iomem *reg)
>> -{
>> -#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
>> -if (caam_little_end)
>> -return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
>> -(u64)rd_reg32((u32 __iomem *)(reg)));
>> -else
>> -#endif
>> -return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
>> -(u64)rd_reg32((u32 __iomem *)(reg) + 1));
>> -}
>> -#endif /* CONFIG_64BIT  */
>> -
>>  #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>  #ifdef CONFIG_SOC_IMX7D
>>  #define 

Re: [PATCH] powerpc/kernel: Avoid redundancies on giveup_all

2017-06-23 Thread Breno Leitao
Hi Cyril,

On Fri, Jun 23, 2017 at 04:03:12PM +1000, Cyril Bur wrote:
> On Thu, 2017-06-22 at 17:27 -0300, Breno Leitao wrote:
> > Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
> > __giveup_vsx(). But __giveup_vsx() also calls __giveup_fpu() and
> > __giveup_altivec() again, in a redundant manner.
> > 
> > Other than giving up FP and Altivec, __giveup_vsx() also disables
> > MSR_VSX on MSR, but this is already done by __giveup_{fpu,altivec}().
> > As VSX can not be enabled alone on MSR (without FP and/or VEC
> > enabled), this is also a redundancy. VSX is never enabled alone (without
> > FP and VEC) because every time VSX is enabled, as through load_up_vsx()
> > and restore_math(), FP is also enabled together.
> > 
> > This change improves giveup_all() on average by just 3%, but since
> > giveup_all() is called very frequently, around 8x per CPU per second on
> > an idle machine, this change might show some noticeable improvement.
> > 
> 
> So I totally agree except this makes me quite nervous. I know we're
> quite good at always disabling VSX when we disable FPU and ALTIVEC and
> we do always turn VSX on when we enable FPU AND ALTIVEC. But still, if
> we ever get that wrong...

Right, I understand your point, we can consider this code as a
'fallback' if we, somehow, forget to disable VSX when disabling
FPU/ALTIVEC. Good point.

> I'm more interested in how this improves giveup_all() performance by so
> much, but then hardware often surprises - I guess that's the cost of a
> function call.

I got this number using ftrace. I used the 'funcgraph' tracer with the
trace_options set to 'funcgraph-duration'. Then I set set_ftrace_filter with
giveup_all().

There is also a tool that helps with this if you wish. It uses exactly the
same mechanism I used, but in a more automated way. The tool is funcgraph,
by Brendan Gregg.

https://github.com/brendangregg/perf-tools/blob/master/kernel/funcgraph

> Perhaps caching the thread.regs->msr isn't a good idea.

Yes, I looked at it, but it seems that the compiler is optimizing it, keeping
it in r30 and not saving it to memory/stack. This is the code being generated
here, where r9 contains the task pointer.

 usermsr = tsk->thread.regs->msr;
c00199c4:   08 01 c9 eb ld r30,264(r9)

 if ((usermsr & msr_all_available) == 0)
  
c00199c8:   60 5f 2a e9 ld r9,24416(r10)
c00199cc:   39 48 ca 7f and.  r10,r30,r9
c00199d0:   20 00 82 40 bne c00199f0 


> If we could
> branch over in the common case and but still have the call to the
> function in case something goes horribly wrong?

Yes, we can revisit it on a future opportunity. Thanks for sharing your opinion.


Re: [PATCH 1/2] trace/kprobes: Sanitize derived event names

2017-06-23 Thread Masami Hiramatsu
On Fri, 23 Jun 2017 00:33:45 +0530
"Naveen N. Rao"  wrote:

> On 2017/06/22 06:29PM, Masami Hiramatsu wrote:
> > On Thu, 22 Jun 2017 00:20:27 +0530
> > "Naveen N. Rao"  wrote:
> > 
> > > When we derive event names, convert some expected symbols (such as ':'
> > > used to specify module:name and '.' present in some symbols) into
> > > underscores so that the event name is not rejected.
> > 
> > Oops, ok, this is my mistake.
> > 
> > Acked-by: Masami Hiramatsu 
> > 
> > This must be marked as bugfix for stable trees.
> > 
> > Could you also add a testcase for this (module name) bug?
> > 
> > MODNAME=`lsmod | head -n 2 | tail -n 1 | cut -f 1 -d " "`
> > FUNCNAME=`grep -m 1 "\\[$MODNAME\\]" /proc/kallsyms | xargs | cut -f 3 -d " 
> > "`
> > 
> > May give you a target name :)
> 
> Sure. Here is a test.
> 
> Thanks for the review,
> Naveen
> 
> -
> [PATCH] selftests/ftrace: Add a test to probe module functions
> 
> Add a kprobes test to ensure that we are able to add a probe on a
> module function using 'p :' format, without having to
> specify a probe name.
> 
> Suggested-by: Masami Hiramatsu 
> Signed-off-by: Naveen N. Rao 

Perfect! :)

Acked-by: Masami Hiramatsu 

Thanks!

> ---
>  .../testing/selftests/ftrace/test.d/kprobe/probe_module.tc | 14 
> ++
>  1 file changed, 14 insertions(+)
>  create mode 100644 
> tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc 
> b/tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc
> new file mode 100644
> index ..ea7657041ba6
> --- /dev/null
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/probe_module.tc
> @@ -0,0 +1,14 @@
> +#!/bin/sh
> +# description: Kprobe dynamic event - probing module
> +
> +[ -f kprobe_events ] || exit_unsupported # this is configurable
> +
> +echo 0 > events/enable
> +echo > kprobe_events
> +export MOD=`lsmod | head -n 2 | tail -n 1 | cut -f1 -d" "`
> +export FUNC=`grep -m 1 ".* t .*\\[$MOD\\]" /proc/kallsyms | xargs | cut -f3 
> -d" "`
> +[ "x" != "x$MOD" -a "y" != "y$FUNC" ] || exit_untested
> +echo p $MOD:$FUNC > kprobe_events
> +grep $MOD kprobe_events
> +echo > kprobe_events
> +clear_trace
> -- 
> 2.13.1
> 


-- 
Masami Hiramatsu 


Re: [PATCH 2/2] selftests/ftrace: Update multiple kprobes test for powerpc

2017-06-23 Thread Masami Hiramatsu
On Thu, 22 Jun 2017 22:33:25 +0530
"Naveen N. Rao"  wrote:

> On 2017/06/22 06:07PM, Masami Hiramatsu wrote:
> > On Thu, 22 Jun 2017 00:20:28 +0530
> > "Naveen N. Rao"  wrote:
> > 
> > > KPROBES_ON_FTRACE is only available on powerpc64le. Update comment to
> > > clarify this.
> > > 
> > > Also, we should use an offset of 8 to ensure that the probe does not
> > > fall on ftrace location. The current offset of 4 will fall before the
> > > function local entry point and won't fire, while an offset of 12 or 16
> > > will fall on ftrace location. Offset 8 is currently guaranteed to not be
> > > the ftrace location.
> > 
> > OK, these part seems good to me.
> > 
> > > 
> > > Finally, do not filter out symbols with a dot. Powerpc Elfv1 uses dot
> > > prefix for all functions and this prevents us from testing some of those
> > > symbols. Furthermore, with the patch to derive event names properly in
> > > the presence of ':' and '.', such names are accepted by kprobe_events
> > > and constitutes a good test for those symbols.
> > 
> > Hmm, the reason why I added such a filter was to avoid symbols including
> > gcc-generated suffixes such as .constprop or .isra.
> 
> I see.
> 
> I do wonder -- is there a problem if we try probing those symbols? On my 
> local x86 vm, I don't see an issue probing it especially with the 
> previous patch to enable probing with symbols having a '.' or ':'.
> 
> Furthermore, since this is for testing kprobe_events, I feel it is good 
> to try probing those symbols too to catch any weird errors we may hit.

Yes, and that is not what this testcase is aiming at. Such a testcase should
be a separate one, with correct error handling.

Thank you,

> 
> Thanks for the review!
> - Naveen
> 
> 
> > So if Powerpc ELFv1 uses a dot prefix, that is OK; in that case,
> > could you update the filter to "^.*\\..*"?
> > 
> > Thank you,
> > 
> > > 
> > > Signed-off-by: Naveen N. Rao 
> > > ---
> > >  tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc | 8 
> > > 
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git 
> > > a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc 
> > > b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
> > > index f4d1ff785d67..d209c071b2c0 100644
> > > --- a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
> > > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
> > > @@ -2,16 +2,16 @@
> > >  # description: Register/unregister many kprobe events
> > >  
> > >  # ftrace fentry skip size depends on the machine architecture.
> > > -# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc
> > > +# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc64le
> > >  case `uname -m` in
> > >x86_64|i[3456]86) OFFS=5;;
> > > -  ppc*) OFFS=4;;
> > > +  ppc64le) OFFS=8;;
> > >*) OFFS=0;;
> > >  esac
> > >  
> > >  echo "Setup up to 256 kprobes"
> > > -grep t /proc/kallsyms | cut -f3 -d" " | grep -v .*\\..* | \
> > > -head -n 256 | while read i; do echo p ${i}+${OFFS} ; done > 
> > > kprobe_events ||:
> > > +grep t /proc/kallsyms | cut -f3 -d" " | head -n 256 | \
> > > +while read i; do echo p ${i}+${OFFS} ; done > kprobe_events ||:
> > >  
> > >  echo 1 > events/kprobes/enable
> > >  echo 0 > events/kprobes/enable
> > > -- 
> > > 2.13.1
> > > 
> > 
> > 
> > -- 
> > Masami Hiramatsu 
> > 
> 


-- 
Masami Hiramatsu 


Re: [RFC PATCH 1/2] powerpc/xive: guest exploitation of the XIVE interrupt controller

2017-06-23 Thread Cédric Le Goater
On 06/22/2017 02:20 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2017-06-22 at 11:29 +0200, Cédric Le Goater wrote:
>> This is the framework for using XIVE in a PowerVM guest. The support
>> is very similar to the native one in a much simpler form.
> 
> Looks really good. Minor nits & comments...
> 
>> Instead of OPAL calls, a set of Hypervisors call are used to configure
>> the interrupt sources and the event/notification queues of the guest:
>>
>>H_INT_GET_SOURCE_INFO
>>H_INT_SET_SOURCE_CONFIG
>>H_INT_GET_SOURCE_CONFIG
>>H_INT_GET_QUEUE_INFO
>>H_INT_SET_QUEUE_CONFIG
>>H_INT_GET_QUEUE_CONFIG
>>H_INT_RESET
> 
> There are the base ones.
> 
>> Calls that still need to be addressed :
>>
>>H_INT_SET_OS_REPORTING_LINE
>>H_INT_GET_OS_REPORTING_LINE
> 
> Ah so those have to do with that magic cache line you can register with
> the HW so that when you get an interrupt, you can do an MMIO store very
> early on in the interrupt entry path to the XIVE, which will
> asynchronously write the NSR etc... to that cache line which you can
> then poke at later on.
>
> I don't know if it's worth exploiting in Linux, but we should support
> it in qemu/kvm.
 
From a QEMU point of view, it's not a big deal I think. I just haven't 
introduced an NVT structure yet, which would be needed to hold the address 
of the reporting cache line, or something similar.

>>H_INT_ESB
> 
> This is a h-call that performs the basic ESB operations. Some
> interrupts can have a flag telling the OS to do the operations using
> that hcall rather than directly. This can be used to workaround HW
> issues with some interrupts sources if needed.

The hcall is implemented in QEMU. It has a lot in common with the 
MMIO, that's why. For Linux, it should not require too much changes.
We could use a XIVE_IRQ_FLAG_H_INT_ESB flag in xive_poke_esb() to do 
the hcall instead of the out* calls.

xive_do_source_eoi() needs some wrapper calls around ->eoi_mmio also.
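
(Something along these lines; XIVE_IRQ_FLAG_H_INT_ESB is the flag proposed
above, and xive_hcall_esb() is a hypothetical wrapper around H_INT_ESB:

#include <asm/io.h>
#include <asm/xive.h>

/*
 * Sketch of the dispatch described above: sources flagged for
 * H_INT_ESB go through the hypervisor call, everything else keeps
 * the direct MMIO access to the ESB page.
 */
static u64 xive_esb_read(struct xive_irq_data *xd, u32 offset)
{
	if (xd->flags & XIVE_IRQ_FLAG_H_INT_ESB)
		return xive_hcall_esb(xd, offset);	/* assumed wrapper */

	return in_be64(xd->eoi_mmio + offset);
}
)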

> 
>>H_INT_SYNC
> 
> This will be needed for queue accounting in some cases, such as CPU
> hotplug I think etc... For example if you mask an interrupt in the ESB,
> a sync will ensure that any previous occurrence of this interrupt has
> reached its target queue (and thus is visible in memory).

ok. The way this will be handled is still a little fuzzy for me. I need
to study the question. 

>> As for XICS, the XIVE interface for the guest is described in the
>> device tree under the interrupt controller node. A couple of new
>> properties are specific to XIVE :
>>
>>  - "reg"
>>
>>contains the base address and size of the thread interrupt
>>    management areas (TIMA) for the user level and for the OS level. Only
>>the OS level is taken into account.
>>
>>  - "ibm,xive-eq-sizes"
>>
>>the size of the event queues.
>>
>>  - "ibm,xive-lisn-ranges"
>>
>>the interrupt numbers ranges assigned to the guest. These are
>>allocated using a simple bitmap.
>>
>> This is work in progress. It was only tested with a QEMU XIVE model
>> for pseries.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/hvcall.h  |  13 +-
>>  arch/powerpc/include/asm/xive.h|   1 +
>>  arch/powerpc/platforms/pseries/Kconfig |   1 +
>>  arch/powerpc/platforms/pseries/setup.c |   8 +-
>>  arch/powerpc/platforms/pseries/smp.c   |  18 +-
>>  arch/powerpc/sysdev/xive/Kconfig   |   5 +
>>  arch/powerpc/sysdev/xive/Makefile  |   1 +
>>  arch/powerpc/sysdev/xive/xive-hv.c | 523 
>> +
>>  8 files changed, 566 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/powerpc/sysdev/xive/xive-hv.c
>>
>> diff --git a/arch/powerpc/include/asm/hvcall.h 
>> b/arch/powerpc/include/asm/hvcall.h
>> index d73755fafbb0..3c019e9f451a 100644
>> --- a/arch/powerpc/include/asm/hvcall.h
>> +++ b/arch/powerpc/include/asm/hvcall.h
>> @@ -280,7 +280,18 @@
>>  #define H_RESIZE_HPT_COMMIT 0x370
>>  #define H_REGISTER_PROC_TBL 0x37C
>>  #define H_SIGNAL_SYS_RESET  0x380
>> -#define MAX_HCALL_OPCODE    H_SIGNAL_SYS_RESET
>> +#define H_INT_GET_SOURCE_INFO   0x3A8
>> +#define H_INT_SET_SOURCE_CONFIG 0x3AC
>> +#define H_INT_GET_SOURCE_CONFIG 0x3B0
>> +#define H_INT_GET_QUEUE_INFO    0x3B4
>> +#define H_INT_SET_QUEUE_CONFIG  0x3B8
>> +#define H_INT_GET_QUEUE_CONFIG  0x3BC
>> +#define H_INT_SET_OS_REPORTING_LINE 0x3C0
>> +#define H_INT_GET_OS_REPORTING_LINE 0x3C4
>> +#define H_INT_ESB   0x3C8
>> +#define H_INT_SYNC  0x3CC
>> +#define H_INT_RESET 0x3D0
>> +#define MAX_HCALL_OPCODE    H_INT_RESET
>>  
>>  /* H_VIOCTL functions */
>>  #define H_GET_VIOA_DUMP_SIZE0x01
>> diff --git a/arch/powerpc/include/asm/xive.h 
>> b/arch/powerpc/include/asm/xive.h
>> index c23ff4389ca2..c947952ed934 100644
>> --- a/arch/powerpc/include/asm/xive.h
>> +++ b/arch/powerpc/include/asm/xive.h
>> @@ -110,6 +110,7 @@ extern bool __xive_enabled;
>>  
>>  static inline bool xive_enabled(void) { return 

Re: [PATCH 1/2] powerpc/powernv/pci: Add helper to check if a PE has a single vendor

2017-06-23 Thread kbuild test robot
Hi Russell,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.12-rc6 next-20170623]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Russell-Currey/powerpc-powernv-pci-Add-helper-to-check-if-a-PE-has-a-single-vendor/20170623-201801
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-ppc64_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

Note: the 
linux-review/Russell-Currey/powerpc-powernv-pci-Add-helper-to-check-if-a-PE-has-a-single-vendor/20170623-201801
 HEAD cc13610eab0edbbdf3d9a8b29b33b1bc02859672 builds fine.
  It only hurts bisectability.

All errors (new ones prefixed by >>):

>> arch/powerpc/platforms/powernv/pci-ioda.c:1721:13: error: 
>> 'pnv_pci_ioda_pe_single_vendor' defined but not used 
>> [-Werror=unused-function]
static bool pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)
^
   cc1: all warnings being treated as errors

vim +/pnv_pci_ioda_pe_single_vendor +1721 
arch/powerpc/platforms/powernv/pci-ioda.c

  1715   * for physical PE: the device is already added by now;
  1716   * for virtual PE: sysfs entries are not ready yet and
  1717   * tce_iommu_bus_notifier will add the device to a group later.
  1718   */
  1719  }
  1720  
> 1721  static bool pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)
  1722  {
  1723  unsigned short vendor = 0;
  1724  struct pci_dev *pdev;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
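
(The usual ways to keep a series like this bisectable under -Werror are to
introduce the helper in the same patch as its first caller, or to mark it
provisionally unused until the caller lands. A sketch of the latter, with a
stand-alone fallback for the kernel's annotation:

#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))	/* <linux/compiler.h> in-tree */
#endif

/*
 * Marking the not-yet-called helper keeps -Werror=unused-function
 * quiet until the follow-up patch adds its caller; the annotation can
 * then be dropped. The real body is elided in the robot's report.
 */
static bool __maybe_unused pnv_pci_ioda_pe_single_vendor_stub(void)
{
	return false;
}
)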




[PATCH 1/2] mmc/host: add FSP2(ppc476fpe) into depends for sdhci-st

2017-06-23 Thread Ivan Mikhaylov
* sdhci-st driver can be used for ppc476 fsp2 soc

Signed-off-by: Ivan Mikhaylov 
---
 drivers/mmc/host/Kconfig |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 2eb9701..e6c0d86 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -354,8 +354,8 @@ config MMC_MOXART
 
 config MMC_SDHCI_ST
tristate "SDHCI support on STMicroelectronics SoC"
-   depends on ARCH_STI
depends on MMC_SDHCI_PLTFM
+   depends on ARCH_STI || FSP2
select MMC_SDHCI_IO_ACCESSORS
help
  This selects the Secure Digital Host Controller Interface in
-- 
1.7.1



[PATCH 2/2] 44x/fsp2: enable eMMC arasan for fsp2 platform

2017-06-23 Thread Ivan Mikhaylov
* add mmc0 section into dts for arasan
* change defconfig appropriately

Signed-off-by: Ivan Mikhaylov 
---
 arch/powerpc/boot/dts/fsp2.dts  |   19 +++
 arch/powerpc/configs/44x/fsp2_defconfig |2 ++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsp2.dts b/arch/powerpc/boot/dts/fsp2.dts
index de9d606..6a63026 100644
--- a/arch/powerpc/boot/dts/fsp2.dts
+++ b/arch/powerpc/boot/dts/fsp2.dts
@@ -52,6 +52,7 @@
clocks {
mmc_clk: mmc_clk {
compatible = "fixed-clock";
+   #clock-cells = <0>;
clock-frequency = <50000000>;
clock-output-names = "mmc_clk";
};
@@ -487,6 +488,24 @@
 /*RXDE*/  4 _2 13 0x4>;
};
 
+   mmc0: sdhci@020c0000 {
+   compatible  = "st,sdhci-stih407", "st,sdhci";
+   reg = <0x020c0000 0x20000>;
+   reg-names   = "mmc";
+   interrupts  = <21 0x4>;
+   interrupt-parent = <_3>;
+   interrupt-names = "mmcirq";
+   pinctrl-names   = "default";
+   pinctrl-0   = <>;
+   clock-names = "mmc";
+   clocks  = <_clk>;
+   bus-width   = <4>;
+   non-removable;
+   sd-uhs-sdr50;
+   sd-uhs-sdr104;
+   sd-uhs-ddr50;
+   };
+
opb {
compatible = "ibm,opb";
#address-cells = <1>;
diff --git a/arch/powerpc/configs/44x/fsp2_defconfig 
b/arch/powerpc/configs/44x/fsp2_defconfig
index e8e6a69..935aabe 100644
--- a/arch/powerpc/configs/44x/fsp2_defconfig
+++ b/arch/powerpc/configs/44x/fsp2_defconfig
@@ -92,8 +92,10 @@ CONFIG_MMC_DEBUG=y
 CONFIG_MMC_SDHCI=y
 CONFIG_MMC_SDHCI_PLTFM=y
 CONFIG_MMC_SDHCI_OF_ARASAN=y
+CONFIG_MMC_SDHCI_ST=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_M41T80=y
+CONFIG_RESET_CONTROLLER=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
-- 
1.7.1



[PATCH 0/2] add eMMC support for fsp2 board

2017-06-23 Thread Ivan Mikhaylov
fsp2 is based on the powerpc 476fpe and uses the Arasan eMMC controller,
so we added support for it in our dts and also enabled it via an additional
dependency for the sdhci-st driver in Kconfig.

this code depends on commit c4b56b023daa91953e9ebe91143e6ca156f0bcb7,
which is located in the powerpc linux tree's 'next' branch.

Ivan Mikhaylov (2):
  mmc/host: add FSP2(ppc476fpe) into depends for sdhci-st
  44x/fsp2: enable eMMC arasan for fsp2 platform

 arch/powerpc/boot/dts/fsp2.dts  |   19 +++
 arch/powerpc/configs/44x/fsp2_defconfig |2 ++
 drivers/mmc/host/Kconfig|2 +-
 3 files changed, 22 insertions(+), 1 deletions(-)



Re: [PATCH net-next] udp: fix poll()

2017-06-23 Thread David Miller
From: Paolo Abeni 
Date: Fri, 23 Jun 2017 14:19:51 +0200

> Michael reported a UDP breakage caused by the commit b65ac44674dd
> ("udp: try to avoid 2 cache miss on dequeue").
> The function __first_packet_length() can update the checksum bits
> of the pending skb, making the scratch area out-of-sync, and
> setting skb->csum, if the skb was previously in need of checksum
> validation.
> 
> On later recvmsg() for such skb, checksum validation will be
> invoked again - due to the wrong udp_skb_csum_unnecessary()
> value - and will fail, causing the valid skb to be dropped.
> 
> This change addresses the issue by refreshing the scratch area in
> __first_packet_length() after the possible checksum update.
> 
> Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
> Reported-by: Michael Ellerman 
> Signed-off-by: Hannes Frederic Sowa 
> Signed-off-by: Paolo Abeni 

Thanks for fixing this so quickly, applied.
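
(For context, the fix keeps the per-skb scratch cache coherent with the
checksum state. Schematically, inside __first_packet_length(), where
udp_set_dev_scratch() is the helper the scratch cache already uses and the
surrounding logic is condensed:

#include <net/sock.h>
#include <net/udp.h>

/*
 * If peeking at the queue head performed checksum validation, the
 * skb's csum bits changed; refresh the cached scratch copy so a later
 * udp_skb_csum_unnecessary() check sees a consistent view.
 */
static int first_packet_length_sketch(struct sock *sk)
{
	struct sk_buff *skb = skb_peek(&sk->sk_receive_queue);

	if (skb && !udp_lib_checksum_complete(skb))
		udp_set_dev_scratch(skb);	/* re-cache updated csum state */

	return skb ? skb->len : -1;
}
)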


Re: [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table

2017-06-23 Thread Alex Williamson
On Fri, 23 Jun 2017 15:06:37 +1000
Alexey Kardashevskiy  wrote:

> On 23/06/17 07:11, Alex Williamson wrote:
> > On Thu, 15 Jun 2017 15:48:42 +1000
> > Alexey Kardashevskiy  wrote:
> >   
> >> Here is a patchset which Yongji was working on before
> >> leaving IBM LTC. Since we still want to have this functionality
> >> in the kernel (DPDK is the first user), here is a rebase
> >> on the current upstream.
> >>
> >>
> >> Current vfio-pci implementation disallows to mmap the page
> >> containing MSI-X table in case that users can write directly
> >> to MSI-X table and generate an incorrect MSIs.
> >>
> >> However, this will cause some performance issue when there
> >> are some critical device registers in the same page as the
> >> MSI-X table. We have to handle the mmio access to these
> >> registers in QEMU emulation rather than in guest.
> >>
> >> To solve this issue, this series allows to expose MSI-X table
> >> to userspace when hardware enables the capability of interrupt
> >> remapping which can ensure that a given PCI device can only
> >> shoot the MSIs assigned for it. And we introduce a new bus_flags
> >> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
> >> for different archs.
> >>
> >> The patch 3 are based on the proposed patchset[1].
> >>
> >> Changelog
> >> v3:
> >> - rebased on the current upstream  
> > 
> > There's something not forthcoming here, the last version I see from
> > Yongji is this one:
> > 
> > https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017245.html
> > 
> > Which was a 6-patch series where patches 2-4 tried to apply
> > PCI_BUS_FLAGS_MSI_REMAP for cases that supported other platforms.  That
> > doesn't exist here, so it's not simply a rebase.  Patch 1/ seems to
> > equate this new flag to the IOMMU capability IOMMU_CAP_INTR_REMAP, but
> > nothing is done here to match them together.  That patch also mentions
> > the work Eric has done for similar features on ARM, but again those
> > patches are dropped.  It seems like an incomplete feature now.  Thanks,  
> 
> 
> Thanks! I suspected this is not the latest but could not find anything
> better than what we use internally for tests, and I could not reach Yongji
> to confirm whether this was the latest update.
> 
> As I am reading the patches, I notice that the "msi remap" term is used all
> over the place. While this remapping capability may be the case for x86/arm
> (and therefore the IOMMU_CAP_INTR_REMAP flag makes sense), powernv does not
> do remapping but provides hardware isolation. When we are allowing MSIX BAR
> mapping to the userspace - the isolation is what we really care about. Will
> it make sense to rename PCI_BUS_FLAGS_MSI_REMAP to
> PCI_BUS_FLAGS_MSI_ISOLATED ?

I don't have a strong opinion either way, so long as it's fully
described what the flag indicates.

> Another thing - the patchset enables PCI_BUS_FLAGS_MSI_REMAP when IOMMU
> just advertises IOMMU_CAP_INTR_REMAP, not necessarily uses it, should the
> patchset actually look at something like irq_remapping_enabled in
> drivers/iommu/amd_iommu.c instead?

Interrupt remapping being enabled is implicit in IOMMU_CAP_INTR_REMAP,
neither intel or amd iommu export the capability unless enabled.
Nobody cares if it's supported but not enabled.  Thanks,

Alex

> >> v2:
> >> - Make the commit log more clear
> >> - Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
> >>   so that we could clearly know what the function does
> >> - Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
> >>   of iommu_bus_notifier()
> >> - Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
> >>   table so that we can know whether we allow to mmap MSI-X table
> >>   in QEMU
> >>
> >> [1] 
> >> https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html
> >>
> >>
> >> This is based on sha1
> >> 63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of 
> >> git://github.com/jcmvbkbc/linux-xtensa".
> >>
> >> Please comment. Thanks.
> >>
> >>
> >>
> >> Yongji Xie (3):
> >>   PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
> >>   pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
> >>   vfio-pci: Allow to expose MSI-X table to userspace if interrupt
> >> remapping is enabled
> >>
> >>  include/linux/pci.h   |  1 +
> >>  arch/powerpc/platforms/powernv/pci-ioda.c |  8 
> >>  drivers/vfio/pci/vfio_pci.c   | 18 +++---
> >>  drivers/vfio/pci/vfio_pci_rdwr.c  |  3 ++-
> >>  4 files changed, 26 insertions(+), 4 deletions(-)
> >>  
> >   
> 
> 



Re: [kernel-hardening] [PATCH 2/4] arm64: Reduce ELF_ET_DYN_BASE

2017-06-23 Thread Ard Biesheuvel
On 23 June 2017 at 14:02, Kees Cook  wrote:
> On Fri, Jun 23, 2017 at 6:52 AM, Kees Cook  wrote:
>> On Thu, Jun 22, 2017 at 11:57 PM, Ard Biesheuvel
>>  wrote:
>>> Hi Kees,
>>>
>>> On 22 June 2017 at 18:06, Kees Cook  wrote:
 Now that explicitly executed loaders are loaded in the mmap region,
 position PIE binaries lower in the address space to avoid possible
 collisions with mmap or stack regions. For 64-bit, align to 4GB to
 allow runtimes to use the entire 32-bit address space for 32-bit
 pointers.

 Signed-off-by: Kees Cook 
 ---
  arch/arm64/include/asm/elf.h | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

 diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
 index 5d1700425efe..f742af8f7c42 100644
 --- a/arch/arm64/include/asm/elf.h
 +++ b/arch/arm64/include/asm/elf.h
 @@ -113,12 +113,13 @@
  #define ELF_EXEC_PAGESIZE  PAGE_SIZE

  /*
 - * This is the location that an ET_DYN program is loaded if exec'ed.  
 Typical
 - * use of this is to invoke "./ld.so someprog" to test out a new version 
 of
 - * the loader.  We need to make sure that it is out of the way of the 
 program
 - * that it will "exec", and that there is sufficient room for the brk.
 + * This is the base location for PIE (ET_DYN with INTERP) loads. On
 + * 64-bit, this is raised to 4GB to leave the entire 32-bit address
 + * space open for things that want to use the area for 32-bit pointers.
   */
 -#define ELF_ET_DYN_BASE(2 * TASK_SIZE_64 / 3)
 +#define ELF_ET_DYN_BASE(test_thread_flag(TIF_32BIT) ?  \
 +   0x000400000UL : \
 +   0x100000000UL)

>>>
>>> Why are you merging this with the COMPAT definition?
>>
>> It seemed like the right thing to do since a single definition could
>> handle both cases. Is there something I'm overlooking in the arm64
>> case?
>
> And like 5 minutes later there was a loud head-slapping noise in my
> office. Durr, yeah, arm64 doesn't have to deal with a "native 32-bit"
> mode, so the merge isn't needed. Yes yes, I will split it back up and
> drop the thread flag test.
>

Oh, is that what I heard?


Re: [kernel-hardening] [PATCH 2/4] arm64: Reduce ELF_ET_DYN_BASE

2017-06-23 Thread Kees Cook
On Fri, Jun 23, 2017 at 6:52 AM, Kees Cook  wrote:
> On Thu, Jun 22, 2017 at 11:57 PM, Ard Biesheuvel
>  wrote:
>> Hi Kees,
>>
>> On 22 June 2017 at 18:06, Kees Cook  wrote:
>>> Now that explicitly executed loaders are loaded in the mmap region,
>>> position PIE binaries lower in the address space to avoid possible
>>> collisions with mmap or stack regions. For 64-bit, align to 4GB to
>>> allow runtimes to use the entire 32-bit address space for 32-bit
>>> pointers.
>>>
>>> Signed-off-by: Kees Cook 
>>> ---
>>>  arch/arm64/include/asm/elf.h | 13 ++---
>>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
>>> index 5d1700425efe..f742af8f7c42 100644
>>> --- a/arch/arm64/include/asm/elf.h
>>> +++ b/arch/arm64/include/asm/elf.h
>>> @@ -113,12 +113,13 @@
>>>  #define ELF_EXEC_PAGESIZE  PAGE_SIZE
>>>
>>>  /*
>>> - * This is the location that an ET_DYN program is loaded if exec'ed.  
>>> Typical
>>> - * use of this is to invoke "./ld.so someprog" to test out a new version of
>>> - * the loader.  We need to make sure that it is out of the way of the 
>>> program
>>> - * that it will "exec", and that there is sufficient room for the brk.
>>> + * This is the base location for PIE (ET_DYN with INTERP) loads. On
>>> + * 64-bit, this is raised to 4GB to leave the entire 32-bit address
>>> + * space open for things that want to use the area for 32-bit pointers.
>>>   */
>>> -#define ELF_ET_DYN_BASE		(2 * TASK_SIZE_64 / 3)
>>> +#define ELF_ET_DYN_BASE		(test_thread_flag(TIF_32BIT) ?	\
>>> +					0x000400000UL :			\
>>> +					0x100000000UL)
>>>
>>
>> Why are you merging this with the COMPAT definition?
>
> It seemed like the right thing to do since a single definition could
> handle both cases. Is there something I'm overlooking in the arm64
> case?

And like 5 minutes later there was a loud head-slapping noise in my
office. Durr, yeah, arm64 doesn't have to deal with a "native 32-bit"
mode, so the merge isn't needed. Yes yes, I will split it back up and
drop the thread flag test.

Thanks!

-Kees

>>>  #ifndef __ASSEMBLY__
>>>
>>> @@ -173,8 +174,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
>>>
>>>  #ifdef CONFIG_COMPAT
>>>
>>> -#define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3)
>>> -
>>>  /* AArch32 registers. */
>>>  #define COMPAT_ELF_NGREG   18
>>>  typedef unsigned int   compat_elf_greg_t;
>>> --
>>> 2.7.4
>>>
>
>
>
> --
> Kees Cook
> Pixel Security



-- 
Kees Cook
Pixel Security
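
For reference, the split Kees describes would presumably look something
like the following sketch (values taken from the 4GB / 4MB layout in the
commit message; illustrative only, not the posted v2):

/* Sketch: separate native and compat definitions, no thread flag test.
 * 0x100000000 is 4GB, 0x000400000 is 4MB. */
#define ELF_ET_DYN_BASE		0x100000000UL

#ifdef CONFIG_COMPAT
#define COMPAT_ELF_ET_DYN_BASE	0x000400000UL
#endif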


Re: [kernel-hardening] [PATCH 2/4] arm64: Reduce ELF_ET_DYN_BASE

2017-06-23 Thread Kees Cook
On Thu, Jun 22, 2017 at 11:57 PM, Ard Biesheuvel
 wrote:
> Hi Kees,
>
> On 22 June 2017 at 18:06, Kees Cook  wrote:
>> Now that explicitly executed loaders are loaded in the mmap region,
>> position PIE binaries lower in the address space to avoid possible
>> collisions with mmap or stack regions. For 64-bit, align to 4GB to
>> allow runtimes to use the entire 32-bit address space for 32-bit
>> pointers.
>>
>> Signed-off-by: Kees Cook 
>> ---
>>  arch/arm64/include/asm/elf.h | 13 ++---
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
>> index 5d1700425efe..f742af8f7c42 100644
>> --- a/arch/arm64/include/asm/elf.h
>> +++ b/arch/arm64/include/asm/elf.h
>> @@ -113,12 +113,13 @@
>>  #define ELF_EXEC_PAGESIZE  PAGE_SIZE
>>
>>  /*
>> - * This is the location that an ET_DYN program is loaded if exec'ed.  Typical
>> - * use of this is to invoke "./ld.so someprog" to test out a new version of
>> - * the loader.  We need to make sure that it is out of the way of the program
>> - * that it will "exec", and that there is sufficient room for the brk.
>> + * This is the base location for PIE (ET_DYN with INTERP) loads. On
>> + * 64-bit, this is raised to 4GB to leave the entire 32-bit address
>> + * space open for things that want to use the area for 32-bit pointers.
>>   */
>> -#define ELF_ET_DYN_BASE		(2 * TASK_SIZE_64 / 3)
>> +#define ELF_ET_DYN_BASE		(test_thread_flag(TIF_32BIT) ?	\
>> +					0x000400000UL :			\
>> +					0x100000000UL)
>>
>
> Why are you merging this with the COMPAT definition?

It seemed like the right thing to do since a single definition could
handle both cases. Is there something I'm overlooking in the arm64
case?

-Kees

>
>>  #ifndef __ASSEMBLY__
>>
>> @@ -173,8 +174,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
>>
>>  #ifdef CONFIG_COMPAT
>>
>> -#define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3)
>> -
>>  /* AArch32 registers. */
>>  #define COMPAT_ELF_NGREG   18
>>  typedef unsigned int   compat_elf_greg_t;
>> --
>> 2.7.4
>>



-- 
Kees Cook
Pixel Security


[PATCH net-next] udp: fix poll()

2017-06-23 Thread Paolo Abeni
Michael reported a UDP breakage caused by commit b65ac44674dd
("udp: try to avoid 2 cache miss on dequeue").
The function __first_packet_length() can update the checksum bits
of the pending skb and set skb->csum if the skb previously needed
checksum validation, leaving the scratch area out-of-sync.

On a later recvmsg() for such an skb, checksum validation will be
invoked again - due to the stale udp_skb_csum_unnecessary()
value - and will fail, causing the valid skb to be dropped.

This change addresses the issue by refreshing the scratch area in
__first_packet_length() after the possible checksum update.

Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
Reported-by: Michael Ellerman 
Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: Paolo Abeni 
---
 net/ipv4/udp.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 067a607..47c7aa0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1446,16 +1446,23 @@ static struct sk_buff *__first_packet_length(struct sock *sk,
 {
struct sk_buff *skb;
 
-   while ((skb = skb_peek(rcvq)) != NULL &&
-  udp_lib_checksum_complete(skb)) {
-   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
-   IS_UDPLITE(sk));
-   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
-   IS_UDPLITE(sk));
-   atomic_inc(&sk->sk_drops);
-   __skb_unlink(skb, rcvq);
-   *total += skb->truesize;
-   kfree_skb(skb);
+   while ((skb = skb_peek(rcvq)) != NULL) {
+   if (udp_lib_checksum_complete(skb)) {
+   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
+   IS_UDPLITE(sk));
+   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
+   IS_UDPLITE(sk));
+   atomic_inc(&sk->sk_drops);
+   __skb_unlink(skb, rcvq);
+   *total += skb->truesize;
+   kfree_skb(skb);
+   } else {
+   /* the csum related bits could be changed, refresh
+* the scratch area
+*/
+   udp_set_dev_scratch(skb);
+   break;
+   }
}
return skb;
 }
-- 
2.9.4
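
To see why the refresh matters, here is a small user-space model of the
failure mode (names are made up for the sketch; the real kernel caches
this state in skb->dev_scratch):

#include <stdbool.h>
#include <stdio.h>

/* Peeking validates the checksum on the live skb, but dequeue trusts
 * the cached copy, so the cache must be refreshed after the checksum
 * state changes -- which is exactly what the patch above does. */
struct skb_model {
	bool csum_unnecessary;		/* live state on the skb */
	bool scratch_csum_unnecessary;	/* cached copy read on dequeue */
};

static void refresh_scratch(struct skb_model *skb)
{
	skb->scratch_csum_unnecessary = skb->csum_unnecessary;
}

int main(void)
{
	struct skb_model skb = { false, false };

	skb.csum_unnecessary = true;	/* checksum validated during peek */
	refresh_scratch(&skb);		/* the fix: resync the cached copy */
	printf("dequeue sees csum_unnecessary=%d\n",
	       skb.scratch_csum_unnecessary);
	return 0;
}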



Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)

2017-06-23 Thread Paolo Abeni
On Fri, 2017-06-23 at 16:59 +1000, Michael Ellerman wrote:
> Hannes Frederic Sowa  writes:
> 
> > On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
> > > 
> > > Can you please check if the following patch fixes the issue? Only
> > > compiled tested here.
> > > 
> > > Thanks!!!
> > > ---
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 067a607..80d89fe 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -1446,16 +1446,19 @@ static struct sk_buff *__first_packet_length(struct sock *sk,
> > >  {
> > >   struct sk_buff *skb;
> > >  
> > > -   while ((skb = skb_peek(rcvq)) != NULL &&
> > > -      udp_lib_checksum_complete(skb)) {
> > > -   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> > > -   IS_UDPLITE(sk));
> > > -   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> > > -   IS_UDPLITE(sk));
> > > -   atomic_inc(&sk->sk_drops);
> > > -   __skb_unlink(skb, rcvq);
> > > -   *total += skb->truesize;
> > > -   kfree_skb(skb);
> > > +   while ((skb = skb_peek(rcvq)) != NULL) {
> > > +   if (udp_lib_checksum_complete(skb)) {
> > > +   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> > > +   IS_UDPLITE(sk));
> > > +   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> > > +   IS_UDPLITE(sk));
> > > +   atomic_inc(&sk->sk_drops);
> > > +   __skb_unlink(skb, rcvq);
> > > +   *total += skb->truesize;
> > > +   kfree_skb(skb);
> > > +   } else {
> > > +   udp_set_dev_scratch(skb);
> > 
> > It needs a "break;" here.
> > 
> > > +   }
> > >   }
> > >   return skb;
> > >  }
> 
> That works!
> 
> $ wget google.com
> --2017-06-23 16:56:31--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> Proxy request sent, awaiting response... 302 Found
> Location: http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ 
> [following]
> --2017-06-23 16:56:31--  
> http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ
> Reusing existing connection to proxy.pmdw.com:3128.
> Proxy request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: ‘index.html’
> 
> 
> The patch had whitespace issues or something and I had to apply it by
> hand, here's what I actually tested.

Thank you!

I'll submit formally the patch after some more testing.

I noticed this version has entered the ppc patchwork, but I think that
the formal submission should go towards the net-next tree.

Cheers,

Paolo


Re: [PATCH] powerpc: Invalidate ERAT on powersave wakeup for POWER9

2017-06-23 Thread Benjamin Herrenschmidt
On Fri, 2017-06-23 at 19:33 +1000, Michael Ellerman wrote:
> Michael Neuling  writes:
> 
> > On POWER9 the ERAT may be incorrect on wakeup from some stop states
> > that lose state. This causes random segvs and illegal instructions
> > when these stop states are enabled.
> 
> Incorrect how?

As in stale. Not sure about the details.

> Because with the ERAT flush where you've put it, there's still a good
> amount of code executed prior to the flush isn't there?

In real mode, should be ok.

> ie. we come in at 0x100, do some of the prolog, do IDLE_TEST which takes
> us to pnv_powersave_wakeup, which then restores state from the paca
> (memory), that returns and then we check KVM ... and then finally we end
> up at pnv_wakeup_loss.
> 
> Or is there some other path? Or is the ERAT incorrect in some specific
> way which means we only need to flush there?

I think real mode translations are ok but I'll ask around.

Cheers,
Ben.

> cheers
> 
> > diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> > index 1ea14b96f1..ace2ad50c8 100644
> > --- a/arch/powerpc/kernel/idle_book3s.S
> > +++ b/arch/powerpc/kernel/idle_book3s.S
> > @@ -793,6 +793,9 @@ fastsleep_workaround_at_exit:
> >   */
> >  .global pnv_wakeup_loss
> >  pnv_wakeup_loss:
> > +BEGIN_FTR_SECTION
> > +   PPC_INVALIDATE_ERAT
> > +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> > ld  r1,PACAR1(r13)
> >  BEGIN_FTR_SECTION
> > CHECK_HMI_INTERRUPT
> > -- 
> > 2.11.0


Re: [PATCH] powerpc: Invalidate ERAT on powersave wakeup for POWER9

2017-06-23 Thread Nicholas Piggin
On Fri, 23 Jun 2017 19:33:23 +1000
Michael Ellerman  wrote:

> Michael Neuling  writes:
> 
> > On POWER9 the ERAT may be incorrect on wakeup from some stop states
> > that lose state. This causes random segvs and illegal instructions
> > when these stop states are enabled.  
> 
> Incorrect how?

It can have stale ERAT entries from another idle thread.

> 
> Because with the ERAT flush where you've put it, there's still a good
> amount of code executed prior to the flush isn't there?
> 
> ie. we come in at 0x100, do some of the prolog, do IDLE_TEST which takes
> us to pnv_powersave_wakeup, which then restores state from the paca
> (memory), that returns and then we check KVM ... and then finally we end
> up at pnv_wakeup_loss.

In the case of an HMI, we could call into OPAL as well.

> Or is there some other path? Or is the ERAT incorrect in some specific
> way which means we only need to flush there?

I think we're in real mode until returning from pnv_wakeup_loss so those
ERATs should be the same.

Except KVM, which can go to guest and switch on the MMU. My bad, I
suggested putting it into pnv_wakeup_loss.

Flushing at the start of pnv_powersave_wakeup should be safest. I guess
we can avoid it for non-state-loss wakeups if cr3 is lt.

Thanks,
Nick


Re: [PATCH] powerpc: Invalidate ERAT on powersave wakeup for POWER9

2017-06-23 Thread Anshuman Khandual
On 06/22/2017 10:56 PM, Michael Neuling wrote:
> On POWER9 the ERAT may be incorrect on wakeup from some stop states
> that lose state. This causes random segvs and illegal instructions
> when these stop states are enabled.
> 
> This patch invalidates the ERAT on wakeup on POWER9 to prevent this
> from causing a problem.

Can't there be a real, genuine ERAT error on the wakeup path
from these states? Just being curious.



Re: [PATCH] powerpc: Invalidate ERAT on powersave wakeup for POWER9

2017-06-23 Thread Michael Ellerman
Michael Neuling  writes:

> On POWER9 the ERAT may be incorrect on wakeup from some stop states
> that lose state. This causes random segvs and illegal instructions
> when these stop states are enabled.

Incorrect how?

Because with the ERAT flush where you've put it, there's still a good
amount of code executed prior to the flush isn't there?

ie. we come in at 0x100, do some of the prolog, do IDLE_TEST which takes
us to pnv_powersave_wakeup, which then restores state from the paca
(memory), that returns and then we check KVM ... and then finally we end
up at pnv_wakeup_loss.

Or is there some other path? Or is the ERAT incorrect in some specific
way which means we only need to flush there?

cheers

> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 1ea14b96f1..ace2ad50c8 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -793,6 +793,9 @@ fastsleep_workaround_at_exit:
>   */
>  .global pnv_wakeup_loss
>  pnv_wakeup_loss:
> +BEGIN_FTR_SECTION
> + PPC_INVALIDATE_ERAT
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>   ld  r1,PACAR1(r13)
>  BEGIN_FTR_SECTION
>   CHECK_HMI_INTERRUPT
> -- 
> 2.11.0


Re: powerpc/time: Fix tracing in time.c

2017-06-23 Thread Michael Ellerman
On Tue, 2017-06-20 at 07:44:47 UTC, Santosh Sivaraj wrote:
> Since trace_clock is in a different file and already marked with notrace,
> enable tracing in time.c by removing it from the disabled list in Makefile.
> Also annotate clocksource read functions and sched_clock with notrace.
> 
> Testing: Timer and ftrace selftests run with different trace clocks.
> 
> Acked-by: Naveen N. Rao 
> Signed-off-by: Santosh Sivaraj 

Applied to powerpc next-test, thanks.

https://git.kernel.org/powerpc/c/6b847d795cf4ab3e574f4fcf7193fe

cheers


Re: powerpc: Convert VDSO update function to use new update_vsyscall interface

2017-06-23 Thread Michael Ellerman
On Sat, 2017-05-27 at 08:04:52 UTC, Paul Mackerras wrote:
> This converts the powerpc VDSO time update function to use the new
> interface introduced in commit 576094b7f0aa ("time: Introduce new
> GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
> us the time as of the last update in seconds and whole nanoseconds,
> with the new interface we get the nanoseconds part effectively in
> a binary fixed-point format with tk->tkr_mono.shift bits to the
> right of the binary point.
> 
> With the old interface, the fractional nanoseconds got truncated,
> meaning that the value returned by the VDSO clock_gettime function
> would have about 1ns of jitter in it compared to the value computed
> by the generic timekeeping code in the kernel.
> 
> The powerpc VDSO time functions (clock_gettime and gettimeofday)
> already work in units of 2^-32 seconds, or 0.23283 ns, because that
> makes it simple to split the result into seconds and fractional
> seconds, and represent the fractional seconds in either microseconds
> or nanoseconds.  This is good enough accuracy for now, so this patch
> avoids changing how the VDSO works or the interface in the VDSO data
> page.
> 
> This patch converts the powerpc update_vsyscall_old to be called
> update_vsyscall and use the new interface.  We convert the fractional
> second to units of 2^-32 seconds without truncating to whole nanoseconds.
> (There is still a conversion to whole nanoseconds for any legacy users
> of the vdso_data/systemcfg stamp_xtime field.)
> 
> In addition, this improves the accuracy of the computation of tb_to_xs
> for those systems with high-frequency timebase clocks (>= 268.5 MHz)
> by doing the right shift in two parts, one before the multiplication and
> one after, rather than doing the right shift before the multiplication.
> (We can't do all of the right shift after the multiplication unless we
> use 128-bit arithmetic.)
> 
> Signed-off-by: Paul Mackerras 
> Acked-by: John Stultz 

Applied to powerpc next-test, thanks.

https://git.kernel.org/powerpc/c/d4cfb11387ee29ba4626546c676fd2

cheers
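
As an aside, the "right shift in two parts" trick mentioned in the
commit message can be shown in isolation. The sketch below uses made-up
names and a user-space harness (not the posted patch); splitting the
shift keeps the multiply within 64 bits at the cost of the low 'pre'
bits of precision:

#include <stdint.h>
#include <stdio.h>

/* Approximate (value * mult) >> shift without 128-bit arithmetic by
 * applying part of the right shift before the multiply and the rest
 * after it. */
static uint64_t scale_split_shift(uint64_t value, uint32_t mult,
				  uint32_t shift, uint32_t pre)
{
	return ((value >> pre) * mult) >> (shift - pre);
}

int main(void)
{
	uint64_t v = 0x0123456789abcdefULL;

	/* a one-part shift would overflow: v * 1000 exceeds 64 bits */
	printf("%llu\n",
	       (unsigned long long)scale_split_shift(v, 1000, 20, 8));
	return 0;
}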


[PATCH v3 6/6] powerpc/mm: Enable ZONE_DEVICE on powerpc

2017-06-23 Thread Oliver O'Halloran
Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
but recommended.

Signed-off-by: Oliver O'Halloran 
---
v3: Only select when building for 64bit Book3-S
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bf4391d18923..978f371bb103 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -138,6 +138,7 @@ config PPC
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select ARCH_HAS_ZONE_DEVICE if PPC_BOOK3S_64
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
-- 
2.9.4



[PATCH v3 5/6] powerpc/mm: Wire up hpte_removebolted for powernv

2017-06-23 Thread Oliver O'Halloran
From: Anton Blanchard 

Adds support for removing bolted (i.e. kernel linear mapping) mappings on
powernv. This is needed to support memory hot-unplug operations, which
are required for the teardown of DAX/PMEM devices.

Reviewed-by: Balbir Singh 
Reviewed-by: Rashmica Gupta 
Signed-off-by: Anton Blanchard 
Signed-off-by: Oliver O'Halloran 
---
v1 -> v2: Fixed the commit author
  Added VM_WARN_ON() if we attempt to remove an unbolted hpte
---
 arch/powerpc/mm/hash_native_64.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 65bb8f33b399..b534d041cfe8 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -407,6 +407,38 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
tlbie(vpn, psize, psize, ssize, 0);
 }
 
+/*
+ * Remove a bolted kernel entry. Memory hotplug uses this.
+ *
+ * No need to lock here because we should be the only user.
+ */
+static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
+{
+   unsigned long vpn;
+   unsigned long vsid;
+   long slot;
+   struct hash_pte *hptep;
+
+   vsid = get_kernel_vsid(ea, ssize);
+   vpn = hpt_vpn(ea, vsid, ssize);
+
+   slot = native_hpte_find(vpn, psize, ssize);
+   if (slot == -1)
+   return -ENOENT;
+
+   hptep = htab_address + slot;
+
+   VM_WARN_ON(!(be64_to_cpu(hptep->v) & HPTE_V_BOLTED));
+
+   /* Invalidate the hpte */
+   hptep->v = 0;
+
+   /* Invalidate the TLB */
+   tlbie(vpn, psize, psize, ssize, 0);
+   return 0;
+}
+
+
 static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
   int bpsize, int apsize, int ssize, int local)
 {
@@ -725,6 +757,7 @@ void __init hpte_init_native(void)
mmu_hash_ops.hpte_invalidate= native_hpte_invalidate;
mmu_hash_ops.hpte_updatepp  = native_hpte_updatepp;
mmu_hash_ops.hpte_updateboltedpp = native_hpte_updateboltedpp;
+   mmu_hash_ops.hpte_removebolted = native_hpte_removebolted;
mmu_hash_ops.hpte_insert= native_hpte_insert;
mmu_hash_ops.hpte_remove= native_hpte_remove;
mmu_hash_ops.hpte_clear_all = native_hpte_clear;
-- 
2.9.4



[PATCH v3 4/6] powerpc/mm: Add devmap support for ppc64

2017-06-23 Thread Oliver O'Halloran
Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S. This
is used to differentiate device-backed memory from transparent huge
pages, since they are handled in more or less the same manner by the
core mm code.

Cc: Aneesh Kumar K.V 
Signed-off-by: Oliver O'Halloran 
---
v1 -> v2: Properly differentiate THP and PMD Devmap entries. The
mm core assumes that pmd_trans_huge() and pmd_devmap() are mutually
exclusive and v1 had pmd_trans_huge() being true on a devmap pmd.

v2 -> v3:
Remove setting of _PAGE_SPECIAL in pmd_mkdevmap()
Make pud_pfn() a BUILD_BUG()
Remove unnecessary _PAGE_DEVMAP check in hash__pmd_trans_huge()
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 46 +++-
 arch/powerpc/include/asm/book3s/64/radix.h   |  2 +-
 arch/powerpc/include/asm/string.h|  1 +
 arch/powerpc/mm/hugetlbpage.c|  2 +-
 arch/powerpc/mm/pgtable-book3s64.c   |  4 +--
 arch/powerpc/mm/pgtable-hash64.c |  4 ++-
 arch/powerpc/mm/pgtable-radix.c  |  3 +-
 arch/powerpc/mm/pgtable_64.c |  2 +-
 8 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 85bc9875c3be..54b51e0fbe85 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
+#include 
 #endif
 
 /*
@@ -79,6 +80,9 @@
 
 #define _PAGE_SOFT_DIRTY   _RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
+#define _PAGE_DEVMAP   _RPAGE_SW1
+#define __HAVE_ARCH_PTE_DEVMAP
+
 /*
  * Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE
  * Instead of fixing all of them, add an alternate define which
@@ -599,6 +603,16 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
 }
 
+static inline pte_t pte_mkdevmap(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_SPECIAL|_PAGE_DEVMAP);
+}
+
+static inline int pte_devmap(pte_t pte)
+{
+   return !!(pte_raw(pte) & cpu_to_be64(_PAGE_DEVMAP));
+}
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
/* FIXME!! check whether this need to be a conditional */
@@ -1137,7 +1151,6 @@ static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
return true;
 }
 
-
 #define arch_needs_pgtable_deposit arch_needs_pgtable_deposit
 static inline bool arch_needs_pgtable_deposit(void)
 {
@@ -1146,6 +1159,37 @@ static inline bool arch_needs_pgtable_deposit(void)
return true;
 }
 
+static inline pmd_t pmd_mkdevmap(pmd_t pmd)
+{
+   return __pmd(pmd_val(pmd) | (_PAGE_PTE | _PAGE_DEVMAP));
+}
+
+static inline int pmd_devmap(pmd_t pmd)
+{
+   return pte_devmap(pmd_pte(pmd));
+}
+
+static inline int pud_devmap(pud_t pud)
+{
+   return 0;
+}
+
+static inline const int pud_pfn(pud_t pud)
+{
+   /*
+* Calls to pud_pfn() are gated around a pud_devmap() check
+* so this should never be used. If it grows another user we
+* want to know about it.
+*/
+   BUILD_BUG();
+   return 0;
+}
+
+static inline int pgd_devmap(pgd_t pgd)
+{
+   return 0;
+}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index ac16d1943022..ba43754e96d2 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -252,7 +252,7 @@ static inline int radix__pgd_bad(pgd_t pgd)
 
 static inline int radix__pmd_trans_huge(pmd_t pmd)
 {
-   return !!(pmd_val(pmd) & _PAGE_PTE);
+   return (pmd_val(pmd) & (_PAGE_PTE | _PAGE_DEVMAP)) == _PAGE_PTE;
 }
 
 static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index da3cdffca440..ef7c73cf7288 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -10,6 +10,7 @@
 #define __HAVE_ARCH_MEMMOVE
 #define __HAVE_ARCH_MEMCMP
 #define __HAVE_ARCH_MEMCHR
+#define __HAVE_ARCH_MEMCPY_FLUSHCACHE
 
 extern char * strcpy(char *,const char *);
 extern char * strncpy(char *,const char *, __kernel_size_t);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index a4f33de4008e..d9958af5c98e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -963,7 +963,7 @@ pte_t *__find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
if (pmd_none(pmd))
return NULL;
 
-   if (pmd_trans_huge(pmd)) {
+   if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
if (is_thp)

[PATCH v3 3/6] powerpc/vmemmap: Add altmap support

2017-06-23 Thread Oliver O'Halloran
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
altmap is a driver-provided region that is used to provide the backing
storage for the struct pages of ZONE_DEVICE memory. In situations where
a large amount of ZONE_DEVICE memory is being added to the system, the
altmap reduces pressure on main system memory by allowing the mm/
metadata to be stored on the device itself rather than in main memory.

Reviewed-by: Balbir Singh 
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/mm/init_64.c | 15 +--
 arch/powerpc/mm/mem.c | 16 +---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 8851e4f5dbab..225fbb8034e6 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -171,13 +172,17 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
 
for (; start < end; start += page_size) {
+   struct vmem_altmap *altmap;
void *p;
int rc;
 
if (vmemmap_populated(start, page_size))
continue;
 
-   p = vmemmap_alloc_block(page_size, node);
+   /* altmap lookups only work at section boundaries */
+   altmap = to_vmem_altmap(SECTION_ALIGN_DOWN(start));
+
+   p = __vmemmap_alloc_block_buf(page_size, node, altmap);
if (!p)
return -ENOMEM;
 
@@ -242,6 +247,8 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 
for (; start < end; start += page_size) {
unsigned long nr_pages, addr;
+   struct vmem_altmap *altmap;
+   struct page *section_base;
struct page *page;
 
/*
@@ -257,9 +264,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
continue;
 
page = pfn_to_page(addr >> PAGE_SHIFT);
+   section_base = pfn_to_page(vmemmap_section_start(start));
nr_pages = 1 << page_order;
 
-   if (PageReserved(page)) {
+   altmap = to_vmem_altmap((unsigned long) section_base);
+   if (altmap) {
+   vmem_altmap_free(altmap, nr_pages);
+   } else if (PageReserved(page)) {
/* allocated from bootmem */
if (page_size < PAGE_SIZE) {
/*
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9ee536ec0739..2c0c16f11eee 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -159,11 +160,20 @@ int arch_remove_memory(u64 start, u64 size)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
-   struct zone *zone;
+   struct vmem_altmap *altmap;
+   struct page *page;
int ret;
 
-   zone = page_zone(pfn_to_page(start_pfn));
-   ret = __remove_pages(zone, start_pfn, nr_pages);
+   /*
+* If we have an altmap then we need to skip over any reserved PFNs
+* when querying the zone.
+*/
+   page = pfn_to_page(start_pfn);
+   altmap = to_vmem_altmap((unsigned long) page);
+   if (altmap)
+   page += vmem_altmap_offset(altmap);
+
+   ret = __remove_pages(page_zone(page), start_pfn, nr_pages);
if (ret)
return ret;
 
-- 
2.9.4



[PATCH v3 2/6] powerpc/vmemmap: Reshuffle vmemmap_free()

2017-06-23 Thread Oliver O'Halloran
Removes an indentation level and shuffles some code around to make the
following patch cleaner. No functional changes.

Signed-off-by: Oliver O'Halloran 
---
v1 -> v2: Remove broken initialiser
---
 arch/powerpc/mm/init_64.c | 48 ---
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index ec84b31c6c86..8851e4f5dbab 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -234,13 +234,15 @@ static unsigned long vmemmap_list_free(unsigned long start)
 void __ref vmemmap_free(unsigned long start, unsigned long end)
 {
unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
+   unsigned long page_order = get_order(page_size);
 
start = _ALIGN_DOWN(start, page_size);
 
pr_debug("vmemmap_free %lx...%lx\n", start, end);
 
for (; start < end; start += page_size) {
-   unsigned long addr;
+   unsigned long nr_pages, addr;
+   struct page *page;
 
/*
 * the section has already be marked as invalid, so
@@ -251,29 +253,29 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
continue;
 
addr = vmemmap_list_free(start);
-   if (addr) {
-   struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
-
-   if (PageReserved(page)) {
-   /* allocated from bootmem */
-   if (page_size < PAGE_SIZE) {
-   /*
-* this shouldn't happen, but if it is
-* the case, leave the memory there
-*/
-   WARN_ON_ONCE(1);
-   } else {
-   unsigned int nr_pages =
-   1 << get_order(page_size);
-   while (nr_pages--)
-   free_reserved_page(page++);
-   }
-   } else
-   free_pages((unsigned long)(__va(addr)),
-   get_order(page_size));
-
-   vmemmap_remove_mapping(start, page_size);
+   if (!addr)
+   continue;
+
+   page = pfn_to_page(addr >> PAGE_SHIFT);
+   nr_pages = 1 << page_order;
+
+   if (PageReserved(page)) {
+   /* allocated from bootmem */
+   if (page_size < PAGE_SIZE) {
+   /*
+* this shouldn't happen, but if it is
+* the case, leave the memory there
+*/
+   WARN_ON_ONCE(1);
+   } else {
+   while (nr_pages--)
+   free_reserved_page(page++);
+   }
+   } else {
+   free_pages((unsigned long)(__va(addr)), page_order);
}
+
+   vmemmap_remove_mapping(start, page_size);
}
 }
 #endif
-- 
2.9.4



[PATCH v3 1/6] mm, x86: Add ARCH_HAS_ZONE_DEVICE to Kconfig

2017-06-23 Thread Oliver O'Halloran
Currently ZONE_DEVICE depends on X86_64 and this will get unwieldy as
new architectures (and platforms) get ZONE_DEVICE support. Move to an
arch-selected Kconfig option to save us the trouble.

Cc: linux...@kvack.org
Acked-by: Ingo Molnar 
Acked-by: Balbir Singh 
Signed-off-by: Oliver O'Halloran 
---
v2: Added missing hunk.
v3: No changes
---
 arch/x86/Kconfig | 1 +
 mm/Kconfig   | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0efb4c9497bc..325429a3f32f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -59,6 +59,7 @@ config X86
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select ARCH_HAS_ZONE_DEVICE if X86_64
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/mm/Kconfig b/mm/Kconfig
index beb7a455915d..790e52a8a486 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -683,12 +683,16 @@ config IDLE_PAGE_TRACKING
 
  See Documentation/vm/idle_page_tracking.txt for more details.
 
+# arch_add_memory() comprehends device memory
+config ARCH_HAS_ZONE_DEVICE
+   bool
+
 config ZONE_DEVICE
bool "Device memory (pmem, etc...) hotplug support"
depends on MEMORY_HOTPLUG
depends on MEMORY_HOTREMOVE
depends on SPARSEMEM_VMEMMAP
-   depends on X86_64 #arch_add_memory() comprehends device memory
+   depends on ARCH_HAS_ZONE_DEVICE
 
help
  Device memory hotplug support allows for establishing pmem,
-- 
2.9.4



Re: [PATCH 1/1] futex: remove duplicated code and fix UB

2017-06-23 Thread Thomas Gleixner
On Wed, 21 Jun 2017, Jiri Slaby wrote:
> diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
> index f32b42e8725d..5bb2fd4674e7 100644
> --- a/arch/arm64/include/asm/futex.h
> +++ b/arch/arm64/include/asm/futex.h
> @@ -48,20 +48,10 @@ do {  \
>  } while (0)
>  
>  static inline int
> -futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)

That unsigned int seems to be a change from the arm64 tree in next. It's
not upstream and it'll cause an (easy to resolve) conflict.

> +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
> +{
> + int op = (encoded_op >> 28) & 7;
> + int cmp = (encoded_op >> 24) & 15;
> + int oparg = (int)(encoded_op << 8) >> 20;
> + int cmparg = (int)(encoded_op << 20) >> 20;

So this is really bad. We have implicit and explicit type casting to
int. And while we are at it, can we please stop proliferating the
existing mess.

'op' and 'cmp' can definitely be unsigned int. There is no reason to cast
them to int.

oparg, cmparg and oldval are more interesting.

The logic here is "documented" in uapi/linux/futex.h

/* FUTEX_WAKE_OP will perform atomically
   int oldval = *(int *)UADDR2;
   *(int *)UADDR2 = oldval OP OPARG;
   if (oldval CMP CMPARG)
   wake UADDR2;  */

Now the FUTEX_OP macro, which is supposed to compose the encoded_op, does:

#define FUTEX_OP(op, oparg, cmp, cmparg) \
  (((op & 0xf) << 28) | ((cmp & 0xf) << 24) \
   | ((oparg & 0xfff) << 12) | (cmparg & 0xfff))

Of course this is all untyped, undocumented and completely ill
defined.

> + int oparg = (int)(encoded_op << 8) >> 20;
> + int cmparg = (int)(encoded_op << 20) >> 20;

So in fact we sign-extend the 12 bits of oparg and cmparg. Really
intuitive.

Yes, we probably can't change that anymore, but at least we should make it
very explicit and add a comment to that effect.

Thanks,

tglx
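
For readers following the bit arithmetic, the sign extension tglx
describes can be reproduced stand-alone (illustrative sketch, not the
posted patch):

#include <stdio.h>

/* op and cmp are plain 4-bit fields, while oparg and cmparg are 12-bit
 * fields that get sign-extended by shifting them up into the sign bit
 * and arithmetic-shifting back down. */
int main(void)
{
	unsigned int encoded_op = (5u << 28) | (3u << 24) |
				  (0xfffu << 12) | 0x7ffu;

	unsigned int op  = (encoded_op >> 28) & 0xf;
	unsigned int cmp = (encoded_op >> 24) & 0xf;
	int oparg  = (int)(encoded_op << 8) >> 20;	/* 0xfff -> -1 */
	int cmparg = (int)(encoded_op << 20) >> 20;	/* 0x7ff -> 2047 */

	printf("op=%u cmp=%u oparg=%d cmparg=%d\n", op, cmp, oparg, cmparg);
	return 0;
}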


Re: [PATCH 3/4] powerpc: Reduce ELF_ET_DYN_BASE

2017-06-23 Thread Michael Ellerman
Kees Cook  writes:

> Now that explicitly executed loaders are loaded in the mmap region,
> position PIE binaries lower in the address space to avoid possible
> collisions with mmap or stack regions. For 64-bit, align to 4GB to
> allow runtimes to use the entire 32-bit address space for 32-bit
> pointers.

The change log and subject are a bit out of whack with the actual patch
because previously we used 512MB.

How about?

  powerpc: Move ELF_ET_DYN_BASE to 4GB / 4MB
  
  Now that explicitly executed loaders are loaded in the mmap region,
  we have more freedom to decide where we position PIE binaries in the
  address space to avoid possible collisions with mmap or stack regions.
  
  For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
  address space for 32-bit pointers. On 32-bit use 4MB.


Is there any particular reasoning behind the 4MB value on 32-bit?


I gave this a quick spin and it booted OK on all my test boxes, which
covers 64-bit/32-bit kernel and userspace. So seems to work!

Acked-by: Michael Ellerman 

cheers


Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue)

2017-06-23 Thread Michael Ellerman
Hannes Frederic Sowa  writes:

> On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
>> 
>> Can you please check if the following patch fixes the issue? Only
>> compiled tested here.
>> 
>> Thanks!!!
>> ---
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 067a607..80d89fe 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1446,16 +1446,19 @@ static struct sk_buff *__first_packet_length(struct sock *sk,
>>  {
>>  struct sk_buff *skb;
>>  
>> -   while ((skb = skb_peek(rcvq)) != NULL &&
>> -      udp_lib_checksum_complete(skb)) {
>> -   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
>> -   IS_UDPLITE(sk));
>> -   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
>> -   IS_UDPLITE(sk));
>> -   atomic_inc(&sk->sk_drops);
>> -   __skb_unlink(skb, rcvq);
>> -   *total += skb->truesize;
>> -   kfree_skb(skb);
>> +   while ((skb = skb_peek(rcvq)) != NULL) {
>> +   if (udp_lib_checksum_complete(skb)) {
>> +   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
>> +   IS_UDPLITE(sk));
>> +   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
>> +   IS_UDPLITE(sk));
>> +   atomic_inc(&sk->sk_drops);
>> +   __skb_unlink(skb, rcvq);
>> +   *total += skb->truesize;
>> +   kfree_skb(skb);
>> +   } else {
>> +   udp_set_dev_scratch(skb);
>
> It needs a "break;" here.
>
>> +   }
>>  }
>>  return skb;
>>  }

That works!

$ wget google.com
--2017-06-23 16:56:31--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ 
[following]
--2017-06-23 16:56:31--  
http://www.google.com.au/?gfe_rd=cr=n7tMWeb9JYPr8wfg4LXYAQ
Reusing existing connection to proxy.pmdw.com:3128.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’


The patch had whitespace issues or something and I had to apply it by
hand, here's what I actually tested.

cheers

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 067a607917f9..d3227c1bbe8e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1446,16 +1446,20 @@ static struct sk_buff *__first_packet_length(struct sock *sk,
 {
struct sk_buff *skb;
 
-   while ((skb = skb_peek(rcvq)) != NULL &&
-  udp_lib_checksum_complete(skb)) {
-   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
-   IS_UDPLITE(sk));
-   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
-   IS_UDPLITE(sk));
-   atomic_inc(&sk->sk_drops);
-   __skb_unlink(skb, rcvq);
-   *total += skb->truesize;
-   kfree_skb(skb);
+   while ((skb = skb_peek(rcvq)) != NULL) {
+   if (udp_lib_checksum_complete(skb)) {
+   __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
+   IS_UDPLITE(sk));
+   __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
+   IS_UDPLITE(sk));
+   atomic_inc(&sk->sk_drops);
+   __skb_unlink(skb, rcvq);
+   *total += skb->truesize;
+   kfree_skb(skb);
+   } else {
+   udp_set_dev_scratch(skb);
+   break;
+   }
}
return skb;
 }


Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)

2017-06-23 Thread Paolo Abeni
On Thu, 2017-06-22 at 18:43 +0200, Paolo Abeni wrote:
> On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> > Paolo wrote:
> > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > > fields are on cold cachelines.
> > > If the skb are linear and the kernel don't need to compute the udp
> > > csum, only a handful of skb fields are required by udp_recvmsg().
> > > Since we already use skb->dev_scratch to cache hot data, and
> > > there are 32 bits unused on 64 bit archs, use such field to cache
> > > as much data as we can, and try to prefetch on dequeue the relevant
> > > fields that are left out.
> > > 
> > > This can save up to 2 cache miss per packet.
> > > 
> > > v1 -> v2:
> > >   - changed udp_dev_scratch fields types to u{32,16} variant,
> > > replaced bitfield with bool
> > > 
> > > Signed-off-by: Paolo Abeni 
> > > Acked-by: Eric Dumazet 
> > > ---
> > >  net/ipv4/udp.c | 114 +++--
> > >  1 file changed, 103 insertions(+), 11 deletions(-)
> > 
> > This appears to break wget on one of my machines.
> > 
> > Networking in general is working, I'm able to SSH in, but then I can't
> > do a wget.
> > 
> > eg:
> > 
> > $ wget google.com
> > --2017-06-22 22:45:39--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in 
> > name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > $ host proxy.pmdw.com
> > proxy.pmdw.com is an alias for raven.pmdw.com.
> > raven.pmdw.com has address 10.1.2.3
> > 
> > $ wget google.com
> > --2017-06-22 22:52:08--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in 
> > name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > Maybe host is using TCP but the man page says it doesn't?
> > 
> > 
> > Everything is OK if I boot back to the previous commit 0a463c78d25b
> > ("udp: avoid a cache miss on dequeue"):
> > 
> > $ wget google.com
> > --2017-06-22 23:00:01--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> > Proxy request sent, awaiting response... 302 Found
> > Location: http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE 
> > [following]
> > --2017-06-22 23:00:01--  
> > http://www.google.com.au/?gfe_rd=cr=Ub9LWbPbLujDXrH1uPgE
> > Reusing existing connection to proxy.pmdw.com:3128.
> > Proxy request sent, awaiting response... 200 OK
> > Length: unspecified [text/html]
> > Saving to: ‘index.html’
> > 
> > index.html  [ <=>   
> > ]  11.37K  --.-KB/sin 0.001s  
> > 
> > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]
> > 
> > $ uname -a
> > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 
> > ppc64 GNU/Linux
> > 
> > 
> > Haven't had time to debug any further. Any ideas?
> 
> Thank you for this report.
> 
> Can you please specify features of the relevant NIC ? (ethtool -k
> ) 
> 
> I'll try to replicate the issue as soon I'll get hands on suitable HW,

I had my hands on power7, but I can't trivially reproduce the issue so
I'm going to bug you for more info. 

Can you please specify the host CPU, the NIC in use (ethtool -i
), the compiler version used to build the kernel and possibly
provide a tcpdump of the DNS packets received/sent while running wget
and while running the host command?

Do you have the relevant kernel running on other PPC hosts?

Thank you,

Paolo


Re: [kernel-hardening] [PATCH 2/4] arm64: Reduce ELF_ET_DYN_BASE

2017-06-23 Thread Ard Biesheuvel
Hi Kees,

On 22 June 2017 at 18:06, Kees Cook  wrote:
> Now that explicitly executed loaders are loaded in the mmap region,
> position PIE binaries lower in the address space to avoid possible
> collisions with mmap or stack regions. For 64-bit, align to 4GB to
> allow runtimes to use the entire 32-bit address space for 32-bit
> pointers.
>
> Signed-off-by: Kees Cook 
> ---
>  arch/arm64/include/asm/elf.h | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
> index 5d1700425efe..f742af8f7c42 100644
> --- a/arch/arm64/include/asm/elf.h
> +++ b/arch/arm64/include/asm/elf.h
> @@ -113,12 +113,13 @@
>  #define ELF_EXEC_PAGESIZE  PAGE_SIZE
>
>  /*
> - * This is the location that an ET_DYN program is loaded if exec'ed.  Typical
> - * use of this is to invoke "./ld.so someprog" to test out a new version of
> - * the loader.  We need to make sure that it is out of the way of the program
> - * that it will "exec", and that there is sufficient room for the brk.
> + * This is the base location for PIE (ET_DYN with INTERP) loads. On
> + * 64-bit, this is raised to 4GB to leave the entire 32-bit address
> + * space open for things that want to use the area for 32-bit pointers.
>   */
> -#define ELF_ET_DYN_BASE		(2 * TASK_SIZE_64 / 3)
> +#define ELF_ET_DYN_BASE		(test_thread_flag(TIF_32BIT) ?	\
> +					0x000400000UL :			\
> +					0x100000000UL)
>

Why are you merging this with the COMPAT definition?

>  #ifndef __ASSEMBLY__
>
> @@ -173,8 +174,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
>
>  #ifdef CONFIG_COMPAT
>
> -#define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3)
> -
>  /* AArch32 registers. */
>  #define COMPAT_ELF_NGREG   18
>  typedef unsigned int   compat_elf_greg_t;
> --
> 2.7.4
>


Re: [PATCH 7/7] crypto: caam: cleanup CONFIG_64BIT ifdefs when using io{read|write}64

2017-06-23 Thread Horia Geantă
On 6/22/2017 7:49 PM, Logan Gunthorpe wrote:
> Now that ioread64 and iowrite64 are always available we don't
> need the ugly ifdefs to change their implementation when they
> are not.
> 
Thanks Logan.

Note however this is not equivalent - it changes the behaviour, since
the CAAM engine on i.MX6S/SL/D/Q platforms is broken in terms of 64-bit
register endianness - see the CONFIG_CRYPTO_DEV_FSL_CAAM_IMX usage in
the code you are removing.

[Yes, the current code has its problems, as it does not differentiate
between i.MX platforms with and without the (unofficial) erratum, but
this should be fixed separately.]

Below is the change that would keep the current logic - still forcing
i.MX to write CAAM 64-bit registers in BE even if the engine is LE
(yes, diff is doing a poor job of presenting it).

Horia

diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
index 84d2f838a063..b893ebb24e65 100644
--- a/drivers/crypto/caam/regs.h
+++ b/drivers/crypto/caam/regs.h
@@ -134,50 +134,25 @@ static inline void clrsetbits_32(void __iomem
*reg, u32 clear, u32 set)
  *base + 0x : least-significant 32 bits
  *base + 0x0004 : most-significant 32 bits
  */
-#ifdef CONFIG_64BIT
 static inline void wr_reg64(void __iomem *reg, u64 data)
 {
+#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
if (caam_little_end)
iowrite64(data, reg);
else
-   iowrite64be(data, reg);
-}
-
-static inline u64 rd_reg64(void __iomem *reg)
-{
-   if (caam_little_end)
-   return ioread64(reg);
-   else
-   return ioread64be(reg);
-}
-
-#else /* CONFIG_64BIT */
-static inline void wr_reg64(void __iomem *reg, u64 data)
-{
-#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
-   if (caam_little_end) {
-   wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
-   wr_reg32((u32 __iomem *)(reg), data);
-   } else
 #endif
-   {
-   wr_reg32((u32 __iomem *)(reg), data >> 32);
-   wr_reg32((u32 __iomem *)(reg) + 1, data);
-   }
+   iowrite64be(data, reg);
 }

 static inline u64 rd_reg64(void __iomem *reg)
 {
 #ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
if (caam_little_end)
-   return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
-   (u64)rd_reg32((u32 __iomem *)(reg)));
+   return ioread64(reg);
else
 #endif
-   return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
-   (u64)rd_reg32((u32 __iomem *)(reg) + 1));
+   return ioread64be(reg);
 }
-#endif /* CONFIG_64BIT  */

 #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
 #ifdef CONFIG_SOC_IMX7D


> Signed-off-by: Logan Gunthorpe 
> Cc: "Horia Geantă" 
> Cc: Dan Douglass 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> ---
>  drivers/crypto/caam/regs.h | 29 -
>  1 file changed, 29 deletions(-)
> 
> diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
> index 84d2f838a063..26fc19dd0c39 100644
> --- a/drivers/crypto/caam/regs.h
> +++ b/drivers/crypto/caam/regs.h
> @@ -134,7 +134,6 @@ static inline void clrsetbits_32(void __iomem *reg, u32 clear, u32 set)
>   *base + 0x : least-significant 32 bits
>   *base + 0x0004 : most-significant 32 bits
>   */
> -#ifdef CONFIG_64BIT
>  static inline void wr_reg64(void __iomem *reg, u64 data)
>  {
>   if (caam_little_end)
> @@ -151,34 +150,6 @@ static inline u64 rd_reg64(void __iomem *reg)
>   return ioread64be(reg);
>  }
>  
> -#else /* CONFIG_64BIT */
> -static inline void wr_reg64(void __iomem *reg, u64 data)
> -{
> -#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
> - if (caam_little_end) {
> - wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
> - wr_reg32((u32 __iomem *)(reg), data);
> - } else
> -#endif
> - {
> - wr_reg32((u32 __iomem *)(reg), data >> 32);
> - wr_reg32((u32 __iomem *)(reg) + 1, data);
> - }
> -}
> -
> -static inline u64 rd_reg64(void __iomem *reg)
> -{
> -#ifndef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
> - if (caam_little_end)
> - return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
> - (u64)rd_reg32((u32 __iomem *)(reg)));
> - else
> -#endif
> - return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
> - (u64)rd_reg32((u32 __iomem *)(reg) + 1));
> -}
> -#endif /* CONFIG_64BIT  */
> -
>  #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>  #ifdef CONFIG_SOC_IMX7D
>  #define cpu_to_caam_dma(value) \
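
The two word orders being discussed can be modelled in plain C
(ordinary memory rather than real MMIO, with made-up names for the
sketch): the i.MX workaround forces the most-significant word at
base + 0x0 even when the engine is little-endian.

#include <stdint.h>
#include <stdio.h>

/* "BE order" puts the most-significant 32 bits at the lower offset,
 * matching the i.MX path in the quoted code; "LE order" matches the
 * file's documented layout (LS word at base + 0x0). */
static void wr64_be_order(uint32_t *reg, uint64_t data)
{
	reg[0] = (uint32_t)(data >> 32);	/* base + 0x0: MS 32 bits */
	reg[1] = (uint32_t)data;		/* base + 0x4: LS 32 bits */
}

static void wr64_le_order(uint32_t *reg, uint64_t data)
{
	reg[1] = (uint32_t)(data >> 32);	/* base + 0x4: MS 32 bits */
	reg[0] = (uint32_t)data;		/* base + 0x0: LS 32 bits */
}

int main(void)
{
	uint32_t reg[2];

	wr64_be_order(reg, 0x1122334455667788ULL);
	printf("BE order: %08x %08x\n", reg[0], reg[1]);
	wr64_le_order(reg, 0x1122334455667788ULL);
	printf("LE order: %08x %08x\n", reg[0], reg[1]);
	return 0;
}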
> 


Re: DPAA: Software Portal selection for network interface

2017-06-23 Thread Sebastian Huber

On 22/06/17 16:48, Madalin-cristian Bucur wrote:


> > This means all the QMan portal_isr() calls are distributed round-robin to
> > all affine portals. Is there some way to configure the software portal for
> > a specific network interface, e.g. use processors 0, 1, 2, 3 for one
> > interface, and 4, 5, 6, 7 for another?
>
> We've stripped all configurability and advanced features from the initial
> DPAA submission. We don't have options to configure this. On the other hand,
> the ingress queues are held active in the portal, resulting in one CPU doing
> dequeues while there are frames available. This is done to avoid frame
> reordering, improving termination performance (and forwarded TCP performance).
>
> The downside is that we're not saturating all CPUs with traffic. We're
> currently working on adding Rx hashing, to be able to maintain intra-flow
> frame order while spreading processing on all CPUs.
>
> Meanwhile, would RPS (Receive Packet Steering) be a solution for you?



We receive UDP packets with an MTU of 1500 on one of the 10G interfaces
at wire speed. So, I guess I have to add this configuration myself somehow
and use a dedicated software portal for this interface.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the German EHUG.



Re: [PATCH] powerpc/kernel: Avoid redundancies on giveup_all

2017-06-23 Thread Cyril Bur
On Thu, 2017-06-22 at 17:27 -0300, Breno Leitao wrote:
> Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
> __giveup_vsx(). But __giveup_vsx() also calls __giveup_fpu() and
> __giveup_altivec() again, in a redudant manner.
> 
> Other than giving up FP and Altivec, __giveup_vsx() also disables
> MSR_VSX on MSR, but this is already done by __giveup_{fpu,altivec}().
> As VSX can not be enabled alone on MSR (without FP and/or VEC
> enabled), this is also a redundancy. VSX is never enabled alone (without
> FP and VEC) because every time VSX is enabled, as through load_up_vsx()
> and restore_math(), FP is also enabled together.
> 
> This change improves giveup_all() in average just 3%, but since
> giveup_all() is called very frequently, around 8x per CPU per second on
> an idle machine, this change might show some noticeable improvement.
> 

So I totally agree except this makes me quite nervous. I know we're
quite good at always disabling VSX when we disable FPU and ALTIVEC and
we do always turn VSX on when we enable FPU AND ALTIVEC. But still, if
we ever get that wrong...

I'm more interested in how this improves giveup_all() performance by so
much, but then hardware often surprises - I guess that's the cost of a
function call.

Perhaps caching the thread.regs->msr isn't a good idea. Could we
branch over it in the common case but still have the call to the
function in case something goes horribly wrong?
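
A minimal sketch of that idea (hypothetical, not a posted patch) would
be to keep an explicit check in giveup_all() that fires if the
invariant is ever broken, instead of relying on __giveup_vsx() as a
silent safety net:

#ifdef CONFIG_VSX
	/* Hypothetical: VSX must never be set without both FP and VEC,
	 * so warn loudly if the invariant this patch relies on breaks. */
	WARN_ON_ONCE((usermsr & MSR_VSX) &&
		     (usermsr & (MSR_FP | MSR_VEC)) != (MSR_FP | MSR_VEC));
#endif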

> Signed-off-by: Breno Leitao 
> Signed-off-by: Gustavo Romero 
> ---
>  arch/powerpc/kernel/process.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 2ad725ef4368..5d6af58270e6 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -494,10 +494,6 @@ void giveup_all(struct task_struct *tsk)
>   if (usermsr & MSR_VEC)
>   __giveup_altivec(tsk);
>  #endif
> -#ifdef CONFIG_VSX
> - if (usermsr & MSR_VSX)
> - __giveup_vsx(tsk);
> -#endif
>  #ifdef CONFIG_SPE
>   if (usermsr & MSR_SPE)
>   __giveup_spe(tsk);