Re: [PATCH v9 2/2] powerpc/64s: KVM update for reimplement book3s idle code in C

2019-04-12 Thread Nicholas Piggin
kbuild test robot's on April 13, 2019 12:51 pm:
> Hi Nicholas,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v5.1-rc4 next-20190412]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-reimplement-book3s-idle-code-in-C/20190413-002437
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-defconfig (attached as .config)
> compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=powerpc 
> 
> All errors (new ones prefixed by >>):
> 
>powerpc64-linux-gnu-ld: warning: orphan section `.gnu.hash' from `linker 
> stubs' being placed in section `.gnu.hash'.
>arch/powerpc/platforms/powernv/idle.o: In function `.pnv_cpu_offline':
>>> (.text+0x1900): undefined reference to `.idle_kvm_start_guest'

Argh, it took me longer than I'd like to admit to work this out. I'm
ELFv1-illiterate. Replacing .globl with _GLOBAL seems to fix it. I'll
roll that into the next patch.

Thanks,
Nick



Re: powerpc/64s/radix: Fix radix segment exception handling

2019-04-12 Thread Nicholas Piggin
Michael Ellerman's on April 13, 2019 1:39 pm:
> Nicholas Piggin  writes:
>> Michael Ellerman's on April 11, 2019 12:49 am:
>>> On Fri, 2019-03-29 at 07:42:57 UTC, Nicholas Piggin wrote:
 Commit 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
 broke the radix-mode segment exception handler. In radix mode, this
 exception is not an SLB miss; rather it signals that the EA is outside
 the range translated by any page table.
 
 The commit lost the radix feature alternate code patch, which can
 cause faults on some EAs to hit the kernel BUG at arch/powerpc/mm/slb.c:639!
 
 The original radix code would send faults to slb_miss_large_addr,
 which would end up faulting due to slb_addr_limit being 0. This patch
 sends radix directly to do_bad_slb_fault, which is a bit clearer.
 
 Fixes: 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
 Cc: Aneesh Kumar K.V 
 Reported-by: Anton Blanchard 
 Signed-off-by: Nicholas Piggin 
 Reviewed-by: Aneesh Kumar K.V 
>>> 
>>> Applied to powerpc fixes, thanks.
>>> 
>>> https://git.kernel.org/powerpc/c/7100e8704b61247649c50551b965e71d
>>
>> I sent a v2 with a selftest that triggers the crash if you want it.
>> Code was unchanged so no big deal there.
> 
> Yeah I checked the kernel part was unchanged so stuck with v1.
> 
> I also sent a self test, which is similar but slightly different to
> yours, though yours is better in general. I'll try and merge them into
> one test.

If you don't mind that would be good. The siglongjmp handler is
cleaner though, and should make it simpler to test ifetch accesses.
We just need to get the sig info into yours, and an array of
interesting addresses.

Thanks,
Nick



[PATCH] crypto: powerpc - convert to use crypto_simd_usable()

2019-04-12 Thread Eric Biggers
From: Eric Biggers 

Replace all calls to in_interrupt() in the PowerPC crypto code with
!crypto_simd_usable().  This causes the crypto self-tests to test the
no-SIMD code paths when CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y.

The p8_ghash algorithm is currently failing and needs to be fixed, as it
produces the wrong digest when no-SIMD updates are mixed with SIMD ones.

Signed-off-by: Eric Biggers 
---
 arch/powerpc/crypto/crc32c-vpmsum_glue.c| 4 +++-
 arch/powerpc/crypto/crct10dif-vpmsum_glue.c | 4 +++-
 arch/powerpc/include/asm/Kbuild | 1 +
 drivers/crypto/vmx/aes.c| 7 ---
 drivers/crypto/vmx/aes_cbc.c| 7 ---
 drivers/crypto/vmx/aes_ctr.c| 5 +++--
 drivers/crypto/vmx/aes_xts.c| 5 +++--
 drivers/crypto/vmx/ghash.c  | 9 -
 8 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/crypto/crc32c-vpmsum_glue.c 
b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
index fd1d6c83f0c02..c4fa242dd652d 100644
--- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c
+++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
@@ -1,10 +1,12 @@
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define CHKSUM_BLOCK_SIZE  1
@@ -22,7 +24,7 @@ static u32 crc32c_vpmsum(u32 crc, unsigned char const *p, 
size_t len)
unsigned int prealign;
unsigned int tail;
 
-   if (len < (VECTOR_BREAKPOINT + VMX_ALIGN) || in_interrupt())
+   if (len < (VECTOR_BREAKPOINT + VMX_ALIGN) || !crypto_simd_usable())
return __crc32c_le(crc, p, len);
 
if ((unsigned long)p & VMX_ALIGN_MASK) {
diff --git a/arch/powerpc/crypto/crct10dif-vpmsum_glue.c 
b/arch/powerpc/crypto/crct10dif-vpmsum_glue.c
index 02ea277863d15..e27ff16573b5b 100644
--- a/arch/powerpc/crypto/crct10dif-vpmsum_glue.c
+++ b/arch/powerpc/crypto/crct10dif-vpmsum_glue.c
@@ -12,11 +12,13 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define VMX_ALIGN  16
@@ -32,7 +34,7 @@ static u16 crct10dif_vpmsum(u16 crci, unsigned char const *p, 
size_t len)
unsigned int tail;
u32 crc = crci;
 
-   if (len < (VECTOR_BREAKPOINT + VMX_ALIGN) || in_interrupt())
+   if (len < (VECTOR_BREAKPOINT + VMX_ALIGN) || !crypto_simd_usable())
return crc_t10dif_generic(crc, p, len);
 
if ((unsigned long)p & VMX_ALIGN_MASK) {
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index a0c132bedfae8..5ac3dead69523 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += vtime.h
 generic-y += msi.h
+generic-y += simd.h
diff --git a/drivers/crypto/vmx/aes.c b/drivers/crypto/vmx/aes.c
index b00d6947e02f4..603a620819941 100644
--- a/drivers/crypto/vmx/aes.c
+++ b/drivers/crypto/vmx/aes.c
@@ -23,9 +23,10 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
+#include 
 
 #include "aesp8-ppc.h"
 
@@ -92,7 +93,7 @@ static void p8_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, 
const u8 *src)
 {
struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 
-   if (in_interrupt()) {
+   if (!crypto_simd_usable()) {
crypto_cipher_encrypt_one(ctx->fallback, dst, src);
} else {
preempt_disable();
@@ -109,7 +110,7 @@ static void p8_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, 
const u8 *src)
 {
struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 
-   if (in_interrupt()) {
+   if (!crypto_simd_usable()) {
crypto_cipher_decrypt_one(ctx->fallback, dst, src);
} else {
preempt_disable();
diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
index fbe882ef1bc5d..a1a9a6f0d42cf 100644
--- a/drivers/crypto/vmx/aes_cbc.c
+++ b/drivers/crypto/vmx/aes_cbc.c
@@ -23,9 +23,10 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -100,7 +101,7 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc,
struct p8_aes_cbc_ctx *ctx =
crypto_tfm_ctx(crypto_blkcipher_tfm(desc->tfm));
 
-   if (in_interrupt()) {
+   if (!crypto_simd_usable()) {
SYNC_SKCIPHER_REQUEST_ON_STACK(req, ctx->fallback);
skcipher_request_set_sync_tfm(req, ctx->fallback);
skcipher_request_set_callback(req, desc->flags, NULL, NULL);
@@ -139,7 +140,7 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc *desc,
struct p8_aes_cbc_ctx *ctx =
crypto_tfm_ctx(crypto_blkcipher_tfm(desc->tfm));
 
-   if (in_interrupt()) {
+   if (!crypto_simd_usable()) {
SYNC_SKCIPHER_REQUEST_ON_STACK(req, ctx->fallback);
skcipher_request_set_sync_tfm(req, ctx->fallback);
skcipher_r

Re: [PATCH] Documentation: Add ARM64 to kernel-parameters.rst

2019-04-12 Thread Randy Dunlap
On 4/12/19 8:56 PM, Josh Poimboeuf wrote:
> Add ARM64 to the legend of architectures.  It's already used in several
> places in kernel-parameters.txt.
> 
> Suggested-by: Randy Dunlap 
> Signed-off-by: Josh Poimboeuf 
> ---
>  Documentation/admin-guide/kernel-parameters.rst | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.rst 
> b/Documentation/admin-guide/kernel-parameters.rst
> index b8d0bc07ed0a..0124980dca2d 100644
> --- a/Documentation/admin-guide/kernel-parameters.rst
> +++ b/Documentation/admin-guide/kernel-parameters.rst
> @@ -88,6 +88,7 @@ parameter is applicable::
>   APICAPIC support is enabled.
>   APM Advanced Power Management support is enabled.
>   ARM ARM architecture is enabled.
> + ARM64   ARM64 architecture is enabled.
>   AX25Appropriate AX.25 support is enabled.
>   CLK Common clock infrastructure is enabled.
>   CMA Contiguous Memory Area support is enabled.
> 


Thanks.

-- 
~Randy


[GIT PULL] Please pull powerpc/linux.git powerpc-5.1-5 tag

2019-04-12 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Linus,

Please pull some more powerpc fixes for 5.1:

The following changes since commit 6f845ebec2706841d15831fab3cfd9e676fa:

  powerpc/pseries/mce: Fix misleading print for TLB mutlihit (2019-03-29 
16:59:19 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.1-5

for you to fetch changes up to cf7cf6977f531acd5dfe55250d0ee8cbbb6f1ae8:

  powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs (2019-04-10 
14:45:57 +1000)

- --
powerpc fixes for 5.1 #5

A minor build fix for 64-bit FLATMEM configs.

A fix for a boot failure on 32-bit powermacs.

My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on 64-bit
kernels, ie. compat mode, which is only used on big endian.

The rewrite of the SLB code we merged in 4.20 missed the fact that the 0x380
exception is also used with the Radix MMU to report out of range accesses. This
could lead to an oops if userspace tried to read from addresses outside the user
or kernel range.

Thanks to:
  Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas Piggin.

- --
Christophe Leroy (2):
  powerpc/32: Fix early boot failure with RTAS built-in
  powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64

Michael Ellerman (1):
  powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs

Nicholas Piggin (1):
  powerpc/64s/radix: Fix radix segment exception handling


 arch/powerpc/include/asm/mmu.h|  2 +-
 arch/powerpc/kernel/exceptions-64s.S  | 12 
 arch/powerpc/kernel/head_32.S |  8 
 arch/powerpc/kernel/vdso32/gettimeofday.S |  2 +-
 4 files changed, 14 insertions(+), 10 deletions(-)
-BEGIN PGP SIGNATURE-

iQIcBAEBAgAGBQJcsV9SAAoJEFHr6jzI4aWAhi8P/RB37+hOMXUP4CzX4VSKuorG
pIDo8rfAL8pmw+6gGooG4VNTfjoaENKdLygYIG+jFhh/xJx3lIUkfL970dWHv+la
bCUpofJvCwW00ZQRbjBDn8GihhZUQMITPlFSrnj1bz/hmFmMFjANFSPAOuGL7AY5
OIWQ2kzBkxFTkx7H60UPWPBQ0N+KI7DiFEX14Eg8UlEl5rWS/SBmnAL/1B+vYoLs
A8z71OXL0QD1mg0zTfevNzY7p5enOaGrZqau9Rs3QuKPb95XjiSL15jq5XwrggDy
4fvDTxrHHZOo383H9UZo4ZdY+mxgRnqhpyWgnBvxBc7JJgE/pCg5LOIE7aq/0DbY
4XaQ66BOr7d5Qw41KTs/XOHsmFTdYgfN91KwY/3UgB0740xdAhMPLn+fLxrs4iBL
xAmDS4luqk2+i8WLvUZsXh9L9b/ul41rN1q8xH0O2/ARTaTFarGo0KOsqNv2DGmW
oAiHpTA/mKkzN/I8iNBM057r8WTuunScyvjviNW9sKY36lbdvKbxndBmT5Ew9a1I
5L7c+xKPJNCjxX6iFtXaEQBYM47k+bwP+VxA8SAamvZmo+UXME01zoYeHzSRgmdX
oaYjMv1BEeWAf9eWsitV7rShZTDoJR7glMuBG4rHU3pzO0kMQTIAGM1wsVAs5XK6
z8EX5UlhVxwjFGvCDsKq
=3Jqz
-END PGP SIGNATURE-


[PATCH] Documentation: Add ARM64 to kernel-parameters.rst

2019-04-12 Thread Josh Poimboeuf
Add ARM64 to the legend of architectures.  It's already used in several
places in kernel-parameters.txt.

Suggested-by: Randy Dunlap 
Signed-off-by: Josh Poimboeuf 
---
 Documentation/admin-guide/kernel-parameters.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/admin-guide/kernel-parameters.rst 
b/Documentation/admin-guide/kernel-parameters.rst
index b8d0bc07ed0a..0124980dca2d 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -88,6 +88,7 @@ parameter is applicable::
APICAPIC support is enabled.
APM Advanced Power Management support is enabled.
ARM ARM architecture is enabled.
+   ARM64   ARM64 architecture is enabled.
AX25Appropriate AX.25 support is enabled.
CLK Common clock infrastructure is enabled.
CMA Contiguous Memory Area support is enabled.
-- 
2.17.2



Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode

2019-04-12 Thread Michael Ellerman
Nayna  writes:

> On 04/11/2019 10:47 AM, Daniel Axtens wrote:
>> Eric Biggers  writes:
>>
>>> Are you still planning to fix the remaining bug?  I booted a ppc64le VM, 
>>> and I
>>> see the same test failure (I think) you were referring to:
>>>
>>> alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test 
>>> vector 3, cfg="uneven misaligned splits, may sleep"
>>>
>> Yes, that's the one I saw. I don't have time to follow it up at the
>> moment, but Nayna is aware of it.
>>
>
> Yes Eric, we identified this as a separate issue of misalignment and 
> plan to post a separate patch to address it.

I also wrote it down in my write-only TODO list here:

  https://github.com/linuxppc/issues/issues/238


cheers


Re: powerpc/64s/radix: Fix radix segment exception handling

2019-04-12 Thread Michael Ellerman
Nicholas Piggin  writes:
> Michael Ellerman's on April 11, 2019 12:49 am:
>> On Fri, 2019-03-29 at 07:42:57 UTC, Nicholas Piggin wrote:
>>> Commit 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
>>> broke the radix-mode segment exception handler. In radix mode, this
>>> exception is not an SLB miss; rather it signals that the EA is outside
>>> the range translated by any page table.
>>> 
>>> The commit lost the radix feature alternate code patch, which can
>>> cause faults on some EAs to hit the kernel BUG at arch/powerpc/mm/slb.c:639!
>>> 
>>> The original radix code would send faults to slb_miss_large_addr,
>>> which would end up faulting due to slb_addr_limit being 0. This patch
>>> sends radix directly to do_bad_slb_fault, which is a bit clearer.
>>> 
>>> Fixes: 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
>>> Cc: Aneesh Kumar K.V 
>>> Reported-by: Anton Blanchard 
>>> Signed-off-by: Nicholas Piggin 
>>> Reviewed-by: Aneesh Kumar K.V 
>> 
>> Applied to powerpc fixes, thanks.
>> 
>> https://git.kernel.org/powerpc/c/7100e8704b61247649c50551b965e71d
>
> I sent a v2 with a selftest that triggers the crash if you want it.
> Code was unchanged so no big deal there.

Yeah I checked the kernel part was unchanged so stuck with v1.

I also sent a self test, which is similar but slightly different to
yours, though yours is better in general. I'll try and merge them into
one test.

cheers


Re: [PATCH v2 00/21] Convert hwmon documentation to ReST

2019-04-12 Thread Guenter Roeck

On 4/12/19 9:04 AM, Jonathan Corbet wrote:

On Thu, 11 Apr 2019 14:07:31 -0700
Guenter Roeck  wrote:


While nobody does such split, IMHO, the best would be to keep the
information outside Documentation/admin-guide. But hey! You're
the Doc maintainer. If you prefer to move, I'm perfectly fine
with that.
   


Same here, but please don't move the files which are kernel facing only.


Well, let's step back and think about this.  Who is the audience for
these documents?  That will tell us a lot about where they should really
be.

What I would prefer to avoid is the status quo where *everything* is in
the top-level directory, and where documents are organized for the
convenience of their maintainers rather than of their readers.  But
sometimes I feel like I'm alone in that desire...:)



The big real-world question is: Is the series good enough for you to accept,
or do you expect some level of user/kernel separation?

Guenter


Re: [PATCH v9 2/2] powerpc/64s: KVM update for reimplement book3s idle code in C

2019-04-12 Thread kbuild test robot
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.1-rc4 next-20190412]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-reimplement-book3s-idle-code-in-C/20190413-002437
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   powerpc64-linux-gnu-ld: warning: orphan section `.gnu.hash' from `linker 
stubs' being placed in section `.gnu.hash'.
   arch/powerpc/platforms/powernv/idle.o: In function `.pnv_cpu_offline':
>> (.text+0x1900): undefined reference to `.idle_kvm_start_guest'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[RFC PATCH] powerpc/64/ftrace: mprofile-kernel patch out mflr

2019-04-12 Thread Nicholas Piggin
The new mprofile-kernel mcount sequence is

  mflr  r0
  bl    _mcount

Dynamic ftrace patches the branch instruction with a noop, but leaves
the mflr. mflr is executed by the branch unit that can only execute one
per cycle on POWER9 and shared with branches, so it would be nice to
avoid it where possible.

This patch is a hacky proof of concept to nop out the mflr. Can we do
this or are there races or other issues with it?
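
Sketch of the three call-site states this implies (my reading of the intent,
not taken from the patch itself):

```asm
# With CONFIG_MPROFILE_KERNEL, each traced function entry:
#
#   tracing enabled        nop'ed today           nop'ed with this patch
#   ---------------        ------------           ----------------------
    mflr    r0             mflr    r0             nop
    bl      _mcount        nop                    nop
```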
---
 arch/powerpc/kernel/trace/ftrace.c | 77 +-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 52ee24fd353f..ecc75baef23e 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -172,6 +172,19 @@ __ftrace_make_nop(struct module *mod,
pr_err("Unexpected instruction %08x around bl _mcount\n", op);
return -EINVAL;
}
+
+   if (patch_instruction((unsigned int *)ip, pop)) {
+   pr_err("Patching NOP failed.\n");
+   return -EPERM;
+   }
+
+   if (op == PPC_INST_MFLR) {
+   if (patch_instruction((unsigned int *)(ip - 4), pop)) {
+   pr_err("Patching NOP failed.\n");
+   return -EPERM;
+   }
+   }
+
 #else
/*
 * Our original call site looks like:
@@ -202,13 +215,14 @@ __ftrace_make_nop(struct module *mod,
pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
return -EINVAL;
}
-#endif /* CONFIG_MPROFILE_KERNEL */
 
if (patch_instruction((unsigned int *)ip, pop)) {
pr_err("Patching NOP failed.\n");
return -EPERM;
}
 
+#endif /* CONFIG_MPROFILE_KERNEL */
+
return 0;
 }
 
@@ -421,6 +435,20 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace 
*rec, unsigned long addr)
return -EPERM;
}
 
+#ifdef CONFIG_MPROFILE_KERNEL
+   if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
+   pr_err("Fetching instruction at %lx failed.\n", ip - 4);
+   return -EFAULT;
+   }
+
+   if (op == PPC_INST_MFLR) {
+   if (patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
+   pr_err("Patching NOP failed.\n");
+   return -EPERM;
+   }
+   }
+#endif
+
return 0;
 }
 
@@ -437,9 +465,20 @@ int ftrace_make_nop(struct module *mod,
 */
if (test_24bit_addr(ip, addr)) {
/* within range */
+   int rc;
+
old = ftrace_call_replace(ip, addr, 1);
new = PPC_INST_NOP;
-   return ftrace_modify_code(ip, old, new);
+   rc = ftrace_modify_code(ip, old, new);
+   if (rc)
+   return rc;
+#ifdef CONFIG_MPROFILE_KERNEL
+   old = PPC_INST_MFLR;
+   new = PPC_INST_NOP;
+   ftrace_modify_code(ip - 4, old, new);
+   /* old mprofile kernel will error because no mflr */
+#endif
+   return rc;
} else if (core_kernel_text(ip))
return __ftrace_make_nop_kernel(rec, addr);
 
@@ -562,6 +601,20 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long 
addr)
return -EINVAL;
}
 
+#ifdef CONFIG_MPROFILE_KERNEL
+   if (probe_kernel_read(op, (ip - 4), 4)) {
+   pr_err("Fetching instruction at %lx failed.\n", (unsigned 
long)(ip - 4));
+   return -EFAULT;
+   }
+
+   if (op[0] == PPC_INST_NOP) {
+   if (patch_instruction((ip - 4), PPC_INST_MFLR)) {
+   pr_err("Patching mflr failed.\n");
+   return -EINVAL;
+   }
+   }
+#endif
+
if (patch_branch(ip, tramp, BRANCH_SET_LINK)) {
pr_err("REL24 out of range!\n");
return -EINVAL;
@@ -650,6 +703,20 @@ static int __ftrace_make_call_kernel(struct dyn_ftrace 
*rec, unsigned long addr)
return -EINVAL;
}
 
+#ifdef CONFIG_MPROFILE_KERNEL
+   if (probe_kernel_read(&op, (ip - 4), 4)) {
+   pr_err("Fetching instruction at %lx failed.\n", (unsigned 
long)(ip - 4));
+   return -EFAULT;
+   }
+
+   if (op == PPC_INST_NOP) {
+   if (patch_instruction((ip - 4), PPC_INST_MFLR)) {
+   pr_err("Patching mflr failed.\n");
+   return -EINVAL;
+   }
+   }
+#endif
+
if (patch_branch(ip, tramp, BRANCH_SET_LINK)) {
pr_err("Error patching branch to ftrace tramp!\n");
return -EINVAL;
@@ -670,6 +737,12 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long 
addr)
 */
if (test_24bit_addr(ip, addr)) {
/* within range */
+#ifdef CONFIG_MPROFILE_KERNEL
+   old = PPC_INST_NOP;
+   new = PPC_

Re: [PATCH v2 00/21] Convert hwmon documentation to ReST

2019-04-12 Thread Guenter Roeck

On 4/12/19 5:25 PM, Mauro Carvalho Chehab wrote:

Em Fri, 12 Apr 2019 09:12:52 -0700
Guenter Roeck  escreveu:


On 4/12/19 9:04 AM, Jonathan Corbet wrote:

On Thu, 11 Apr 2019 14:07:31 -0700
Guenter Roeck  wrote:
   

While nobody does such split, IMHO, the best would be to keep the
information outside Documentation/admin-guide. But hey! You're
the Doc maintainer. If you prefer to move, I'm perfectly fine
with that.
  


Same here, but please don't move the files which are kernel facing only.


Well, let's step back and think about this.  Who is the audience for
these documents?  That will tell us a lot about where they should really
be.
   


Most of them are for users, some of them are for driver developers. A few
are for both, though that is generally not the intention (and one may argue
that driver internal documentation should be moved into the respective
driver source).


The big issue is really those files that contain both kernel internals
and userspace stuff.

This is a common pattern. I just finished converting a lot more
documents to ReST and found the same thing in almost all the document
directories I touched.


What I would prefer to avoid is the status quo where *everything* is in
the top-level directory, and where documents are organized for the
convenience of their maintainers rather than of their readers.  But
sometimes I feel like I'm alone in that desire...:)
   

I am fine with separating user-facing documentation from kernel API/driver
developer guides, and I agree that it would make a lot of sense. As I said,
please just make sure that kernel-facing files don't end up in the wrong
directory.


I like the idea of splitting user-facing documents from the rest, but
this is not an easy task. In several cases, there are just a couple
of paragraphs with things like sysfs entries in the middle of a big
file of kernel internals.



Yes, I know. I don't think that cleanup is going to happen anytime soon.

Guenter



Re: [PATCH v2 00/21] Convert hwmon documentation to ReST

2019-04-12 Thread Mauro Carvalho Chehab
Em Fri, 12 Apr 2019 09:12:52 -0700
Guenter Roeck  escreveu:

> On 4/12/19 9:04 AM, Jonathan Corbet wrote:
> > On Thu, 11 Apr 2019 14:07:31 -0700
> > Guenter Roeck  wrote:
> >   
> >>> While nobody does such split, IMHO, the best would be to keep the
> >>> information outside Documentation/admin-guide. But hey! You're
> >>> the Doc maintainer. If you prefer to move, I'm perfectly fine
> >>> with that.
> >>>  
> >>
> >> Same here, but please don't move the files which are kernel facing only.  
> > 
> > Well, let's step back and think about this.  Who is the audience for
> > these documents?  That will tell us a lot about where they should really
> > be.
> >   
> 
> Most of them are for users, some of them are for driver developers. A few
> are for both, though that is generally not the intention (and one may argue
> that driver internal documentation should be moved into the respective
> driver source).

The big issue is really those files that contain both kernel internals
and userspace stuff.

This is a common pattern. I just finished converting a lot more
documents to ReST and found the same thing in almost all the document
directories I touched.

> > What I would prefer to avoid is the status quo where *everything* is in
> > the top-level directory, and where documents are organized for the
> > convenience of their maintainers rather than of their readers.  But
> > sometimes I feel like I'm alone in that desire...:)
> >   
> I am fine with separating user-facing documentation from kernel API/driver
> developer guides, and I agree that it would make a lot of sense. As I said,
> please just make sure that kernel-facing files don't end up in the wrong
> directory.

I like the idea of splitting user-facing documents from the rest, but
this is not an easy task. In several cases, there are just a couple
of paragraphs with things like sysfs entries in the middle of a big
file of kernel internals.

Thanks,
Mauro


Re: [PATCH v2 5/5] arm64/speculation: Support 'mitigations=' cmdline option

2019-04-12 Thread Randy Dunlap
On 4/12/19 1:39 PM, Josh Poimboeuf wrote:
> Configure arm64 runtime CPU speculation bug mitigations in accordance
> with the 'mitigations=' cmdline option.  This affects Meltdown, Spectre
> v2, and Speculative Store Bypass.
> 
> The default behavior is unchanged.
> 
> Signed-off-by: Josh Poimboeuf 
> ---
> NOTE: This is based on top of Jeremy Linton's patches:
>   https://lkml.kernel.org/r/20190410231237.52506-1-jeremy.lin...@arm.com
> 
>  Documentation/admin-guide/kernel-parameters.txt | 8 +---
>  arch/arm64/kernel/cpu_errata.c  | 6 +-
>  arch/arm64/kernel/cpufeature.c  | 8 +++-
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index e84a01d90e92..79bfc755defe 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2545,8 +2545,8 @@
>   http://repo.or.cz/w/linux-2.6/mini2440.git
>  
>   mitigations=
> - [X86,PPC,S390] Control optional mitigations for CPU
> - vulnerabilities.  This is a set of curated,
> + [X86,PPC,S390,ARM64] Control optional mitigations for
> + CPU vulnerabilities.  This is a set of curated,
>   arch-independent options, each of which is an
>   aggregation of existing arch-specific options.
>  
> @@ -2555,11 +2555,13 @@
>   improves system performance, but it may also
>   expose users to several CPU vulnerabilities.
>   Equivalent to: nopti [X86,PPC]
> +kpti=0 [ARM64]
>  nospectre_v1 [PPC]
>  nobp=0 [S390]
> -nospectre_v2 [X86,PPC,S390]
> +nospectre_v2 [X86,PPC,S390,ARM64]
>  spectre_v2_user=off [X86]
>  spec_store_bypass_disable=off 
> [X86,PPC]
> +ssbd=force-off [ARM64]
>  l1tf=off [X86]
>  
>   auto (default)

Hi,
Do we need to add "ARM64" to Documentation/admin-guide/kernel-parameters.rst?


-- 
~Randy


[PATCH v2 5/5] arm64/speculation: Support 'mitigations=' cmdline option

2019-04-12 Thread Josh Poimboeuf
Configure arm64 runtime CPU speculation bug mitigations in accordance
with the 'mitigations=' cmdline option.  This affects Meltdown, Spectre
v2, and Speculative Store Bypass.

The default behavior is unchanged.

Signed-off-by: Josh Poimboeuf 
---
NOTE: This is based on top of Jeremy Linton's patches:
  https://lkml.kernel.org/r/20190410231237.52506-1-jeremy.lin...@arm.com

 Documentation/admin-guide/kernel-parameters.txt | 8 +---
 arch/arm64/kernel/cpu_errata.c  | 6 +-
 arch/arm64/kernel/cpufeature.c  | 8 +++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index e84a01d90e92..79bfc755defe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2545,8 +2545,8 @@
http://repo.or.cz/w/linux-2.6/mini2440.git
 
mitigations=
-   [X86,PPC,S390] Control optional mitigations for CPU
-   vulnerabilities.  This is a set of curated,
+   [X86,PPC,S390,ARM64] Control optional mitigations for
+   CPU vulnerabilities.  This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.
 
@@ -2555,11 +2555,13 @@
improves system performance, but it may also
expose users to several CPU vulnerabilities.
Equivalent to: nopti [X86,PPC]
+  kpti=0 [ARM64]
   nospectre_v1 [PPC]
   nobp=0 [S390]
-  nospectre_v2 [X86,PPC,S390]
+  nospectre_v2 [X86,PPC,S390,ARM64]
   spectre_v2_user=off [X86]
   spec_store_bypass_disable=off 
[X86,PPC]
+  ssbd=force-off [ARM64]
   l1tf=off [X86]
 
auto (default)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index a1f3188c7be0..65bcd7f0cca1 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -405,6 +406,9 @@ static bool has_ssbd_mitigation(const struct 
arm64_cpu_capabilities *entry,
this_cpu_safe = true;
}
 
+   if (cpu_mitigations_off())
+   ssbd_state = ARM64_SSBD_FORCE_DISABLE;
+
if (psci_ops.smccc_version == SMCCC_VERSION_1_0) {
ssbd_state = ARM64_SSBD_UNKNOWN;
if (!this_cpu_safe)
@@ -599,7 +603,7 @@ check_branch_predictor(const struct arm64_cpu_capabilities 
*entry, int scope)
}
 
/* forced off */
-   if (__nospectre_v2) {
+   if (__nospectre_v2 || cpu_mitigations_off()) {
pr_info_once("spectrev2 mitigation disabled by command line 
option\n");
__hardenbp_enab = false;
return false;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6b7e1556460a..d826b17f7820 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -966,7 +967,7 @@ static bool unmap_kernel_at_el0(const struct 
arm64_cpu_capabilities *entry,
MIDR_ALL_VERSIONS(MIDR_HISI_TSV110),
{ /* sentinel */ }
};
-   char const *str = "command line option";
+   char const *str = "kpti command line option";
bool meltdown_safe;
 
meltdown_safe = is_midr_in_range_list(read_cpuid_id(), kpti_safe_list);
@@ -988,6 +989,11 @@ static bool unmap_kernel_at_el0(const struct 
arm64_cpu_capabilities *entry,
__kpti_forced = -1;
}
 
+   if (cpu_mitigations_off() && !__kpti_forced) {
+   str = "mitigations=off";
+   __kpti_forced = -1;
+   }
+
/* Useful for KASLR robustness */
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0) {
if (!__kpti_forced) {
-- 
2.17.2



[PATCH v2 4/5] s390/speculation: Support 'mitigations=' cmdline option

2019-04-12 Thread Josh Poimboeuf
Configure s390 runtime CPU speculation bug mitigations in accordance
with the 'mitigations=' cmdline option.  This affects Spectre v1 and
Spectre v2.

The default behavior is unchanged.

Signed-off-by: Josh Poimboeuf 
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +++--
 arch/s390/kernel/nospec-branch.c| 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a03ab62b69af..e84a01d90e92 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2545,7 +2545,7 @@
http://repo.or.cz/w/linux-2.6/mini2440.git
 
mitigations=
-   [X86,PPC] Control optional mitigations for CPU
+   [X86,PPC,S390] Control optional mitigations for CPU
vulnerabilities.  This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.
@@ -2556,7 +2556,8 @@
expose users to several CPU vulnerabilities.
Equivalent to: nopti [X86,PPC]
   nospectre_v1 [PPC]
-  nospectre_v2 [X86,PPC]
+  nobp=0 [S390]
+  nospectre_v2 [X86,PPC,S390]
   spectre_v2_user=off [X86]
   spec_store_bypass_disable=off [X86,PPC]
   l1tf=off [X86]
diff --git a/arch/s390/kernel/nospec-branch.c b/arch/s390/kernel/nospec-branch.c
index bdddaae96559..649135cbedd5 100644
--- a/arch/s390/kernel/nospec-branch.c
+++ b/arch/s390/kernel/nospec-branch.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
+#include 
 #include 
 
 static int __init nobp_setup_early(char *str)
@@ -58,7 +59,7 @@ early_param("nospectre_v2", nospectre_v2_setup_early);
 
 void __init nospec_auto_detect(void)
 {
-   if (test_facility(156)) {
+   if (test_facility(156) || cpu_mitigations_off()) {
/*
 * The machine supports etokens.
 * Disable expolines and disable nobp.
-- 
2.17.2



[PATCH v2 3/5] powerpc/speculation: Support 'mitigations=' cmdline option

2019-04-12 Thread Josh Poimboeuf
Configure powerpc CPU runtime speculation bug mitigations in accordance
with the 'mitigations=' cmdline option.  This affects Meltdown, Spectre
v1, Spectre v2, and Speculative Store Bypass.

The default behavior is unchanged.

Signed-off-by: Josh Poimboeuf 
---
 Documentation/admin-guide/kernel-parameters.txt | 9 +
 arch/powerpc/kernel/security.c  | 6 +++---
 arch/powerpc/kernel/setup_64.c  | 2 +-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3e33bd03441a..a03ab62b69af 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2545,7 +2545,7 @@
http://repo.or.cz/w/linux-2.6/mini2440.git
 
mitigations=
-   [X86] Control optional mitigations for CPU
+   [X86,PPC] Control optional mitigations for CPU
vulnerabilities.  This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.
@@ -2554,10 +2554,11 @@
Disable all optional CPU mitigations.  This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
-   Equivalent to: nopti [X86]
-  nospectre_v2 [X86]
+   Equivalent to: nopti [X86,PPC]
+  nospectre_v1 [PPC]
+  nospectre_v2 [X86,PPC]
   spectre_v2_user=off [X86]
-  spec_store_bypass_disable=off [X86]
+  spec_store_bypass_disable=off [X86,PPC]
   l1tf=off [X86]
 
auto (default)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index b33bafb8fcea..70568ccbd9fd 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -57,7 +57,7 @@ void setup_barrier_nospec(void)
enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
 security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR);
 
-   if (!no_nospec)
+   if (!no_nospec && !cpu_mitigations_off())
enable_barrier_nospec(enable);
 }
 
@@ -116,7 +116,7 @@ static int __init handle_nospectre_v2(char *p)
 early_param("nospectre_v2", handle_nospectre_v2);
 void setup_spectre_v2(void)
 {
-   if (no_spectrev2)
+   if (no_spectrev2 || cpu_mitigations_off())
do_btb_flush_fixups();
else
btb_flush_enabled = true;
@@ -300,7 +300,7 @@ void setup_stf_barrier(void)
 
stf_enabled_flush_types = type;
 
-   if (!no_stf_barrier)
+   if (!no_stf_barrier && !cpu_mitigations_off())
stf_barrier_enable(enable);
 }
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index ba404dd9ce1d..4f49e1a3594c 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -932,7 +932,7 @@ void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 
enabled_flush_types = types;
 
-   if (!no_rfi_flush)
+   if (!no_rfi_flush && !cpu_mitigations_off())
rfi_flush_enable(enable);
 }
 
-- 
2.17.2



[PATCH v2 2/5] x86/speculation: Support 'mitigations=' cmdline option

2019-04-12 Thread Josh Poimboeuf
Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option.  This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.

The default behavior is unchanged.

Signed-off-by: Josh Poimboeuf 
---
 Documentation/admin-guide/kernel-parameters.txt | 16 +++-
 arch/x86/kernel/cpu/bugs.c  | 11 +--
 arch/x86/mm/pti.c   |  4 +++-
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3ea92e075c64..3e33bd03441a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2545,15 +2545,20 @@
http://repo.or.cz/w/linux-2.6/mini2440.git
 
mitigations=
-   Control optional mitigations for CPU vulnerabilities.
-   This is a set of curated, arch-independent options, each
-   of which is an aggregation of existing arch-specific
-   options.
+   [X86] Control optional mitigations for CPU
+   vulnerabilities.  This is a set of curated,
+   arch-independent options, each of which is an
+   aggregation of existing arch-specific options.
 
off
Disable all optional CPU mitigations.  This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
+   Equivalent to: nopti [X86]
+  nospectre_v2 [X86]
+  spectre_v2_user=off [X86]
+  spec_store_bypass_disable=off [X86]
+  l1tf=off [X86]
 
auto (default)
Mitigate all CPU vulnerabilities, but leave SMT
@@ -2561,12 +2566,13 @@
users who don't want to be surprised by SMT
getting disabled across kernel upgrades, or who
have other ways of avoiding SMT-based attacks.
-   This is the default behavior.
+   Equivalent to: (default behavior)
 
auto,nosmt
Mitigate all CPU vulnerabilities, disabling SMT
if needed.  This is for users who always want to
be fully mitigated, even if it means losing SMT.
+   Equivalent to: l1tf=flush,nosmt [X86]
 
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 2da82eff0eb4..8043a21f36be 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -440,7 +440,8 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
char arg[20];
int ret, i;
 
-   if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+   if (cmdline_find_option_bool(boot_command_line, "nospectre_v2") ||
+   cpu_mitigations_off())
return SPECTRE_V2_CMD_NONE;
 
ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
@@ -672,7 +673,8 @@ static enum ssb_mitigation_cmd __init ssb_parse_cmdline(void)
char arg[20];
int ret, i;
 
-   if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable")) {
+   if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable") ||
+   cpu_mitigations_off()) {
return SPEC_STORE_BYPASS_CMD_NONE;
} else {
ret = cmdline_find_option(boot_command_line, "spec_store_bypass_disable",
@@ -1008,6 +1010,11 @@ static void __init l1tf_select_mitigation(void)
if (!boot_cpu_has_bug(X86_BUG_L1TF))
return;
 
+   if (cpu_mitigations_off())
+   l1tf_mitigation = L1TF_MITIGATION_OFF;
+   else if (cpu_mitigations_auto_nosmt())
+   l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOSMT;
+
override_cache_bits(&boot_cpu_data);
 
switch (l1tf_mitigation) {
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 5d27172c683f..9c2463bc158f 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -115,7 +116,8 @@ void __init pti_check_boottime_disable(void)
}
}
 
-   if (cmdline_find_option_bool(boot_command_line, "nopti")) {
+   if (cmdline_find_option_bool(boot_command_line, "nopti") ||

[PATCH v2 1/5] cpu/speculation: Add 'mitigations=' cmdline option

2019-04-12 Thread Josh Poimboeuf
Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Currently, these options are placeholders which don't actually do
anything.  They will be fleshed out in upcoming patches.

Signed-off-by: Josh Poimboeuf 
---
 .../admin-guide/kernel-parameters.txt | 24 +++
 include/linux/cpu.h   | 24 +++
 kernel/cpu.c  | 15 
 3 files changed, 63 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index faafdc59104a..3ea92e075c64 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2544,6 +2544,30 @@
in the "bleeding edge" mini2440 support kernel at
http://repo.or.cz/w/linux-2.6/mini2440.git
 
+   mitigations=
+   Control optional mitigations for CPU vulnerabilities.
+   This is a set of curated, arch-independent options, each
+   of which is an aggregation of existing arch-specific
+   options.
+
+   off
+   Disable all optional CPU mitigations.  This
+   improves system performance, but it may also
+   expose users to several CPU vulnerabilities.
+
+   auto (default)
+   Mitigate all CPU vulnerabilities, but leave SMT
+   enabled, even if it's vulnerable.  This is for
+   users who don't want to be surprised by SMT
+   getting disabled across kernel upgrades, or who
+   have other ways of avoiding SMT-based attacks.
+   This is the default behavior.
+
+   auto,nosmt
+   Mitigate all CPU vulnerabilities, disabling SMT
+   if needed.  This is for users who always want to
+   be fully mitigated, even if it means losing SMT.
+
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
parameter allows control of the logging verbosity for
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ae99dde02320..5350357dfbdb 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -188,4 +188,28 @@ static inline void cpu_smt_disable(bool force) { }
 static inline void cpu_smt_check_topology(void) { }
 #endif
 
+/*
+ * These are used for a global "mitigations=" cmdline option for toggling
+ * optional CPU mitigations.
+ */
+enum cpu_mitigations {
+   CPU_MITIGATIONS_OFF,
+   CPU_MITIGATIONS_AUTO,
+   CPU_MITIGATIONS_AUTO_NOSMT,
+};
+
+extern enum cpu_mitigations cpu_mitigations;
+
+/* mitigations=off */
+static inline bool cpu_mitigations_off(void)
+{
+   return cpu_mitigations == CPU_MITIGATIONS_OFF;
+}
+
+/* mitigations=auto,nosmt */
+static inline bool cpu_mitigations_auto_nosmt(void)
+{
+   return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
+}
+
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 38890f62f9a8..aed9083f8eac 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2320,3 +2320,18 @@ void __init boot_cpu_hotplug_init(void)
 #endif
this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
 }
+
+enum cpu_mitigations cpu_mitigations __ro_after_init = CPU_MITIGATIONS_AUTO;
+
+static int __init mitigations_cmdline(char *arg)
+{
+   if (!strcmp(arg, "off"))
+   cpu_mitigations = CPU_MITIGATIONS_OFF;
+   else if (!strcmp(arg, "auto"))
+   cpu_mitigations = CPU_MITIGATIONS_AUTO;
+   else if (!strcmp(arg, "auto,nosmt"))
+   cpu_mitigations = CPU_MITIGATIONS_AUTO_NOSMT;
+
+   return 0;
+}
+early_param("mitigations", mitigations_cmdline);
-- 
2.17.2



[PATCH v2 0/5] cpu/speculation: Add 'mitigations=' cmdline option

2019-04-12 Thread Josh Poimboeuf
v2:
- docs improvements: [Randy, Michael]
- Rename to "mitigations=" [Michael]
- Add cpu_mitigations_off() function wrapper [Michael]
- x86: Simplify logic [Boris]
- powerpc: Fix no_rfi_flush checking bug (use '&&' instead of '||')
- arm64: Rebase onto Jeremy Linton's v7 patches [Will]
- arm64: "kpti command line option" [Steve P]
- arm64: Add nospectre_v2 support

---

Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Josh Poimboeuf (5):
  cpu/speculation: Add 'mitigations=' cmdline option
  x86/speculation: Support 'mitigations=' cmdline option
  powerpc/speculation: Support 'mitigations=' cmdline option
  s390/speculation: Support 'mitigations=' cmdline option
  arm64/speculation: Support 'mitigations=' cmdline option

 .../admin-guide/kernel-parameters.txt | 34 +++
 arch/arm64/kernel/cpu_errata.c|  6 +++-
 arch/arm64/kernel/cpufeature.c|  8 -
 arch/powerpc/kernel/security.c|  6 ++--
 arch/powerpc/kernel/setup_64.c|  2 +-
 arch/s390/kernel/nospec-branch.c  |  3 +-
 arch/x86/kernel/cpu/bugs.c| 11 --
 arch/x86/mm/pti.c |  4 ++-
 include/linux/cpu.h   | 24 +
 kernel/cpu.c  | 15 
 10 files changed, 103 insertions(+), 10 deletions(-)

-- 
2.17.2



Re: [PATCH 3/3] mm: introduce ARCH_HAS_PTE_DEVMAP

2019-04-12 Thread Ira Weiny
On Fri, Apr 12, 2019 at 07:56:02PM +0100, Robin Murphy wrote:
> ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
> with the long-out-of-date comment can lead to the impression that an
> architecture may just enable it (since __add_pages() now "comprehends
> device memory" for itself) and expect things to work.
> 
> In practice, however, ZONE_DEVICE users have little chance of
> functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
> that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
> dependency so the real situation is clearer.
> 
> Signed-off-by: Robin Murphy 

Reviewed-by: Ira Weiny 

> ---
>  arch/powerpc/Kconfig | 2 +-
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
>  arch/x86/Kconfig | 2 +-
>  arch/x86/include/asm/pgtable.h   | 4 ++--
>  arch/x86/include/asm/pgtable_types.h | 1 -
>  include/linux/mm.h   | 4 ++--
>  include/linux/pfn_t.h| 4 ++--
>  mm/Kconfig   | 5 ++---
>  mm/gup.c | 2 +-
>  9 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 5e3d0853c31d..77e1993bba80 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -135,6 +135,7 @@ config PPC
>   select ARCH_HAS_MMIOWB  if PPC64
>   select ARCH_HAS_PHYS_TO_DMA
>   select ARCH_HAS_PMEM_APIif PPC64
> + select ARCH_HAS_PTE_DEVMAP  if PPC_BOOK3S_64
>   select ARCH_HAS_PTE_SPECIAL
>   select ARCH_HAS_MEMBARRIER_CALLBACKS
>   select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE && PPC64
> @@ -142,7 +143,6 @@ config PPC
>   select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
>   select ARCH_HAS_UACCESS_FLUSHCACHE  if PPC64
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
> - select ARCH_HAS_ZONE_DEVICE if PPC_BOOK3S_64
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select ARCH_MIGHT_HAVE_PC_PARPORT
>   select ARCH_MIGHT_HAVE_PC_SERIO
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 581f91be9dd4..02c22ac8f387 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -90,7 +90,6 @@
>  #define _PAGE_SOFT_DIRTY _RPAGE_SW3 /* software: software dirty tracking */
>  #define _PAGE_SPECIAL_RPAGE_SW2 /* software: special page */
>  #define _PAGE_DEVMAP _RPAGE_SW1 /* software: ZONE_DEVICE page */
> -#define __HAVE_ARCH_PTE_DEVMAP
>  
>  /*
>   * Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5ad92419be19..ffd50f27f395 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -60,6 +60,7 @@ config X86
>   select ARCH_HAS_KCOVif X86_64
>   select ARCH_HAS_MEMBARRIER_SYNC_CORE
>   select ARCH_HAS_PMEM_APIif X86_64
> + select ARCH_HAS_PTE_DEVMAP  if X86_64
>   select ARCH_HAS_PTE_SPECIAL
>   select ARCH_HAS_REFCOUNT
>   select ARCH_HAS_UACCESS_FLUSHCACHE  if X86_64
> @@ -69,7 +70,6 @@ config X86
>   select ARCH_HAS_STRICT_MODULE_RWX
>   select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
> - select ARCH_HAS_ZONE_DEVICE if X86_64
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
>   select ARCH_MIGHT_HAVE_PC_PARPORT
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 2779ace16d23..89a1f6fd48bf 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -254,7 +254,7 @@ static inline int has_transparent_hugepage(void)
>   return boot_cpu_has(X86_FEATURE_PSE);
>  }
>  
> -#ifdef __HAVE_ARCH_PTE_DEVMAP
> +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
>  static inline int pmd_devmap(pmd_t pmd)
>  {
>   return !!(pmd_val(pmd) & _PAGE_DEVMAP);
> @@ -715,7 +715,7 @@ static inline int pte_present(pte_t a)
>   return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
>  }
>  
> -#ifdef __HAVE_ARCH_PTE_DEVMAP
> +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
>  static inline int pte_devmap(pte_t a)
>  {
>   return (pte_flags(a) & _PAGE_DEVMAP) == _PAGE_DEVMAP;
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index d6ff0bbdb394..b5e49e6bac63 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -103,7 +103,6 @@
>  #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
>  #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
>  #define _PAGE_DEVMAP (_AT(u64, 1) << _PAGE_BIT_DEVMAP)
> -#define __HAVE_ARCH_PTE_DEVMAP
>  #else
>  #define _P

Re: [PATCH RESEND 3/3] mm: introduce ARCH_HAS_PTE_DEVMAP

2019-04-12 Thread Dan Williams
On Fri, Apr 12, 2019 at 12:02 PM Robin Murphy  wrote:
>
> ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
> with the long-out-of-date comment can lead to the impression than an
> architecture may just enable it (since __add_pages() now "comprehends
> device memory" for itself) and expect things to work.
>
> In practice, however, ZONE_DEVICE users have little chance of
> functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
> that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
> dependency so the real situation is clearer.

Looks good to me.

Acked-by: Dan Williams 


Re: [PATCH 1/3] mm/memremap: Rename and consolidate SECTION_SIZE

2019-04-12 Thread Dan Williams
On Fri, Apr 12, 2019 at 11:57 AM Robin Murphy  wrote:
>
> Trying to activatee ZONE_DEVICE for arm64 reveals that memremap's

s/activatee/activate/

> internal helpers for sparsemem sections conflict with arm64's
> definitions for hugepages, which inherit the name of "sections" from
> earlier versions of the ARM architecture.
>
> Disambiguate memremap (and now HMM too) by propagating sparsemem's PA_
> prefix, to clarify that these values are in terms of addresses rather
> than PFNs (and because it's a heck of a lot easier than changing all the
> arch code). SECTION_MASK is unused, so it can just go.

Looks good to me. So good that it collides with a similar change in
the "sub-section" support series.

Acked-by: Dan Williams 


[PATCH 3/3] mm: introduce ARCH_HAS_PTE_DEVMAP

2019-04-12 Thread Robin Murphy
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression that an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.

In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.

Signed-off-by: Robin Murphy 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
 arch/x86/Kconfig | 2 +-
 arch/x86/include/asm/pgtable.h   | 4 ++--
 arch/x86/include/asm/pgtable_types.h | 1 -
 include/linux/mm.h   | 4 ++--
 include/linux/pfn_t.h| 4 ++--
 mm/Kconfig   | 5 ++---
 mm/gup.c | 2 +-
 9 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5e3d0853c31d..77e1993bba80 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,6 +135,7 @@ config PPC
select ARCH_HAS_MMIOWB  if PPC64
select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_PMEM_APIif PPC64
+   select ARCH_HAS_PTE_DEVMAP  if PPC_BOOK3S_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE && PPC64
@@ -142,7 +143,6 @@ config PPC
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE  if PPC64
select ARCH_HAS_UBSAN_SANITIZE_ALL
-   select ARCH_HAS_ZONE_DEVICE if PPC_BOOK3S_64
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 581f91be9dd4..02c22ac8f387 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -90,7 +90,6 @@
#define _PAGE_SOFT_DIRTY   _RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
 #define _PAGE_DEVMAP   _RPAGE_SW1 /* software: ZONE_DEVICE page */
-#define __HAVE_ARCH_PTE_DEVMAP
 
 /*
  * Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5ad92419be19..ffd50f27f395 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -60,6 +60,7 @@ config X86
select ARCH_HAS_KCOVif X86_64
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_PMEM_APIif X86_64
+   select ARCH_HAS_PTE_DEVMAP  if X86_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_REFCOUNT
select ARCH_HAS_UACCESS_FLUSHCACHE  if X86_64
@@ -69,7 +70,6 @@ config X86
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
select ARCH_HAS_UBSAN_SANITIZE_ALL
-   select ARCH_HAS_ZONE_DEVICE if X86_64
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2779ace16d23..89a1f6fd48bf 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -254,7 +254,7 @@ static inline int has_transparent_hugepage(void)
return boot_cpu_has(X86_FEATURE_PSE);
 }
 
-#ifdef __HAVE_ARCH_PTE_DEVMAP
+#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
 static inline int pmd_devmap(pmd_t pmd)
 {
return !!(pmd_val(pmd) & _PAGE_DEVMAP);
@@ -715,7 +715,7 @@ static inline int pte_present(pte_t a)
return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
 }
 
-#ifdef __HAVE_ARCH_PTE_DEVMAP
+#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
 static inline int pte_devmap(pte_t a)
 {
return (pte_flags(a) & _PAGE_DEVMAP) == _PAGE_DEVMAP;
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d6ff0bbdb394..b5e49e6bac63 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -103,7 +103,6 @@
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_NX   (_AT(pteval_t, 1) << _PAGE_BIT_NX)
 #define _PAGE_DEVMAP   (_AT(u64, 1) << _PAGE_BIT_DEVMAP)
-#define __HAVE_ARCH_PTE_DEVMAP
 #else
 #define _PAGE_NX   (_AT(pteval_t, 0))
 #define _PAGE_DEVMAP   (_AT(pteval_t, 0))
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d76dfb7ac617..fe05c94f23e9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -504,7 +504,7 @@ struct ino

[PATCH 2/3] mm: clean up is_device_*_page() definitions

2019-04-12 Thread Robin Murphy
Refactor is_device_{public,private}_page() with is_pci_p2pdma_page()
to make them all consistent in depending on their respective config
options even when CONFIG_DEV_PAGEMAP_OPS is enabled for other reasons.
This allows a little more compile-time optimisation as well as the
conceptual and cosmetic cleanup.

Suggested-by: Jerome Glisse 
Signed-off-by: Robin Murphy 
---
 include/linux/mm.h | 43 +--
 1 file changed, 13 insertions(+), 30 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 76769749b5a5..d76dfb7ac617 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -910,32 +910,6 @@ static inline bool put_devmap_managed_page(struct page *page)
}
return false;
 }
-
-static inline bool is_device_private_page(const struct page *page)
-{
-   return is_zone_device_page(page) &&
-   page->pgmap->type == MEMORY_DEVICE_PRIVATE;
-}
-
-static inline bool is_device_public_page(const struct page *page)
-{
-   return is_zone_device_page(page) &&
-   page->pgmap->type == MEMORY_DEVICE_PUBLIC;
-}
-
-#ifdef CONFIG_PCI_P2PDMA
-static inline bool is_pci_p2pdma_page(const struct page *page)
-{
-   return is_zone_device_page(page) &&
-   page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
-}
-#else /* CONFIG_PCI_P2PDMA */
-static inline bool is_pci_p2pdma_page(const struct page *page)
-{
-   return false;
-}
-#endif /* CONFIG_PCI_P2PDMA */
-
 #else /* CONFIG_DEV_PAGEMAP_OPS */
 static inline void dev_pagemap_get_ops(void)
 {
@@ -949,22 +923,31 @@ static inline bool put_devmap_managed_page(struct page *page)
 {
return false;
 }
+#endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline bool is_device_private_page(const struct page *page)
 {
-   return false;
+   return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+   IS_ENABLED(CONFIG_DEVICE_PRIVATE) &&
+   is_zone_device_page(page) &&
+   page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
 
 static inline bool is_device_public_page(const struct page *page)
 {
-   return false;
+   return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+   IS_ENABLED(CONFIG_DEVICE_PUBLIC) &&
+   is_zone_device_page(page) &&
+   page->pgmap->type == MEMORY_DEVICE_PUBLIC;
 }
 
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
-   return false;
+   return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+   IS_ENABLED(CONFIG_PCI_P2PDMA) &&
+   is_zone_device_page(page) &&
+   page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
 }
-#endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline void get_page(struct page *page)
 {
-- 
2.21.0.dirty



[PATCH 1/3] mm/memremap: Rename and consolidate SECTION_SIZE

2019-04-12 Thread Robin Murphy
Trying to activate ZONE_DEVICE for arm64 reveals that memremap's
internal helpers for sparsemem sections conflict with arm64's
definitions for hugepages, which inherit the name of "sections" from
earlier versions of the ARM architecture.

Disambiguate memremap (and now HMM too) by propagating sparsemem's PA_
prefix, to clarify that these values are in terms of addresses rather
than PFNs (and because it's a heck of a lot easier than changing all the
arch code). SECTION_MASK is unused, so it can just go.

[anshuman: Consolidated mm/hmm.c instance and updated the commit message]

Acked-by: Michal Hocko 
Reviewed-by: David Hildenbrand 
Signed-off-by: Robin Murphy 
Signed-off-by: Anshuman Khandual 
---
 include/linux/mmzone.h |  1 +
 kernel/memremap.c  | 10 --
 mm/hmm.c   |  2 --
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fba7741533be..ed7dd27ee94a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1081,6 +1081,7 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
  * PFN_SECTION_SHIFT   pfn to/from section number
  */
 #define PA_SECTION_SHIFT   (SECTION_SIZE_BITS)
+#define PA_SECTION_SIZE(1UL << PA_SECTION_SHIFT)
 #define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
 
 #define NR_MEM_SECTIONS(1UL << SECTIONS_SHIFT)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..dda1367b385d 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -14,8 +14,6 @@
 #include 
 
 static DEFINE_XARRAY(pgmap_array);
-#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
-#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
 vm_fault_t device_private_entry_fault(struct vm_area_struct *vma,
@@ -98,8 +96,8 @@ static void devm_memremap_pages_release(void *data)
put_page(pfn_to_page(pfn));
 
/* pages are dead and unused, undo the arch mapping */
-   align_start = res->start & ~(SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+   align_start = res->start & ~(PA_SECTION_SIZE - 1);
+   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
- align_start;
 
nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
@@ -154,8 +152,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
if (!pgmap->ref || !pgmap->kill)
return ERR_PTR(-EINVAL);
 
-   align_start = res->start & ~(SECTION_SIZE - 1);
-   align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+   align_start = res->start & ~(PA_SECTION_SIZE - 1);
+   align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
- align_start;
align_end = align_start + align_size - 1;
 
diff --git a/mm/hmm.c b/mm/hmm.c
index fe1cd87e49ac..ef9e4e6c9f92 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -33,8 +33,6 @@
 #include 
 #include 
 
-#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
-
 #if IS_ENABLED(CONFIG_HMM_MIRROR)
 static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
 
-- 
2.21.0.dirty



[PATCH 0/3] Device-memory-related cleanups

2019-04-12 Thread Robin Murphy
Hi,

As promised, these are my preparatory cleanup patches that have so far
fallen out of pmem DAX work for arm64. Patch #1 has already been out for
a ride in Anshuman's hot-remove series, so I've collected the acks
already given.

Since we have various things in flight at the moment touching arm64
pagetable code, I'm wary of conflicts and cross-tree dependencies for
our actual ARCH_HAS_PTE_DEVMAP implementation. Thus it would be nice if
these could be picked up for 5.2 via mm or nvdimm as appropriate, such
that we can then handle the devmap patch itself via arm64 next cycle.

Robin.


Robin Murphy (3):
  mm/memremap: Rename and consolidate SECTION_SIZE
  mm: clean up is_device_*_page() definitions
  mm: introduce ARCH_HAS_PTE_DEVMAP

 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
 arch/x86/Kconfig |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 +-
 arch/x86/include/asm/pgtable_types.h |  1 -
 include/linux/mm.h   | 47 +++-
 include/linux/mmzone.h   |  1 +
 include/linux/pfn_t.h|  4 +-
 kernel/memremap.c| 10 ++---
 mm/Kconfig   |  5 +--
 mm/gup.c |  2 +-
 mm/hmm.c |  2 -
 12 files changed, 29 insertions(+), 52 deletions(-)

-- 
2.21.0.dirty



Re: [PATCH] MAINTAINERS: Update remaining @linux.vnet.ibm.com addresses

2019-04-12 Thread Joe Perches
On Thu, 2019-04-11 at 06:12 -0700, Paul E. McKenney wrote:
> If my email address were
> to change again, I would instead go with the "(IBM)" approach and let
> the git log and MAINTAINERS file keep the contact information.  Not that
> we get to update the git log, of course.  ;-)

Adding entries to .mailmap works too.
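For reference, each .mailmap line maps an old address to a canonical name and address; the entries below are placeholders, not anyone's real addresses:

```
# Canonical Name <canonical@example.com> <old@example.com>
Jane Developer <jane@example.com> <jane@old-employer.example.com>
```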




Re: [PATCH v2 00/21] Convert hwmon documentation to ReST

2019-04-12 Thread Guenter Roeck

On 4/12/19 9:04 AM, Jonathan Corbet wrote:

On Thu, 11 Apr 2019 14:07:31 -0700
Guenter Roeck  wrote:


While nobody does such split, IMHO, the best would be to keep the
information outside Documentation/admin-guide. But hey! You're
the Doc maintainer. If you prefer to move, I'm perfectly fine
with that.
   


Same here, but please don't move the files which are kernel facing only.


Well, let's step back and think about this.  Who is the audience for
these documents?  That will tell us a lot about where they should really
be.



Most of them are for users, some of them are for driver developers. A few
are for both, though that is generally not the intention (and one may argue
that driver internal documentation should be moved into the respective
driver source).


What I would prefer to avoid is the status quo where *everything* is in
the top-level directory, and where documents are organized for the
convenience of their maintainers rather than of their readers.  But
sometimes I feel like I'm alone in that desire...:)


I am fine with separating user pointing from kernel API/driver developer
guides, and I agree that it would make a lot of sense. As I said, please
just make sure that kernel facing files don't end up in the wrong directory.

Thanks,
Guenter


Re: [PATCH v2 00/21] Convert hwmon documentation to ReST

2019-04-12 Thread Jonathan Corbet
On Thu, 11 Apr 2019 14:07:31 -0700
Guenter Roeck  wrote:

> > While nobody does such split, IMHO, the best would be to keep the
> > information outside Documentation/admin-guide. But hey! You're
> > the Doc maintainer. If you prefer to move, I'm perfectly fine
> > with that.
> >   
> 
> Same here, but please don't move the files which are kernel facing only.

Well, let's step back and think about this.  Who is the audience for
these documents?  That will tell us a lot about where they should really
be.  

What I would prefer to avoid is the status quo where *everything* is in
the top-level directory, and where documents are organized for the
convenience of their maintainers rather than of their readers.  But
sometimes I feel like I'm alone in that desire...:)

Thanks,

jon


[PATCH v9 2/2] powerpc/64s: KVM update for reimplement book3s idle code in C

2019-04-12 Thread Nicholas Piggin
This is the KVM update to the new idle code. A few improvements:

- Idle sleepers now always return to caller rather than branch out
  to KVM first.
- This allows optimisations like very fast return to caller when no
  state has been lost.
- KVM no longer requires nap_state_lost because it controls NVGPR
  save/restore itself on the way in and out.
- The heavy idle wakeup KVM request check can be moved out of the
  normal host idle code and into the not-performance-critical offline
  code.
- KVM nap code now returns from where it is called, which makes the
  flow a bit easier to follow.
---
 arch/powerpc/include/asm/paca.h |   1 -
 arch/powerpc/kernel/asm-offsets.c   |   1 -
 arch/powerpc/kernel/exceptions-64s.S|  14 ++-
 arch/powerpc/kernel/idle_book3s.S   |  22 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 119 ++--
 arch/powerpc/platforms/powernv/idle.c   |  15 +++
 arch/powerpc/xmon/xmon.c|   3 -
 7 files changed, 93 insertions(+), 82 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e55dedd7ee3e..245d11a71784 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -174,7 +174,6 @@ struct paca_struct {
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending;/* IRQ_WORK interrupt while 
soft-disable */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
u8 pmcregs_in_use;  /* pseries puts this in lppaca */
 #endif
u64 sprg_vdso;  /* Saved user-visible sprg */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 167a59fda12e..83ad99f9f05d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -268,7 +268,6 @@ int main(void)
OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime);
OFFSET(ACCOUNT_SYSTEM_TIME, paca_struct, accounting.stime);
OFFSET(PACA_TRAP_SAVE, paca_struct, trap_save);
-   OFFSET(PACA_NAPSTATELOST, paca_struct, nap_state_lost);
OFFSET(PACA_SPRG_VDSO, paca_struct, sprg_vdso);
 #else /* CONFIG_PPC64 */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index c4c50bca12c7..6247b5bbfa5c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -121,6 +121,8 @@ EXC_VIRT_NONE(0x4000, 0x100)
rlwinm. r10,r10,47-31,30,31 ;   \
beq-1f ;\
cmpwi   cr1,r10,2 ; \
+   mfspr   r3,SPRN_SRR1 ;  \
+   bltlr   cr1 ;   /* no state loss, return to idle caller */  \
BRANCH_TO_C000(r10, system_reset_idle_common) ; \
 1: \
KVMTEST_PR(n) ; \
@@ -144,12 +146,10 @@ TRAMP_KVM(PACA_EXNMI, 0x100)
 
 #ifdef CONFIG_PPC_P7_NAP
 EXC_COMMON_BEGIN(system_reset_idle_common)
-   mfspr   r3,SPRN_SRR1
-#ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   /* this bltlr could  be moved before the branch_to, and the
-* branch_to could maybe go straight to idle_return */
-   bltlr   cr1 /* no state loss, return to idle caller */
-#endif
+   /*
+* This must be a direct branch (without linker branch stub) because
+* we can not use TOC at this point as r2 may not be restored yet.
+*/
b   idle_return_gpr_loss
 #endif
 
@@ -441,9 +441,7 @@ EXC_COMMON_BEGIN(machine_check_idle_common)
mtlrr4
rlwinm  r10,r3,47-31,30,31
cmpwi   cr1,r10,2
-#ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE
bltlr   cr1 /* no state loss, return to idle caller */
-#endif
b   idle_return_gpr_loss
 #endif
/*
diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 0fb2eb731a29..2dfbd5d5b932 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -19,9 +19,6 @@
 #include 
 #include 
 #include 
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-#include 
-#endif
 
 /*
  * Desired PSSCR in r3
@@ -93,25 +90,6 @@ _GLOBAL(isa300_idle_stop_mayloss)
  * a simple blr instead).
  */
 _GLOBAL(idle_return_gpr_loss)
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   lbz r0,HSTATE_HWTHREAD_STATE(r13)
-   cmpwi   r0,KVM_HWTHREAD_IN_KERNEL
-   beq 0f
-   li  r0,KVM_HWTHREAD_IN_KERNEL
-   stb r0,HSTATE_HWTHREAD_STATE(r13)
-   /* Order setting hwthread_state vs. testing hwthread_req */
-   sync
-0: lbz r0,HSTATE_HWTHREAD_REQ(r13)
-   cmpwi   r0,0
-   beq 1f
-   b   kvm_start_guest
-1:
- 

[PATCH v9 1/2] powerpc/64s: reimplement book3s idle code in C

2019-04-12 Thread Nicholas Piggin
Reimplement the Book3S idle code in C, moving POWER7/8/9
implementation-specific HV idle code to the powernv platform code.

Book3S assembly stubs are kept in common code and used only to save
the stack frame and non-volatile GPRs before executing architected
idle instructions, then to restore the stack, reload GPRs, and return
to C after waking from idle.

The complex logic dealing with threads and subcores, locking, SPRs,
HMIs, timebase resync, etc., is all done in C which makes it more
maintainable.

This is not a strict translation to C code; there are some
significant differences:

- Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
  but saves and restores them itself.

- The optimisation where EC=ESL=0 idle modes did not have to save GPRs
  or change MSR is restored, because it's now simple to do. ESL=1
  sleeps that do not lose GPRs can use this optimization too.

- KVM secondary entry and cede is now more of a call/return style
  rather than branchy. nap_state_lost is not required because KVM
  always returns via NVGPR restoring path.

- KVM secondary wakeup from offline sequence is moved entirely into
  the offline wakeup, which avoids a hwsync in the normal idle wakeup
  path.

Performance measured with context switch ping-pong on different
threads or cores, is possibly improved a small amount, 1-3% depending
on stop state and core vs thread test for shallow states. Deep states
it's in the noise compared with other latencies.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 

Notes:
- The KVM code has been significantly changed and now actually boots a
  HPT on radix guest with dependent threads mode and >0 secondaries.
  With previous iterations my test wasn't actually catching this case
  and there were some obvious bugs.

  I've broken the KVM code into the second patch just for review. The
  first patch makes KVM kind-of work following its existing design.
  The main thing that's missing from it is deep idle states that lose
  SPRs on the secondaries don't restore them if it's a KVM request
  wakeup. But you can run guests with deep idle states disabled.
  Rather than a significant rework of the code to make that work with
  the new idle code that would need testing, which then gets undone,
  I have just broken it up like this for hopefully easier review of
  the KVM parts. Patches can be squashed together before upstream merge.

- There's so many combinations of KVM modes and options I could use more
  help with review and testing.

- This is not ported up to powerpc next yet.

- P9 restores some of the PMU SPRs, but not others, and P8 only zeroes
them. There are improvements to be made to SPR save/restore policies and
  documentation, but this first pass tries to keep things as they were.

Left to do:
- Test actual POWER7 hardware.

- More KVM testing and review.

- Port to powerpc next.

Since RFC v1:
- Now tested and working with POWER9 hash and radix.
- KVM support added. This took a bit of work to untangle and might
  still have some issues, but POWER9 seems to work including hash on
  radix with dependent threads mode.
- This snowballed a bit because of KVM and other details making it
  not feasible to leave POWER7/8 code alone. That's only half done
  at the moment.
- So far this trades about 800 lines of asm for 500 of C. With POWER7/8
  support done it might be another hundred or so lines of C.

Since RFC v2:
- Fixed deep state SLB reloading
- Now tested and working with POWER8.
- Accounted for most feedback.

Since RFC v3:
- Rebased to powerpc merge + idle state bugfix
- Split SLB flush/restore code out and shared with MCE code (pseries
  MCE patches can also use).
- More testing on POWER8 including KVM with secondaries.
- Performance testing looks good. EC=ESL=0 is about 5% faster, other
  stop states look a little faster too.
- Adjusted SPR saving to handle POWER7; haven't tested it.

Since v1:
- More review comments from Gautham.
- Rename isa3_ to isa300_ prefix.
- Tinkered with some comments, copyright notice, changelog.
- Cede and regular idle do not go via KVM secondary wakeup code path,
  so hwthread_state stores and barriers can be simplified, and some
  KVM code paths simplified a little.

Since v2:
- Rebase, SLB reload patch has been merged.
- More testing. Tested machine check idle wakeup path with mambo stepping
  through instructions.

Since v3:
- Build fixes caught by CI

Since v4:
- PSSCR test PLS rather than RL (Akshay)

Since v5:
- Fix TB loss test to use PLS instead of RL as well
- Rename hv_loss variable to spr_loss to better describe its usage
- Clamp the SPR loss level to shallower of SPR loss or TB loss in case
  future CPU has that behaviour (P8 type behaviour).
- Added a few more comments.

Since v6:
- Comment improvements
- Remove the restore_cpu() simplification. Now that restore_cpu is not
  called from idle, it can be simplified, however it's not required so
  leave that to a future patch, to avoid risking change to boo

Re: [PATCH stable v4.9 00/35] powerpc spectre backports for 4.9

2019-04-12 Thread Sasha Levin

On Fri, Apr 12, 2019 at 12:28:01PM +1000, Michael Ellerman wrote:

Sasha Levin  writes:

On Thu, Apr 11, 2019 at 09:45:55PM +1000, Michael Ellerman wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Greg,

Please queue up these powerpc patches for 4.9 if you have no objections.

There's one build fix for newer toolchains, and the rest are spectre related.


I've queued it up, thank you.


Thanks. I'll fix my script to generate "Hi Sasha" for v4.9 mails :)


Hah :)

Sasha "Greg" Levin.


Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode

2019-04-12 Thread Robin Murphy

On 12/04/2019 11:26, John Garry wrote:

On 09/04/2019 13:53, Zhen Lei wrote:

Currently the IOMMU DMA layer supports 3 modes: passthrough, lazy, and
strict. Passthrough mode bypasses the IOMMU, lazy mode defers the
invalidation of hardware TLBs, and strict mode invalidates IOMMU
hardware TLBs synchronously. The three modes are mutually exclusive,
but the current boot options are confusing: iommu.passthrough and
iommu.strict cannot sensibly coexist. So add iommu.dma_mode.

Signed-off-by: Zhen Lei 
---
 Documentation/admin-guide/kernel-parameters.txt | 19 
 drivers/iommu/iommu.c   | 59 
-

 include/linux/iommu.h   |  5 +++
 3 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt

index 2b8ee90bb64470d..f7766f8ac8b9084 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1811,6 +1811,25 @@
 1 - Bypass the IOMMU for DMA.
 unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.

+    iommu.dma_mode= Configure the default DMA mode. If unset, use the value
+    of CONFIG_IOMMU_DEFAULT_PASSTHROUGH to determine
+    passthrough or not.


To me, for unset it's unclear what we default to. So if unset and also 
CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set, do we get lazy or strict 
mode? (note: I'm ignoring backwards compatibility and the interaction of 
iommu.strict and iommu.passthrough too; more below).


Could we considering introducing config DEFAULT_IOMMU_DMA_MODE, similar 
to DEFAULT_IOSCHED?


Yes, what I was suggesting was specifically refactoring the Kconfig 
options into a single choice that controls the default (i.e. no command 
line option provided) behaviour. AFAICS it should be fairly 
straightforward to maintain the existing "strict" and "passthrough" 
options (and legacy arch-specific versions thereof) to override that 
default without introducing yet another command-line option, which I 
think we should avoid if possible.
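As a rough illustration of the single-choice refactor suggested here (a sketch only; the symbol names are made up, not from any tree):

```kconfig
choice
	prompt "IOMMU default DMA mode"
	default IOMMU_DEFAULT_DMA_STRICT
	help
	  Default behaviour when no iommu.* command-line option is given;
	  existing options like iommu.strict would still override it.

config IOMMU_DEFAULT_DMA_STRICT
	bool "strict"

config IOMMU_DEFAULT_DMA_LAZY
	bool "lazy"

config IOMMU_DEFAULT_PASSTHROUGH
	bool "passthrough"

endchoice
```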

+    Note: For historical reasons, ARM64/S390/PPC/X86 have
+    their own specific options. Currently, only ARM64 supports
+    this boot option; other architectures are expected to adopt
+    it as the generic boot option.
+    passthrough
+    Configure DMA to bypass the IOMMU by default.
+    lazy
+    Request that DMA unmap operations use deferred
+    invalidation of hardware TLBs, for increased
+    throughput at the cost of reduced device isolation.
+    Will fall back to strict mode if not supported by
+    the relevant IOMMU driver.
+    strict
+    DMA unmap operations invalidate IOMMU hardware TLBs
+    synchronously.
+
 io7=    [HW] IO7 for Marvel based alpha systems
 See comment before marvel_specify_io7 in
 arch/alpha/kernel/core_marvel.c.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 109de67d5d727c2..df1ce8e22385b48 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -38,12 +38,13 @@

 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
+
 #ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
+#define IOMMU_DEFAULT_DMA_MODE    IOMMU_DMA_MODE_PASSTHROUGH
 #else
-static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
+#define IOMMU_DEFAULT_DMA_MODE    IOMMU_DMA_MODE_STRICT
 #endif
-static bool iommu_dma_strict __read_mostly = true;
+static int iommu_default_dma_mode __read_mostly = 
IOMMU_DEFAULT_DMA_MODE;


 struct iommu_callback_data {
 const struct iommu_ops *ops;
@@ -147,20 +148,51 @@ static int __init iommu_set_def_domain_type(char 
*str)

 int ret;

 ret = kstrtobool(str, &pt);
-    if (ret)
-    return ret;
+    if (!ret && pt)
+    iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;

-    iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY : 
IOMMU_DOMAIN_DMA;

-    return 0;
+    return ret;
 }
 early_param("iommu.passthrough", iommu_set_def_domain_type);

 static int __init iommu_dma_setup(char *str)
 {
-    return kstrtobool(str, &iommu_dma_strict);
+    bool strict;
+    int ret;
+
+    ret = kstrtobool(str, &strict);
+    if (!ret)
+    iommu_default_dma_mode = strict ?
+    IOMMU_DMA_MODE_STRICT : IOMMU_DMA_MODE_LAZY;
+
+    return ret;
 }
 early_param("iommu.strict", iommu_dma_setup);

+static int __init iommu_dma_mode_setup(char *str)
+{
+    if (!str)
+    goto fail;
+
+    if (!strncmp(str, "passthrough", 11))
+    iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;
+    else if (!strncmp(str, "lazy", 4))
+    iommu_default_dma_mode = IOMMU_DMA_MODE_LAZY;
+    else if (!strncmp(str, "strict", 6))
+    iommu_default_dma_mode = IOMMU_DMA_MODE_STRICT;
+    else
+   

Re: [PATCH v8 1/2] powerpc/64s: reimplement book3s idle code in C

2019-04-12 Thread Nicholas Piggin
Satheesh Rajendran's on April 8, 2019 5:32 pm:
> Hi,
> 
> Hit with below kernel crash during Power8 Host boot with this patch series on 
> top
> of powerpc merge branch commit 
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge&id=6a821ffee18a6e6c0027c523fa8c958df98ca361
> 
> built with ppc64le_defconfig
> 
> Host Console log:
> [0.454666] EEH: PCI Enhanced I/O Error Handling Enabled
> [0.456524] create_dump_obj: New platform dump. ID = 0x4 Size 7457968
> [0.457627] opal-power: OPAL EPOW, DPO support detected.
> [0.457722] BUG: Unable to handle kernel data access at 0xff76184a
> [0.457733] Faulting instruction address: 0xc001a94c
> [0.457740] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.457745] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [0.457750] Modules linked in:
> [0.457756] CPU: 58 PID: 0 Comm: swapper/58 Not tainted 
> 5.1.0-rc2-gd0ae6c548 #1
> [0.457762] NIP:  c001a94c LR: c00a6e9c CTR: 
> c0008000
> [0.457768] REGS: c00f272b7b50 TRAP: 0380   Not tainted  
> (5.1.0-rc2-gd0ae6c548)
> [0.457773] MSR:  90001033   CR: 24004222  
> XER: 
> [0.457781] CFAR: c00a6e98 IRQMASK: 1 
> [0.457781] GPR00: c00a6e9c c00f272b7de0 0004 
> 0006 
> [0.457781] GPR04: c00a5dd4 24004222 c00f272b7d48 
> 0001 
> [0.457781] GPR08: 0002 ff761844 c00f27250c00 
> c3feb1676be1 
> [0.457781] GPR12: 4400 c009d380 c00ffe60ff90 
>  
> [0.457781] GPR16:   c004b4d0 
> c004b4a0 
> [0.457781] GPR20: c1526214 0800 0001 
> c1521b78 
> [0.457781] GPR24: 003a  0008 
>  
> [0.457781] GPR28: c1526140 0001 0400 
> c1525ce0 
> [0.457829] NIP [c001a94c] irq_set_pending_from_srr1+0x1c/0x50
> [0.457835] LR [c00a6e9c] power7_idle+0x3c/0x50
> [0.457839] Call Trace:
> [0.457843] [c00f272b7de0] [c00a6e98] power7_idle+0x38/0x50 
> (unreliable)
> [0.457849] [c00f272b7e00] [c00210f4] arch_cpu_idle+0x54/0x160
> [0.457856] [c00f272b7e30] [c0c47bc4] 
> default_idle_call+0x74/0x88
> [0.457862] [c00f272b7e50] [c0158f54] do_idle+0x2f4/0x3d0
> [0.457868] [c00f272b7ec0] [c0159288] 
> cpu_startup_entry+0x38/0x40
> [0.457874] [c00f272b7ef0] [c004dae4] 
> start_secondary+0x654/0x680
> [0.457881] [c00f272b7f90] [c000b25c] 
> start_secondary_prolog+0x10/0x14
> [0.457886] Instruction dump:
> [0.457890] 992d098b 7c630034 5463d97e 4e800020 6000 3c4c014d 38424dd0 
> 7c0802a6 
> [0.457898] 6000 3d22ff76 78637722 39291840 
> [0.457900] BUG: Unable to handle kernel data access at 0xff76184a
> [0.457901] <7d4918ae> 2b8a00ff 419e001c 892d098b 
> [0.457907] Faulting instruction address: 0xc001a94c
> [0.457910] BUG: Unable to handle kernel data access at 0xff76184a
> [0.457915] ---[ end trace fa7343cfd21c8798 ]---
> [0.457919] Faulting instruction address: 0xc001a94c
> [0.458961] BUG: Unable to handle kernel data access at 0xff76184a
> [0.458963] BUG: Unable to handle kernel data access at 0xff76184a
> [0.458964] BUG: Unable to handle kernel data access at 0xff76184a
> [0.458966] BUG: Unable to handle kernel data access at 0xff76184a
> [0.458968] BUG: Unable to handle kernel data access at 0xff76184a
> [0.458970] BUG: Unable to handle kernel data access at 0xff76184a
> [0.458972] Faulting instruction address: 0xc001a94c
> [0.458973] Faulting instruction address: 0xc001a94c
> [0.458974] Faulting instruction address: 0xc001a94c
> [0.458975] Faulting instruction address: 0xc001a94c
> [0.458976] Faulting instruction address: 0xc001a94c
> [0.458978] initcall 
> __machine_initcall_powernv_pnv_init_idle_states+0x0/0xb30 returned 0 after 0 
> usecs
> [0.458981] calling  __machine_initcall_powernv_opal_time_init+0x0/0x150 @ 
> 1
> [0.458982] Faulting instruction address: 0xc001a94c
> [0.459022] BUG: Unable to handle kernel data access at 0xff76184a
> [0.459040] Faulting instruction address: 0xc001a94c
> [0.459043] initcall __machine_initcall_powernv_opal_time_init+0x0/0x150 
> returned 0 after 0 usecs
> [0.459044] BUG: Unable to handle kernel data access at 0xff76184c
> [0.459045] Faulting instruction address: 0xc001a94c
> [0.459060] calling  __machine_initcall_powernv_rng_init+0x0/0x334 @ 1
> [0.459084] powernv-rng: Registering arch random hook.
> [ 

Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode

2019-04-12 Thread John Garry

On 09/04/2019 13:53, Zhen Lei wrote:

Currently the IOMMU DMA layer supports 3 modes: passthrough, lazy, and
strict. Passthrough mode bypasses the IOMMU, lazy mode defers the
invalidation of hardware TLBs, and strict mode invalidates IOMMU
hardware TLBs synchronously. The three modes are mutually exclusive,
but the current boot options are confusing: iommu.passthrough and
iommu.strict cannot sensibly coexist. So add iommu.dma_mode.

Signed-off-by: Zhen Lei 
---
 Documentation/admin-guide/kernel-parameters.txt | 19 
 drivers/iommu/iommu.c   | 59 -
 include/linux/iommu.h   |  5 +++
 3 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90bb64470d..f7766f8ac8b9084 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1811,6 +1811,25 @@
1 - Bypass the IOMMU for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.

+   iommu.dma_mode= Configure the default DMA mode. If unset, use the value
+   of CONFIG_IOMMU_DEFAULT_PASSTHROUGH to determine
+   passthrough or not.


To me, for unset it's unclear what we default to. So if unset and also 
CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set, do we get lazy or strict 
mode? (note: I'm ignoring backwards compatibility and the interaction of 
iommu.strict and iommu.passthrough too; more below).


Could we considering introducing config DEFAULT_IOMMU_DMA_MODE, similar 
to DEFAULT_IOSCHED?



+   Note: For historical reasons, ARM64/S390/PPC/X86 have
+   their own specific options. Currently, only ARM64 supports
+   this boot option; other architectures are expected to adopt
+   it as the generic boot option.
+   passthrough
+   Configure DMA to bypass the IOMMU by default.
+   lazy
+   Request that DMA unmap operations use deferred
+   invalidation of hardware TLBs, for increased
+   throughput at the cost of reduced device isolation.
+   Will fall back to strict mode if not supported by
+   the relevant IOMMU driver.
+   strict
+   DMA unmap operations invalidate IOMMU hardware TLBs
+   synchronously.
+
io7=[HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 109de67d5d727c2..df1ce8e22385b48 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -38,12 +38,13 @@

 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
+
 #ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
+#define IOMMU_DEFAULT_DMA_MODE IOMMU_DMA_MODE_PASSTHROUGH
 #else
-static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
+#define IOMMU_DEFAULT_DMA_MODE IOMMU_DMA_MODE_STRICT
 #endif
-static bool iommu_dma_strict __read_mostly = true;
+static int iommu_default_dma_mode __read_mostly = IOMMU_DEFAULT_DMA_MODE;

 struct iommu_callback_data {
const struct iommu_ops *ops;
@@ -147,20 +148,51 @@ static int __init iommu_set_def_domain_type(char *str)
int ret;

ret = kstrtobool(str, &pt);
-   if (ret)
-   return ret;
+   if (!ret && pt)
+   iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;

-   iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY : IOMMU_DOMAIN_DMA;
-   return 0;
+   return ret;
 }
 early_param("iommu.passthrough", iommu_set_def_domain_type);

 static int __init iommu_dma_setup(char *str)
 {
-   return kstrtobool(str, &iommu_dma_strict);
+   bool strict;
+   int ret;
+
+   ret = kstrtobool(str, &strict);
+   if (!ret)
+   iommu_default_dma_mode = strict ?
+   IOMMU_DMA_MODE_STRICT : IOMMU_DMA_MODE_LAZY;
+
+   return ret;
 }
 early_param("iommu.strict", iommu_dma_setup);

+static int __init iommu_dma_mode_setup(char *str)
+{
+   if (!str)
+   goto fail;
+
+   if (!strncmp(str, "passthrough", 11))
+   iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;
+   else if (!strncmp(str, "lazy", 4))
+   iommu_default_dma_mode = IOMMU_DMA_MODE_LAZY;
+   else if (!strncmp(str, "strict", 6))
+   iommu_default_dma_mode = IOMMU_DMA_MODE_STRICT;
+   else
+   goto fail;
+
+   pr_info("Force dma mode to be %d\n", iommu_default_dma_mode);


What happens if the cmdline option iommu.dma_mo

Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode

2019-04-12 Thread Joerg Roedel
On Tue, Apr 09, 2019 at 08:53:03PM +0800, Zhen Lei wrote:
> +static int __init iommu_dma_mode_setup(char *str)
> +{
> + if (!str)
> + goto fail;
> +
> + if (!strncmp(str, "passthrough", 11))
> + iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;
> + else if (!strncmp(str, "lazy", 4))
> + iommu_default_dma_mode = IOMMU_DMA_MODE_LAZY;
> + else if (!strncmp(str, "strict", 6))
> + iommu_default_dma_mode = IOMMU_DMA_MODE_STRICT;
> + else
> + goto fail;
> +
> + pr_info("Force dma mode to be %d\n", iommu_default_dma_mode);

Printing a number is not very desriptive or helpful to the user. Please
print the name of the mode instead.


Regards,

Joerg


[PATCH] powerpc/booke64: set RI in default MSR

2019-04-12 Thread laurentiu . tudor
From: Laurentiu Tudor 

Set RI in the default kernel MSR so that the architected way of
detecting unrecoverable machine check interrupts has a chance to work.
This is in line with the MSR setup of the other BookE powerpc
configurations here.

Signed-off-by: Laurentiu Tudor 
Cc: sta...@vger.kernel.org
---
 arch/powerpc/include/asm/reg_booke.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index eb2a33d5df26..e382bd6ede84 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -41,7 +41,7 @@
 #if defined(CONFIG_PPC_BOOK3E_64)
 #define MSR_64BIT  MSR_CM
 
-#define MSR_   (MSR_ME | MSR_CE)
+#define MSR_   (MSR_ME | MSR_RI | MSR_CE)
 #define MSR_KERNEL (MSR_ | MSR_64BIT)
 #define MSR_USER32 (MSR_ | MSR_PR | MSR_EE)
 #define MSR_USER64 (MSR_USER32 | MSR_64BIT)
-- 
2.17.1



Re: [PATCH] Linux: Define struct termios2 in under _GNU_SOURCE [BZ #10339]

2019-04-12 Thread Florian Weimer
* Adhemerval Zanella:

> On 11/04/2019 08:07, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>> This allows us to adjust the baud rates to non-standard values using termios
>>> interfaces without resorting to adding new headers and using a different API
>>> (ioctl).
>> 
>> How much symbol versioning will be required for this change?
>
> I think all interfaces that have termios as input for sparc and mips 
> (tcgetattr, tcsetattr, cfmakeraw, cfgetispeed, cfgetospeed, cfsetispeed,
> cfsetospeed, cfsetspeed).
>
> Alpha will also need to use termios1 for pre-4.20 kernels.

So only new symbol versions there?  Hmm.

>>> As Peter Anvin has indicated, he create a POC [1] with the aforementioned
>>> new interfaces.  It has not been rebased against master, more specially 
>>> against
>>> my termios refactor to simplify the multiple architecture header 
>>> definitions,
>>> but I intend to use as a base.
>> 
>> Reference [1] is still missing. 8-(
>
> Oops... it is 
> https://git.zytor.com/users/hpa/glibc/termbaud.git/log/?h=wip.termbaud

This doesn't really illuminate things.  “Drop explicit baud setting
interfaces in favor of cfenc|decspeed()” removes the new symbol version
for the cf* functions.

My gut feeling is that it's safer to add new interfaces, based on the
actual kernel/userspace interface, rather than trying to fix up existing
interfaces with symbol versioning.  The main reason is that code
involving serial interfaces is difficult to test, so it will take years
until we find the last application broken by the glibc interface bump.

I don't feel strongly about this.  This came out of a request for
enabling TCGETS2 support downstream.  If I can't fix this upstream, I
will just reject that request.

Thanks,
Florian


Re: [PATCH kernel RFC 0/2] powerpc/ioda2: An attempt to allow DMA masks between 32 and 59

2019-04-12 Thread Russell Currey
I'm gonna try and benchmark this on a few different devices for performance, 
with 64k TCEs (as is), with larger TCE sizes, and against sketchy bypass.  
Hopefully if performance isn't too far off, we can get rid of sketchy bypass 
entirely and have a more robust solution.

-- 
  Russell Currey
  rus...@russell.cc

On Fri, Apr 12, 2019, at 4:44 PM, Alexey Kardashevskiy wrote:
> 
> This is an attempt to allow DMA mask 40 or similar which are not large
> enough to use either a PHB3 bypass mode or a sketchy bypass.
> 
> This is based on sha1
> 582549e3fbe1 Linus Torvalds Merge tag 'for-linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma 
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (2):
>   powerpc/powernv/ioda: Allocate TCE table levels on demand for default
> DMA window
>   powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU
> pages
> 
>  arch/powerpc/include/asm/iommu.h  |  8 ++-
>  arch/powerpc/platforms/powernv/pci.h  |  2 +-
>  arch/powerpc/kernel/iommu.c   | 58 +--
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 19 +++---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 -
>  5 files changed, 66 insertions(+), 35 deletions(-)
> 
> -- 
> 2.17.1
> 
>