[PATCH] powerpc: vdso: Make vdso32 installation conditional in vdso_install

2019-03-21 Thread Ben Hutchings
The 32-bit vDSO is not needed and not normally built for 64-bit
little-endian configurations.  However, the vdso_install target still
builds and installs it.  Add the same config condition as is normally
used for the build.

Fixes: e0d005916994 ("powerpc/vdso: Disable building the 32-bit VDSO ...")
Signed-off-by: Ben Hutchings 
---
 arch/powerpc/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 488c9edffa58..3def265cf1cf 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -406,7 +406,9 @@ endef
 ifdef CONFIG_PPC64
$(Q)$(MAKE) $(build)=arch/$(ARCH)/kernel/vdso64 $@
 endif
+ifdef CONFIG_VDSO32
$(Q)$(MAKE) $(build)=arch/$(ARCH)/kernel/vdso32 $@
+endif
 
 archclean:
$(Q)$(MAKE) $(clean)=$(boot)




[PATCH linux-next v8 5/7] powerpc: define syscall_get_error()

2019-03-21 Thread Dmitry V. Levin
syscall_get_error() is required to be implemented on this
architecture, in addition to the already implemented syscall_get_nr(),
syscall_get_arguments(), syscall_get_return_value(), and
syscall_get_arch() functions, in order to extend the generic
ptrace API with the PTRACE_GET_SYSCALL_INFO request.

Cc: Michael Ellerman 
Cc: Elvira Khabirova 
Cc: Eugene Syromyatnikov 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Oleg Nesterov 
Cc: Andy Lutomirski 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry V. Levin 
---

Notes:
v8: unchanged
v7: unchanged
v6: unchanged
v5: initial revision

This change has been tested with
tools/testing/selftests/ptrace/get_syscall_info.c and strace,
so it's correct from the PTRACE_GET_SYSCALL_INFO point of view.

This casts doubt on commit v4.3-rc1~86^2~81, which changed
syscall_set_return_value() in a way that doesn't quite match
syscall_get_error(), but syscall_set_return_value() is out
of scope of this series, so I'll just let you know my concerns.

 arch/powerpc/include/asm/syscall.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/syscall.h 
b/arch/powerpc/include/asm/syscall.h
index efb50429c9f4..7375808c566c 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -38,6 +38,16 @@ static inline void syscall_rollback(struct task_struct *task,
regs->gpr[3] = regs->orig_gpr3;
 }
 
+static inline long syscall_get_error(struct task_struct *task,
+struct pt_regs *regs)
+{
+	/*
+	 * If the system call failed, the CR0.SO bit is set in regs->ccr
+	 * and regs->gpr[3] contains a positive ERRORCODE.
+	 */
+	return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
+}
+
 static inline long syscall_get_return_value(struct task_struct *task,
struct pt_regs *regs)
 {
-- 
ldv


[PATCH linux-next v8 0/7] ptrace: add PTRACE_GET_SYSCALL_INFO request

2019-03-21 Thread Dmitry V. Levin
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets the ptracer obtain
details of the syscall the tracee is blocked in.

There are two reasons for a special syscall-related ptrace request.

Firstly, with the current ptrace API there are cases when the ptracer cannot
retrieve the necessary information about syscalls.  Some examples include:
* The notorious int-0x80-from-64-bit-task issue.  See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its tracer
has no reliable means to find out that the syscall was, in fact,
a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the tracer.
Common practice is to keep track of the sequence of ptrace-stops in order
not to mix the two syscall-stops up.  But it is not as simple as it looks;
for example, strace had a (just recently fixed) long-standing bug where
attaching strace to a tracee that is performing the execve system call
led to the tracer identifying the following syscall-exit-stop as
syscall-enter-stop, which messed up all the state tracking.
* Since the introduction of commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3
("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA
and process_vm_readv became unavailable when the process dumpable flag
is cleared.  On architectures such as ia64 this results in all syscall
arguments being unavailable for the tracer.

Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee.  For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.

PTRACE_GET_SYSCALL_INFO returns the following structure:

struct ptrace_syscall_info {
__u8 op;/* PTRACE_SYSCALL_INFO_* */
__u32 arch __attribute__((__aligned__(sizeof(__u32))));
__u64 instruction_pointer;
__u64 stack_pointer;
union {
struct {
__u64 nr;
__u64 args[6];
} entry;
struct {
__s64 rval;
__u8 is_error;
} exit;
struct {
__u64 nr;
__u64 args[6];
__u32 ret_data;
} seccomp;
};
};

The structure was chosen according to [2], except for the following
changes:
* seccomp substructure was added as a superset of entry substructure;
* the type of nr field was changed from int to __u64 because syscall
numbers are, as a practical matter, 64 bits;
* stack_pointer field was added along with instruction_pointer field
since it is readily available and can save the tracer from extra
PTRACE_GETREGS/PTRACE_GETREGSET calls;
* arch is always initialized to aid with tracing system calls
such as execve();
* instruction_pointer and stack_pointer are always initialized
so they could be easily obtained for non-syscall stops;
* a boolean is_error field was added along with rval field, this way
the tracer can more reliably distinguish a return value
from an error value.

strace has been ported to PTRACE_GET_SYSCALL_INFO.
Starting with release 4.26, strace uses the PTRACE_GET_SYSCALL_INFO API
as the preferred mechanism of obtaining syscall information.
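
For illustration only (not part of the series), a tracer consuming this
structure at a ptrace-stop looks roughly like the sketch below. It assumes
a kernel and <linux/ptrace.h> carrying this series, so that the structure
above and the PTRACE_GET_SYSCALL_INFO / PTRACE_SYSCALL_INFO_* constants
are available:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/ptrace.h>
	#include <linux/ptrace.h>

	static void show_syscall_stop(pid_t pid)
	{
		struct ptrace_syscall_info info;
		/* addr is the buffer size; the request returns the number
		 * of bytes the kernel was prepared to write */
		long rc = ptrace(PTRACE_GET_SYSCALL_INFO, pid,
				 (void *)sizeof(info), &info);

		if (rc < 0)
			return;
		switch (info.op) {
		case PTRACE_SYSCALL_INFO_ENTRY:
			printf("entry: nr=%llu\n",
			       (unsigned long long)info.entry.nr);
			break;
		case PTRACE_SYSCALL_INFO_EXIT:
			printf("exit: rval=%lld%s\n",
			       (long long)info.exit.rval,
			       info.exit.is_error ? " (error)" : "");
			break;
		}
	}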

[1] 
https://lore.kernel.org/lkml/ca+55afzcsvmddj9lh_gdbz1ozhyem6zrgpbdajnywm2lf_e...@mail.gmail.com/
[2] 
https://lore.kernel.org/lkml/caobl_7gm0n80n7j_dfw_eqyflyzq+sf4y2avsccv88tb3aw...@mail.gmail.com/

---

Notes:
v8:
* Moved syscall_get_arch() specific patches to a separate patchset
  which is now merged into audit/next tree.
* Rebased to linux-next.
* Moved ptrace_get_syscall_info code under #ifdef 
CONFIG_HAVE_ARCH_TRACEHOOK,
  narrowing down the set of architectures supported by this implementation
  back to those 19 that enable CONFIG_HAVE_ARCH_TRACEHOOK because
  I failed to get all syscall_get_*(), instruction_pointer(),
  and user_stack_pointer() functions implemented on some niche
  architectures.  This leaves the following architectures out:
  alpha, h8300, m68k, microblaze, and unicore32.

v7:
* Rebased to v5.0-rc1.
* 5 arch-specific preparatory patches out of 25 have been merged
  into v5.0-rc1 via arch trees.

v6:
* Add syscall_get_arguments and syscall_set_arguments wrappers
  to asm-generic/syscall.h, requested by Geert.
* Change PTRACE_GET_SYSCALL_INFO return code: do not take trailing paddings
  into account, use the end of the last field of the structure being 
written.
* Change struct ptrace_syscall_info:
  * remove .frame_pointer field, it is not needed and not portable;
  * make .arch field explicitly aligned, remove no longer needed
padding before .arch field;
  * remove trailing pads, they are no longer needed.

v5:
* Merge separate series and patches into the single series.
* Change 

Re: [PATCH kernel RFC 2/2] vfio-pci-nvlink2: Implement interconnect isolation

2019-03-21 Thread David Gibson
On Thu, Mar 21, 2019 at 12:19:34PM -0600, Alex Williamson wrote:
> On Thu, 21 Mar 2019 10:56:00 +1100
> David Gibson  wrote:
> 
> > On Wed, Mar 20, 2019 at 01:09:08PM -0600, Alex Williamson wrote:
> > > On Wed, 20 Mar 2019 15:38:24 +1100
> > > David Gibson  wrote:
> > >   
> > > > On Tue, Mar 19, 2019 at 10:36:19AM -0600, Alex Williamson wrote:  
> > > > > On Fri, 15 Mar 2019 19:18:35 +1100
> > > > > Alexey Kardashevskiy  wrote:
> > > > > 
> > > > > > The NVIDIA V100 SXM2 GPUs are connected to the CPU via PCIe links 
> > > > > > and
> > > > > > (on POWER9) NVLinks. In addition to that, GPUs themselves have 
> > > > > > direct
> > > > > > peer to peer NVLinks in groups of 2 to 4 GPUs. At the moment the 
> > > > > > POWERNV
> > > > > > platform puts all interconnected GPUs to the same IOMMU group.
> > > > > > 
> > > > > > However the user may want to pass individual GPUs to the userspace 
> > > > > > so
> > > > > > in order to do so we need to put them into separate IOMMU groups and
> > > > > > cut off the interconnects.
> > > > > > 
> > > > > > Thankfully V100 GPUs implement an interface to do so by
> > > > > > programming a link disabling mask to BAR0 of a GPU. Once a link
> > > > > > is disabled in a GPU using this interface, it cannot be
> > > > > > re-enabled until the secondary bus reset is issued to the GPU.
> > > > > > 
> > > > > > This defines a reset_done() handler for V100 NVlink2 device which
> > > > > > determines what links need to be disabled. This relies on presence
> > > > > > of the new "ibm,nvlink-peers" device tree property of a GPU telling 
> > > > > > which
> > > > > > PCI peers it is connected to (which includes NVLink bridges or peer 
> > > > > > GPUs).
> > > > > > 
> > > > > > This does not change the existing behaviour and instead adds
> > > > > > a new "isolate_nvlink" kernel parameter to allow such isolation.
> > > > > > 
> > > > > > The alternative approaches would be:
> > > > > > 
> > > > > > 1. do this in the system firmware (skiboot) but for that we would 
> > > > > > need
> > > > > > to tell skiboot via an additional OPAL call whether or not we want 
> > > > > > this
> > > > > > isolation - skiboot is unaware of IOMMU groups.
> > > > > > 
> > > > > > 2. do this in the secondary bus reset handler in the POWERNV 
> > > > > > platform -
> > > > > > the problem with that is at that point the device is not enabled, 
> > > > > > i.e.
> > > > > > config space is not restored so we need to enable the device (i.e. 
> > > > > > MMIO
> > > > > > bit in CMD register + program valid address to BAR0) in order to 
> > > > > > disable
> > > > > > links and then perhaps undo all this initialization to bring the 
> > > > > > device
> > > > > > back to the state where pci_try_reset_function() expects it to be.  
> > > > > >   
> > > > > 
> > > > > The trouble seems to be that this approach only maintains the 
> > > > > isolation
> > > > > exposed by the IOMMU group when vfio-pci is the active driver for the
> > > > > device.  IOMMU groups can be used by any driver and the IOMMU core is
> > > > > incorporating groups in various ways.
> > > > 
> > > > I don't think that reasoning is quite right.  An IOMMU group doesn't
> > > > necessarily represent devices which *are* isolated, just devices which
> > > > *can be* isolated.  There are plenty of instances when we don't need
> > > > to isolate devices in different IOMMU groups: passing both groups to
> > > > the same guest or userspace VFIO driver for example, or indeed when
> > > > both groups are owned by regular host kernel drivers.
> > > > 
> > > > In at least some of those cases we also don't want to isolate the
> > > > devices when we don't have to, usually for performance reasons.  
> > > 
> > > I see IOMMU groups as representing the current isolation of the device,
> > > not just the possible isolation.  If there are ways to break down that
> > > isolation then ideally the group would be updated to reflect it.  The
> > > ACS disable patches seem to support this, at boot time we can choose to
> > > disable ACS at certain points in the topology to favor peer-to-peer
> > > performance over isolation.  This is then reflected in the group
> > > composition, because even though ACS *can be* enabled at the given
> > > isolation points, it's intentionally not with this option.  Whether or
> > > not a given user who owns multiple devices needs that isolation is
> > > really beside the point, the user can choose to connect groups via IOMMU
> > > mappings or reconfigure the system to disable ACS and potentially more
> > > direct routing.  The IOMMU groups are still accurately reflecting the
> > > topology and IOMMU based isolation.  
> > 
> > Huh, ok, I think we need to straighten this out.  Thinking of iommu
> > groups as possible rather than potential isolation was a conscious
> 
> possible ~= potential

Sorry, I meant "current" not "potential".

> > decision on my part when we were first coming up with 

[PATCH 3/5] powerpc/powernv: fix possible object reference leak

2019-03-21 Thread Wen Yang
The call to of_find_node_by_path returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./arch/powerpc/platforms/powernv/opal.c:741:2-8: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 733, but without a 
corresponding object release within this function.
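
For reference, the rule being enforced here can be shown as a minimal
sketch (the node path and the do_setup() helper below are made up for
illustration):

	#include <linux/errno.h>
	#include <linux/of.h>

	static int example_init(void)
	{
		/* of_find_node_by_path() returns the node with its
		 * refcount incremented ... */
		struct device_node *np = of_find_node_by_path("/example");

		if (!np)
			return -ENODEV;

		if (do_setup(np)) {
			/* ... so every early-return path must drop it */
			of_node_put(np);
			return -EIO;
		}

		of_node_put(np);	/* ... and so must the success path */
		return 0;
	}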

Signed-off-by: Wen Yang 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Mike Rapoport 
Cc: Andrew Morton 
Cc: Mahesh Salgaonkar 
Cc: Haren Myneni 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/platforms/powernv/opal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 2b0eca1..d7736a5 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -738,6 +738,7 @@ static void opal_export_attrs(void)
kobj = kobject_create_and_add("exports", opal_kobj);
if (!kobj) {
pr_warn("kobject_create_and_add() of exports failed\n");
+   of_node_put(np);
return;
}
 
-- 
2.9.5



[PATCH 5/5] powerpc/8xx: fix possible object reference leak

2019-03-21 Thread Wen Yang
The call to of_find_compatible_node returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
irq_domain_add_linear() also calls of_node_get() to increase the refcount,
so the irq_domain will not be affected when this reference is released.

Detected by coccinelle with the following warnings:
./arch/powerpc/platforms/8xx/pic.c:158:1-7: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 136, but without a 
corresponding object release within this function.

Signed-off-by: Wen Yang 
Cc: Vitaly Bordug 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/platforms/8xx/pic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/8xx/pic.c b/arch/powerpc/platforms/8xx/pic.c
index 8d5a25d..13d880b 100644
--- a/arch/powerpc/platforms/8xx/pic.c
+++ b/arch/powerpc/platforms/8xx/pic.c
@@ -155,6 +155,7 @@ int mpc8xx_pic_init(void)
ret = -ENOMEM;
goto out;
}
+   of_node_put(np);
return 0;
 
 out:
-- 
2.9.5



[PATCH 4/5] powerpc/embedded6xx: fix possible object reference leak

2019-03-21 Thread Wen Yang
The call to of_find_compatible_node returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./arch/powerpc/platforms/embedded6xx/mvme5100.c:89:2-8: ERROR: missing 
of_node_put; acquired a node pointer with refcount incremented on line 80, but 
without a corresponding object release within this function.

Signed-off-by: Wen Yang 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/platforms/embedded6xx/mvme5100.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/embedded6xx/mvme5100.c 
b/arch/powerpc/platforms/embedded6xx/mvme5100.c
index 273dfa3..660654f4 100644
--- a/arch/powerpc/platforms/embedded6xx/mvme5100.c
+++ b/arch/powerpc/platforms/embedded6xx/mvme5100.c
@@ -86,6 +86,7 @@ static void __init mvme5100_pic_init(void)
cirq = irq_of_parse_and_map(cp, 0);
if (!cirq) {
pr_warn("mvme5100_pic_init: no cascade interrupt?\n");
+   of_node_put(cp);
return;
}
 
-- 
2.9.5



[PATCH 2/5] powerpc/83xx: fix possible object reference leak

2019-03-21 Thread Wen Yang
The call to of_find_node_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./arch/powerpc/platforms/83xx/km83xx.c:68:2-8: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 59, but without a 
corresponding object release within this function.

Signed-off-by: Wen Yang 
Cc: Scott Wood 
Cc: Kumar Gala 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/platforms/83xx/km83xx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/83xx/km83xx.c 
b/arch/powerpc/platforms/83xx/km83xx.c
index d8642a4..11eea7c 100644
--- a/arch/powerpc/platforms/83xx/km83xx.c
+++ b/arch/powerpc/platforms/83xx/km83xx.c
@@ -65,6 +65,7 @@ static void quirk_mpc8360e_qe_enet10(void)
ret = of_address_to_resource(np_par, 0, );
if (ret) {
pr_warn("%s couldn;t map par_io registers\n", __func__);
+   of_node_put(np_par);
return;
}
 
-- 
2.9.5



[PATCH 1/5] powerpc/sparse: fix possible object reference leak

2019-03-21 Thread Wen Yang
The call to of_find_node_by_path returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./arch/powerpc/platforms/pseries/pseries_energy.c:101:1-7: ERROR: missing 
of_node_put; acquired a node pointer with refcount incremented on line 46, but 
without a corresponding object release within this function.
./arch/powerpc/platforms/pseries/pseries_energy.c:172:1-7: ERROR: missing 
of_node_put; acquired a node pointer with refcount incremented on line 111, but 
without a corresponding object release within this function.

Signed-off-by: Wen Yang 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/platforms/pseries/pseries_energy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c 
b/arch/powerpc/platforms/pseries/pseries_energy.c
index 6ed2212..e3913e4 100644
--- a/arch/powerpc/platforms/pseries/pseries_energy.c
+++ b/arch/powerpc/platforms/pseries/pseries_energy.c
@@ -69,7 +69,7 @@ static u32 cpu_to_drc_index(int cpu)
 
of_read_drc_info_cell(&info, &value, &drc);
if (strncmp(drc.drc_type, "CPU", 3))
-   goto err;
+   goto err_of_node_put;
 
if (thread_index < drc.last_drc_index)
break;
@@ -131,7 +131,7 @@ static int drc_index_to_cpu(u32 drc_index)
 
of_read_drc_info_cell(&info, &value, &drc);
if (strncmp(drc.drc_type, "CPU", 3))
-   goto err;
+   goto err_of_node_put;
 
if (drc_index > drc.last_drc_index) {
cpu += drc.num_sequential_elems;
-- 
2.9.5



Re: [PATCH] arch/powerpc/crypto/crc-vpmsum_test: Use cheaper random numbers for self-test

2019-03-21 Thread Daniel Axtens
Hi George,

> This code was filling a 64K buffer from /dev/urandom in order to
> compute a CRC over (on average half of) it by two different methods,
> comparing the CRCs, and repeating.
>
> This is not a remotely security-critical application, so use the far
> faster and cheaper prandom_u32() generator.
>

I've had a quick look at the prandom_u32 generator and I agree that it's
suitable for this. 

> And, while we're at it, only fill as much of the buffer as we plan to use.

This also looks good to me.

Acked-by: Daniel Axtens 

Regards,
Daniel

>
> Signed-off-by: George Spelvin 
> Cc: Daniel Axtens 
> Cc: Herbert Xu 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> ---
>  arch/powerpc/crypto/crc-vpmsum_test.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c 
> b/arch/powerpc/crypto/crc-vpmsum_test.c
> index 0153a9c6f4af..98ea4f4d3dde 100644
> --- a/arch/powerpc/crypto/crc-vpmsum_test.c
> +++ b/arch/powerpc/crypto/crc-vpmsum_test.c
> @@ -78,16 +78,12 @@ static int __init crc_test_init(void)
>  
>   pr_info("crc-vpmsum_test begins, %lu iterations\n", iterations);
>  	for (i=0; i<iterations; i++) {
> -		size_t len, offset;
> + size_t offset = prandom_u32_max(16);
> + size_t len = prandom_u32_max(MAX_CRC_LENGTH);
>  
> -		get_random_bytes(data, MAX_CRC_LENGTH);
> -		get_random_bytes(&len, sizeof(len));
> -		get_random_bytes(&offset, sizeof(offset));
> -
> - len %= MAX_CRC_LENGTH;
> - offset &= 15;
>   if (len <= offset)
>   continue;
> + prandom_bytes(data, len);
>   len -= offset;
>  
>   crypto_shash_update(crct10dif_shash, data+offset, len);
> -- 
> 2.20.1


Re: [PATCH v5 05/10] powerpc: Add a framework for Kernel Userspace Access Protection

2019-03-21 Thread Michael Ellerman
Christophe Leroy  writes:

> On 20/03/2019 at 13:57, Michael Ellerman wrote:
>> Christophe Leroy  writes:
>>> On 08/03/2019 at 02:16, Michael Ellerman wrote:
 From: Christophe Leroy 

 This patch implements a framework for Kernel Userspace Access
 Protection.

 Then subarches will have the possibility to provide their own
 implementation by providing setup_kuap() and
 allow/prevent_user_access().

 Some platforms will need to know the area accessed and whether it is
 accessed from read, write or both. Therefore source, destination and
 size are handed over to the two functions.

 mpe: Rename to allow/prevent rather than unlock/lock, and add
 read/write wrappers. Drop the 32-bit code for now until we have an
 implementation for it. Add kuap to pt_regs for 64-bit as well as
 32-bit. Don't split strings, use pr_crit_ratelimited().

 Signed-off-by: Christophe Leroy 
 Signed-off-by: Russell Currey 
 Signed-off-by: Michael Ellerman 
 ---
 v5: Futex ops need read/write so use allow_user_access() there.
   Use #ifdef CONFIG_PPC64 in kup.h to fix build errors.
   Allow subarch to override allow_read/write_from/to_user().
>>>
>>> Those little helpers that will just call allow_user_access() when
>>> distinct read/write handling is not performed looks overkill to me.
>>>
>>> Can't the subarch do it by itself based on the nullity of from/to ?
>>>
>>> static inline void allow_user_access(void __user *to, const void __user
>>> *from,
>>>  unsigned long size)
>>> {
>>> if (to && from)
>>> set_kuap(0);
>>> else if (to)
>>> set_kuap(AMR_KUAP_BLOCK_READ);
>>> else if (from)
>>> set_kuap(AMR_KUAP_BLOCK_WRITE);
>>> }
>> 
>> You could implement it that way, but it reads better at the call sites
>> if we have:
>> 
>>  allow_write_to_user(uaddr, sizeof(*uaddr));
>> vs:
>>  allow_user_access(uaddr, NULL, sizeof(*uaddr));
>> 
>> So I'm inclined to keep them. It should all end up inlined and generate
>> the same code at the end of the day.
>> 
>
> I was not suggesting to completely remove allow_write_to_user(), I fully
> agree that it reads better at the call sites.
>
> I was just thinking that allow_write_to_user() could remain generic and
> call the subarch-specific allow_user_access() instead of making multiple
> subarch allow_write_to_user() implementations.

Yep OK I see what you mean.

Your suggestion above should work, and involves the least amount of
ifdefs and so on.

I'll try and get time to post a v6.

cheers
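
For reference, the arrangement being agreed on amounts to something like
the following sketch - the read/write helpers stay generic and funnel
into the single subarch hook (a sketch only, using the names from the
thread):

	static inline void allow_read_from_user(const void __user *from,
						unsigned long size)
	{
		allow_user_access(NULL, from, size);
	}

	static inline void allow_write_to_user(void __user *to,
					       unsigned long size)
	{
		allow_user_access(to, NULL, size);
	}

Each subarch then provides just one allow_user_access() implementation
along the lines of Christophe's set_kuap() example above.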


Re: [PATCH] powerpc: vmlinux.lds: Drop Binutils 2.18 workarounds

2019-03-21 Thread Michael Ellerman
Segher Boessenkool  writes:

> On Thu, Mar 21, 2019 at 11:02:53AM +1030, Joel Stanley wrote:
>> Segher added some workarounds for GCC 4.2 and binutils 2.18. We now set
>> 4.6 and 2.20 as the minimum, so they can be dropped.
>
> It was a bug in binutils _before_ 2.18, only seen by people using GCC
> _before_ 4.2.
>
> It's all ancient history by now, and good riddance :-)
>
>> Signed-off-by: Joel Stanley 
>
> Acked-by: Segher Boessenkool 

Thanks.

I updated the change log slightly:

  powerpc/vmlinux.lds: Drop binutils < 2.18 workarounds
  
  Segher added some workarounds for binutils < 2.18 and GCC < 4.2. We
  now set GCC 4.6 and binutils 2.20 as the minimum, so the workarounds
  can be dropped.
  
  This is mostly a revert of c6995fe4 ("powerpc: Fix build bug with
  binutils < 2.18 and GCC < 4.2").

cheers


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-21 Thread Thiago Jung Bauermann


Michael S. Tsirkin  writes:

> On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
>> >> Another way of looking at this issue which also explains our reluctance
>> >> is that the only difference between a secure guest and a regular guest
>> >> (at least regarding virtio) is that the former uses swiotlb while the
>> >> latter doesn't.
>> >
>> > But swiotlb is just one implementation. It's a guest internal thing. The
>> > issue is that memory isn't host accessible.
>>
>> From what I understand of the ACCESS_PLATFORM definition, the host will
>> only ever try to access memory addresses that are supplied to it by the
>> guest, so all of the secure guest memory that the host cares about is
>> accessible:
>>
>> If this feature bit is set to 0, then the device has same access to
>> memory addresses supplied to it as the driver has. In particular,
>> the device will always use physical addresses matching addresses
>> used by the driver (typically meaning physical addresses used by the
>> CPU) and not translated further, and can access any address supplied
>> to it by the driver. When clear, this overrides any
>> platform-specific description of whether device access is limited or
>> translated in any way, e.g. whether an IOMMU may be present.
>>
>> All of the above is true for POWER guests, whether they are secure
>> guests or not.
>>
>> Or are you saying that a virtio device may want to access memory
>> addresses that weren't supplied to it by the driver?
>
> Your logic would apply to IOMMUs as well.  For your mode, there are
> specific encrypted memory regions that driver has access to but device
> does not. that seems to violate the constraint.

Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
the device can ignore the IOMMU for all practical purposes I would
indeed say that the logic would apply to IOMMUs as well. :-)

I guess I'm still struggling with the purpose of signalling to the
driver that the host may not have access to memory addresses that it
will never try to access.

>> >> And from the device's point of view they're
>> >> indistinguishable. It can't tell one guest that is using swiotlb from
>> >> one that isn't. And that implies that secure guest vs regular guest
>> >> isn't a virtio interface issue, it's "guest internal affairs". So
>> >> there's no reason to reflect that in the feature flags.
>> >
>> > So don't. The way not to reflect that in the feature flags is
>> > to set ACCESS_PLATFORM.  Then you say *I don't care let platform device*.
>> >
>> >
>> > Without ACCESS_PLATFORM
>> > virtio has a very specific opinion about the security of the
>> > device, and that opinion is that device is part of the guest
>> > supervisor security domain.
>>
>> Sorry for being a bit dense, but not sure what "the device is part of
>> the guest supervisor security domain" means. In powerpc-speak,
>> "supervisor" is the operating system so perhaps that explains my
>> confusion. Are you saying that without ACCESS_PLATFORM, the guest
>> considers the host to be part of the guest operating system's security
>> domain?
>
> I think so. The spec says "device has same access as driver".

Ok, makes sense.

>> If so, does that have any other implication besides "the host
>> can access any address supplied to it by the driver"? If that is the
>> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
>> include that information because it's not part of the current
>> definition.
>>
>> >> > But the name "sev_active" makes me scared because at least AMD guys who
>> >> > were doing the sensible thing and setting ACCESS_PLATFORM
>> >>
>> >> My understanding is, AMD guest-platform knows in advance that their
>> >> guest will run in secure mode and hence sets the flag at the time of VM
>> >> instantiation. Unfortunately we don't have that luxury on our platforms.
>> >
>> > Well you do have that luxury. It looks like that there are existing
>> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
>> > with how that path is slow. So you are trying to optimize for
>> > them by clearing ACCESS_PLATFORM and then you have lost ability
>> > to invoke DMA API.
>> >
>> > For example if there was another flag just like ACCESS_PLATFORM
>> > just not yet used by anyone, you would be all fine using that right?
>>
>> Yes, a new flag sounds like a great idea. What about the definition
>> below?
>>
>> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
>> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
>> exception that the IOMMU is explicitly defined to be off or bypassed
>> when accessing memory addresses supplied to the device by the
>> driver. This flag should be set by the guest if offered, but to
>> allow for backward-compatibility device implementations allow for it
>> to be left unset by the guest. It is an error to set both this flag
>> and VIRTIO_F_ACCESS_PLATFORM.

Re: [PATCH] powerpc: vmlinux.lds: Drop Binutils 2.18 workarounds

2019-03-21 Thread Segher Boessenkool
On Thu, Mar 21, 2019 at 11:02:53AM +1030, Joel Stanley wrote:
> Segher added some workarounds for GCC 4.2 and binutils 2.18. We now set
> 4.6 and 2.20 as the minimum, so they can be dropped.

It was a bug in binutils _before_ 2.18, only seen by people using GCC
_before_ 4.2.

It's all ancient history by now, and good riddance :-)

> Signed-off-by: Joel Stanley 

Acked-by: Segher Boessenkool 


Segher


Re: [PATCH 1/4] add generic builtin command line

2019-03-21 Thread Andrew Morton
On Thu, 21 Mar 2019 08:13:08 -0700 Daniel Walker  wrote:

> > The patches (or some version of them) are already in linux-next,
> > which messes me up.  I'll disable them for now.
>  
> Those are from my tree, but I removed them when you picked up the series. The
> next linux-next should not have them.

Yup, thanks, all looks good now.


Re: powerpc32 boot crash in 5.1-rc1

2019-03-21 Thread LEROY Christophe

Meelis Roos  wrote:

While 5.0.0 worked fine on my PowerMac G4, 5.0 + git (unknown rev as
of Mar 13), 5.0.0-11520-gf261c4e and today's git all fail to boot.


The problem seems to be in the page fault handler in load_elf_binary()
of the init process.


The patch at https://patchwork.ozlabs.org/patch/1053385/ should fix it

Christophe



Two different screenshots are attached (let's see if they come through).

--
Meelis Roos 





powerpc32 boot crash in 5.1-rc1

2019-03-21 Thread Meelis Roos

While 5.0.0 worked fine on my PowerMac G4, 5.0 + git (unknown rev as of Mar 13),
5.0.0-11520-gf261c4e and today's git all fail to boot.

The problem seems to be in the page fault handler in load_elf_binary() of the
init process.

Two different screenshots are at 
http://kodu.ut.ee/~mroos/powerpc-boot-hang-1.jpg and
http://kodu.ut.ee/~mroos/powerpc-boot-hang-1.jpg .

--
Meelis Roos 


[PATCH] arch/powerpc/crypto/crc-vpmsum_test: Use cheaper random numbers for self-test

2019-03-21 Thread George Spelvin
This code was filling a 64K buffer from /dev/urandom in order to
compute a CRC over (on average half of) it by two different methods,
comparing the CRCs, and repeating.

This is not a remotely security-critical application, so use the far
faster and cheaper prandom_u32() generator.

And, while we're at it, only fill as much of the buffer as we plan to use.

Signed-off-by: George Spelvin 
Cc: Daniel Axtens 
Cc: Herbert Xu 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
---
 arch/powerpc/crypto/crc-vpmsum_test.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c 
b/arch/powerpc/crypto/crc-vpmsum_test.c
index 0153a9c6f4af..98ea4f4d3dde 100644
--- a/arch/powerpc/crypto/crc-vpmsum_test.c
+++ b/arch/powerpc/crypto/crc-vpmsum_test.c
@@ -78,16 +78,12 @@ static int __init crc_test_init(void)
 
pr_info("crc-vpmsum_test begins, %lu iterations\n", iterations);
 	for (i=0; i<iterations; i++) {
-		size_t len, offset;
+		size_t offset = prandom_u32_max(16);
+		size_t len = prandom_u32_max(MAX_CRC_LENGTH);
 
-		get_random_bytes(data, MAX_CRC_LENGTH);
-		get_random_bytes(&len, sizeof(len));
-		get_random_bytes(&offset, sizeof(offset));
-
-		len %= MAX_CRC_LENGTH;
-		offset &= 15;
 		if (len <= offset)
 			continue;
+		prandom_bytes(data, len);
 		len -= offset;
 
 		crypto_shash_update(crct10dif_shash, data+offset, len);
-- 
2.20.1

Re: [PATCH kernel RFC 2/2] vfio-pci-nvlink2: Implement interconnect isolation

2019-03-21 Thread Alex Williamson
On Thu, 21 Mar 2019 10:56:00 +1100
David Gibson  wrote:

> On Wed, Mar 20, 2019 at 01:09:08PM -0600, Alex Williamson wrote:
> > On Wed, 20 Mar 2019 15:38:24 +1100
> > David Gibson  wrote:
> >   
> > > On Tue, Mar 19, 2019 at 10:36:19AM -0600, Alex Williamson wrote:  
> > > > On Fri, 15 Mar 2019 19:18:35 +1100
> > > > Alexey Kardashevskiy  wrote:
> > > > 
> > > > > The NVIDIA V100 SXM2 GPUs are connected to the CPU via PCIe links and
> > > > > (on POWER9) NVLinks. In addition to that, GPUs themselves have direct
> > > > > peer to peer NVLinks in groups of 2 to 4 GPUs. At the moment the 
> > > > > POWERNV
> > > > > platform puts all interconnected GPUs to the same IOMMU group.
> > > > > 
> > > > > However the user may want to pass individual GPUs to the userspace so
> > > > > in order to do so we need to put them into separate IOMMU groups and
> > > > > cut off the interconnects.
> > > > > 
> > > > > Thankfully V100 GPUs implement an interface to do so by programming
> > > > > a link disabling mask to BAR0 of a GPU. Once a link is disabled in
> > > > > a GPU using this interface, it cannot be re-enabled until the
> > > > > secondary bus reset is issued to the GPU.
> > > > > 
> > > > > This defines a reset_done() handler for V100 NVlink2 device which
> > > > > determines what links need to be disabled. This relies on presence
> > > > > of the new "ibm,nvlink-peers" device tree property of a GPU telling 
> > > > > which
> > > > > PCI peers it is connected to (which includes NVLink bridges or peer 
> > > > > GPUs).
> > > > > 
> > > > > This does not change the existing behaviour and instead adds
> > > > > a new "isolate_nvlink" kernel parameter to allow such isolation.
> > > > > 
> > > > > The alternative approaches would be:
> > > > > 
> > > > > 1. do this in the system firmware (skiboot) but for that we would need
> > > > > to tell skiboot via an additional OPAL call whether or not we want 
> > > > > this
> > > > > isolation - skiboot is unaware of IOMMU groups.
> > > > > 
> > > > > 2. do this in the secondary bus reset handler in the POWERNV platform 
> > > > > -
> > > > > the problem with that is at that point the device is not enabled, i.e.
> > > > > config space is not restored so we need to enable the device (i.e. 
> > > > > MMIO
> > > > > bit in CMD register + program valid address to BAR0) in order to 
> > > > > disable
> > > > > links and then perhaps undo all this initialization to bring the 
> > > > > device
> > > > > back to the state where pci_try_reset_function() expects it to be.
> > > > 
> > > > The trouble seems to be that this approach only maintains the isolation
> > > > exposed by the IOMMU group when vfio-pci is the active driver for the
> > > > device.  IOMMU groups can be used by any driver and the IOMMU core is
> > > > incorporating groups in various ways.
> > > 
> > > I don't think that reasoning is quite right.  An IOMMU group doesn't
> > > necessarily represent devices which *are* isolated, just devices which
> > > *can be* isolated.  There are plenty of instances when we don't need
> > > to isolate devices in different IOMMU groups: passing both groups to
> > > the same guest or userspace VFIO driver for example, or indeed when
> > > both groups are owned by regular host kernel drivers.
> > > 
> > > In at least some of those cases we also don't want to isolate the
> > > devices when we don't have to, usually for performance reasons.  
> > 
> > I see IOMMU groups as representing the current isolation of the device,
> > not just the possible isolation.  If there are ways to break down that
> > isolation then ideally the group would be updated to reflect it.  The
> > ACS disable patches seem to support this, at boot time we can choose to
> > disable ACS at certain points in the topology to favor peer-to-peer
> > performance over isolation.  This is then reflected in the group
> > composition, because even though ACS *can be* enabled at the given
> > isolation points, it's intentionally not with this option.  Whether or
> > not a given user who owns multiple devices needs that isolation is
> > really beside the point, the user can choose to connect groups via IOMMU
> > mappings or reconfigure the system to disable ACS and potentially more
> > direct routing.  The IOMMU groups are still accurately reflecting the
> > topology and IOMMU based isolation.  
> 
> Huh, ok, I think we need to straighten this out.  Thinking of iommu
> groups as possible rather than potential isolation was a conscious

possible ~= potential

> decision on my part when we were first coming up with them.  The
> rationale was that that way iommu groups could be static for the
> lifetime of boot, with more dynamic isolation state layered on top.
> 
> Now, that was based on analogy with PAPR's concept of "Partitionable
> Endpoints" which are decided by firmware before boot.  However, I
> think it makes sense in other contexts too: if iommu groups represent
> current isolation, then we 

Re: [PATCH] kmemleak: powerpc: skip scanning holes in the .bss section

2019-03-21 Thread Qian Cai
On Thu, 2019-03-21 at 17:19 +, Catalin Marinas wrote:
> The commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
> kvm_tmp[] into the .bss section and then frees the rest of the unused
> space back to the page allocator.
> 
> kernel_init
>   kvm_guest_init
> kvm_free_tmp
>   free_reserved_area
> free_unref_page
>   free_unref_page_prepare
> 
> With DEBUG_PAGEALLOC=y, it will unmap those pages from the kernel. As a
> result, the kmemleak scan will trigger a panic when it scans the .bss
> section with unmapped pages.
> 
> This patch creates dedicated kmemleak objects for the .data, .bss and
> potentially .data..ro_after_init sections to allow partial freeing via
> the kmemleak_free_part() in the powerpc kvm_free_tmp() function.
> 
> Acked-by: Michael Ellerman  (powerpc)
> Reported-by: Qian Cai 
> Signed-off-by: Catalin Marinas 

Tested-by: Qian Cai 


[PATCH] kmemleak: powerpc: skip scanning holes in the .bss section

2019-03-21 Thread Catalin Marinas
The commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
kvm_tmp[] into the .bss section and then frees the rest of the unused
space back to the page allocator.

kernel_init
  kvm_guest_init
kvm_free_tmp
  free_reserved_area
free_unref_page
  free_unref_page_prepare

With DEBUG_PAGEALLOC=y, it will unmap those pages from the kernel. As a
result, the kmemleak scan will trigger a panic when it scans the .bss
section with unmapped pages.

This patch creates dedicated kmemleak objects for the .data, .bss and
potentially .data..ro_after_init sections to allow partial freeing via
the kmemleak_free_part() in the powerpc kvm_free_tmp() function.

Acked-by: Michael Ellerman  (powerpc)
Reported-by: Qian Cai 
Signed-off-by: Catalin Marinas 
---

Posting as a proper patch following the inlined one here:

http://lkml.kernel.org/r/20190320181656.gb38...@arrakis.emea.arm.com

Changes from the above:

- Added comment to the powerpc kmemleak_free_part() call

- Only register the .data..ro_after_init in kmemleak if not contained
  within the .data sections (which seems to be the case for lots of
  architectures)

I preserved part of Qian's original commit message but changed the
author since I rewrote the patch.

 arch/powerpc/kernel/kvm.c |  7 +++
 mm/kmemleak.c | 16 +++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 683b5b3805bd..cd381e2291df 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -712,6 +713,12 @@ static void kvm_use_magic_page(void)
 
 static __init void kvm_free_tmp(void)
 {
+   /*
+* Inform kmemleak about the hole in the .bss section since the
+* corresponding pages will be unmapped with DEBUG_PAGEALLOC=y.
+*/
+	kmemleak_free_part(&kvm_tmp[kvm_tmp_index],
+			   ARRAY_SIZE(kvm_tmp) - kvm_tmp_index);
 	free_reserved_area(&kvm_tmp[kvm_tmp_index],
 			   &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL);
 }
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 707fa5579f66..6c318f5ac234 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1529,11 +1529,6 @@ static void kmemleak_scan(void)
}
rcu_read_unlock();
 
-   /* data/bss scanning */
-   scan_large_block(_sdata, _edata);
-   scan_large_block(__bss_start, __bss_stop);
-   scan_large_block(__start_ro_after_init, __end_ro_after_init);
-
 #ifdef CONFIG_SMP
/* per-cpu sections scanning */
for_each_possible_cpu(i)
@@ -2071,6 +2066,17 @@ void __init kmemleak_init(void)
}
local_irq_restore(flags);
 
+   /* register the data/bss sections */
+   create_object((unsigned long)_sdata, _edata - _sdata,
+ KMEMLEAK_GREY, GFP_ATOMIC);
+   create_object((unsigned long)__bss_start, __bss_stop - __bss_start,
+ KMEMLEAK_GREY, GFP_ATOMIC);
+   /* only register .data..ro_after_init if not within .data */
+   if (__start_ro_after_init < _sdata || __end_ro_after_init > _edata)
+   create_object((unsigned long)__start_ro_after_init,
+ __end_ro_after_init - __start_ro_after_init,
+ KMEMLEAK_GREY, GFP_ATOMIC);
+
/*
 * This is the point where tracking allocations is safe. Automatic
 * scanning is started during the late initcall. Add the early logged


Re: [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it

2019-03-21 Thread Ira Weiny
On Tue, Mar 19, 2019 at 03:19:30PM -0700, Andrew Morton wrote:
> On Sun, 17 Mar 2019 11:34:31 -0700 ira.we...@intel.com wrote:
> 
> > Resending after rebasing to the latest mm tree.
> > 
> > HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
> > advantages.  These pages can be held for a significant time.  But
> > get_user_pages_fast() does not protect against mapping FS DAX pages.
> > 
> > Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
> > retains the performance while also adding the FS DAX checks.  XDP has also
> > shown interest in using this functionality.[1]
> > 
> > In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and
> > remove the specialized get_user_pages_longterm call.
> 
> It would be helpful to include your response to Christoph's question
> (http://lkml.kernel.org/r/20190220180255.ga12...@iweiny-desk2.sc.intel.com)
> in the changelog.  Because if one person was wondering about this,
> others will likely do so.
> 
> We have no record of acks or reviewed-by's.  At least one was missed
> (http://lkml.kernel.org/r/caog9msttcd-9bcsdfc0wryqfvrnb4twozl0c4+6qxi-n_y4...@mail.gmail.com),
> but that is very very partial.

That is my bad.  Sorry to Mike.  And I have added him.

> 
> This patchset is fairly DAX-centered, but Dan wasn't cc'ed!

Agreed, I'm new to changing things which affect this many sub-systems and I
struggled with who should be CC'ed (get_maintainer.pl returned a very large
list  :-(.

I fear I may have cc'ed too many people, and the wrong people apparently, so
that may be affecting the review...

So again my apologies.  I don't know if Dan is going to get a chance to put a
reviewed-by on them this week but I thought I would send this note to let you
know I'm not ignoring your feedback.  Just waiting a bit before resending to
hopefully get some more acks/reviewed bys.

Thanks,
Ira

> 
> So ho hum.  I'll scoop them up and shall make the above changes to the
> [1/n] changelog, but we still have some work to do.
> 


Re: [PATCH 1/4] add generic builtin command line

2019-03-21 Thread Daniel Walker
On Wed, Mar 20, 2019 at 08:14:33PM -0700, Andrew Morton wrote:
> On Wed, 20 Mar 2019 16:23:28 -0700 Daniel Walker  wrote:
> 
> > On Wed, Mar 20, 2019 at 03:53:19PM -0700, Andrew Morton wrote:
> > > On Tue, 19 Mar 2019 16:24:45 -0700 Daniel Walker  
> > > wrote:
> > > 
> > > > This code allows architectures to use a generic builtin command line.
> > > 
> > > I wasn't cc'ed on [2/4].  No mailing lists were cc'ed on [0/4] but it
> > > didn't say anything useful anyway ;)
> > > 
> > > I'll queue them up for testing and shall await feedback from the
> > > powerpc developers.
> > > 
> > 
> > You weren't CC'd , but it was To: you,
> > 
> >  35 From: Daniel Walker 
> >  36 To: Andrew Morton ,
> >  37 Christophe Leroy ,
> >  38 Michael Ellerman ,
> >  39 Rob Herring , xe-linux-exter...@cisco.com,
> >  40 linuxppc-dev@lists.ozlabs.org, Frank Rowand 
> > 
> >  41 Cc: devicet...@vger.kernel.org, linux-ker...@vger.kernel.org
> >  42 Subject: [PATCH 2/4] drivers: of: generic command line support
> 
> hm.
> 
> > Thanks for picking it up.
> 
> The patches (or some version of them) are already in linux-next,
> which messes me up.  I'll disable them for now.
 
Those are from my tree, but I removed them when you picked up the series. The
next linux-next should not have them.

Daniel


[PATCH -next] crypto: vmx - Make p8_init and p8_exit static

2019-03-21 Thread Yue Haibing
From: YueHaibing 

Fix sparse warnings:

drivers/crypto/vmx/vmx.c:44:12: warning:
 symbol 'p8_init' was not declared. Should it be static?
drivers/crypto/vmx/vmx.c:70:13: warning:
 symbol 'p8_exit' was not declared. Should it be static?

Signed-off-by: YueHaibing 
---
 drivers/crypto/vmx/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/vmx/vmx.c b/drivers/crypto/vmx/vmx.c
index 31a98dc..a9f5198 100644
--- a/drivers/crypto/vmx/vmx.c
+++ b/drivers/crypto/vmx/vmx.c
@@ -41,7 +41,7 @@ static struct crypto_alg *algs[] = {
NULL,
 };
 
-int __init p8_init(void)
+static int __init p8_init(void)
 {
int ret = 0;
struct crypto_alg **alg_it;
@@ -67,7 +67,7 @@ int __init p8_init(void)
return ret;
 }
 
-void __exit p8_exit(void)
+static void __exit p8_exit(void)
 {
struct crypto_alg **alg_it;
 
-- 
2.7.0




Applied "ASoC: fsl_asrc: add constraint for the asrc of older version" to the asoc tree

2019-03-21 Thread Mark Brown
The patch

   ASoC: fsl_asrc: add constraint for the asrc of older version

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 53f67a78663811968f426d480bc55887d787bd94 Mon Sep 17 00:00:00 2001
From: "S.j. Wang" 
Date: Sat, 2 Mar 2019 05:52:19 +
Subject: [PATCH] ASoC: fsl_asrc: add constraint for the asrc of older version

There is a constraint on the channel number setting for the older
version of the ASRC (e.g. imx35): the channel number must be even;
an odd number isn't valid.

So add this constraint when the older version of the ASRC is used.

Acked-by: Nicolin Chen 
Signed-off-by: Shengjiu Wang 
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl_asrc.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 528e8b108422..0b937924d2e4 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -445,6 +445,19 @@ struct dma_chan *fsl_asrc_get_dma_channel(struct 
fsl_asrc_pair *pair, bool dir)
 }
 EXPORT_SYMBOL_GPL(fsl_asrc_get_dma_channel);
 
+static int fsl_asrc_dai_startup(struct snd_pcm_substream *substream,
+   struct snd_soc_dai *dai)
+{
+   struct fsl_asrc *asrc_priv = snd_soc_dai_get_drvdata(dai);
+
+   /* Odd channel number is not valid for older ASRC (channel_bits==3) */
+   if (asrc_priv->channel_bits == 3)
+   snd_pcm_hw_constraint_step(substream->runtime, 0,
+  SNDRV_PCM_HW_PARAM_CHANNELS, 2);
+
+   return 0;
+}
+
 static int fsl_asrc_dai_hw_params(struct snd_pcm_substream *substream,
  struct snd_pcm_hw_params *params,
  struct snd_soc_dai *dai)
@@ -539,6 +552,7 @@ static int fsl_asrc_dai_trigger(struct snd_pcm_substream 
*substream, int cmd,
 }
 
 static const struct snd_soc_dai_ops fsl_asrc_dai_ops = {
+   .startup  = fsl_asrc_dai_startup,
.hw_params= fsl_asrc_dai_hw_params,
.hw_free  = fsl_asrc_dai_hw_free,
.trigger  = fsl_asrc_dai_trigger,
-- 
2.20.1



Applied "ASoC: fsl_esai: fix channel swap issue when stream starts" to the asoc tree

2019-03-21 Thread Mark Brown
The patch

   ASoC: fsl_esai: fix channel swap issue when stream starts

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 0ff4e8c61b794a4bf6c854ab071a1abaaa80f358 Mon Sep 17 00:00:00 2001
From: "S.j. Wang" 
Date: Wed, 27 Feb 2019 06:31:12 +
Subject: [PATCH] ASoC: fsl_esai: fix channel swap issue when stream starts

There is a very low probability (< 0.1%) that a channel swap happens
at the beginning of a stream when multiple output/input pins are
enabled. The issue is that the hardware can't send data to the
correct pin at the beginning with the normal enable flow.

This is a hardware issue, but there is no erratum for it. The
workaround flow is: for each playback/recording, first clear the
xSMA/xSMB, then enable TE/RE, then enable xSMB and xSMA (xSMB must be
enabled before xSMA). This uses xSMA as the trigger start register,
whereas previously xCR_TE or xCR_RE was the bit for starting.

Fixes commit 43d24e76b698 ("ASoC: fsl_esai: Add ESAI CPU DAI driver")
Cc: 
Reviewed-by: Fabio Estevam 
Acked-by: Nicolin Chen 
Signed-off-by: Shengjiu Wang 
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl_esai.c | 47 +++-
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index afe67c865330..3623aa9a6f2e 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -54,6 +54,8 @@ struct fsl_esai {
u32 fifo_depth;
u32 slot_width;
u32 slots;
+   u32 tx_mask;
+   u32 rx_mask;
u32 hck_rate[2];
u32 sck_rate[2];
bool hck_dir[2];
@@ -361,21 +363,13 @@ static int fsl_esai_set_dai_tdm_slot(struct snd_soc_dai 
*dai, u32 tx_mask,
regmap_update_bits(esai_priv->regmap, REG_ESAI_TCCR,
   ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
 
-   regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMA,
-  ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(tx_mask));
-   regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMB,
-  ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(tx_mask));
-
regmap_update_bits(esai_priv->regmap, REG_ESAI_RCCR,
   ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
 
-   regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMA,
-  ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(rx_mask));
-   regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMB,
-  ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(rx_mask));
-
esai_priv->slot_width = slot_width;
esai_priv->slots = slots;
+   esai_priv->tx_mask = tx_mask;
+   esai_priv->rx_mask = rx_mask;
 
return 0;
 }
@@ -596,6 +590,7 @@ static int fsl_esai_trigger(struct snd_pcm_substream 
*substream, int cmd,
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
u8 i, channels = substream->runtime->channels;
u32 pins = DIV_ROUND_UP(channels, esai_priv->slots);
+   u32 mask;
 
switch (cmd) {
case SNDRV_PCM_TRIGGER_START:
@@ -608,15 +603,38 @@ static int fsl_esai_trigger(struct snd_pcm_substream 
*substream, int cmd,
for (i = 0; tx && i < channels; i++)
regmap_write(esai_priv->regmap, REG_ESAI_ETDR, 0x0);
 
+   /*
+* When set the TE/RE in the end of enablement flow, there
+* will be channel swap issue for multi data line case.
+* In order to workaround this issue, we switch the bit
+* enablement sequence to below sequence
+* 1) clear the xSMB & xSMA: which is done in probe and
+*   stop state.
+* 2) set TE/RE
+* 3) set xSMB
+* 4) set xSMA:  xSMA is the last one in this flow, which
+*   will trigger esai to start.
+*/
regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
   tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK,
   tx ? ESAI_xCR_TE(pins) : ESAI_xCR_RE(pins));
	mask = tx ? esai_priv->tx_mask : esai_priv->rx_mask;

Re: [RFC PATCH 1/1] KVM: PPC: Report single stepping capability

2019-03-21 Thread Fabiano Rosas
Alexey Kardashevskiy  writes:

> In the cover letter (which is not really required for a single patch)
> you say the capability will be present for BookE and PR KVM (which is
> Book3s), but here it is BookE only, is that intentional?

A few lines below (falling through) we have:

/* We support this only for PR */
r = !hv_enabled;

> Also, you need to update Documentation/virtual/kvm/api.txt for the new
> capability. After reading it I started wondering: couldn't we just
> use the existing KVM_CAP_GUEST_DEBUG_HW_BPS?

We _could_, but I think that would conflate two different
concepts. Single stepping does not necessarily make use of hardware
breakpoints (e.g. Trace Interrupt on Book3s PR).

I also think we should use KVM_CAP_GUEST_DEBUG_HW_BPS in the future to
let QEMU know about: i) the lack of hardware breakpoints in Book3s and
ii) BookE's hardware breakpoints (Instruction Address Compare) that are
currently not being reported via HW_BPS.
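
For context, probing such a capability from userspace is a single
KVM_CHECK_EXTENSION ioctl; the capability name below is assumed for
illustration, since this thread doesn't quote the RFC's definition:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Returns 0 if unsupported, non-zero if supported */
	static int kvm_has_single_step(int kvm_fd)
	{
		return ioctl(kvm_fd, KVM_CHECK_EXTENSION,
			     KVM_CAP_PPC_GUEST_DEBUG_SSTEP);
	}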



[PATCH v5 04/19] powerpc: mm: Add p?d_large() definitions

2019-03-21 Thread Steven Price
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_large() functions/macros.

For powerpc, pmd_large() was already implemented, so hoist it out of the
CONFIG_TRANSPARENT_HUGEPAGE condition and implement the other levels.

Also, since pmd_large() is now always implemented, we can drop the
pmd_is_leaf() function.
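
For illustration, a generic walker would use these helpers roughly as
follows (note_leaf_entry() and walk_pte_level() are hypothetical helpers,
not the actual walk_page_range() internals):

	static void walk_pmd_entry(pmd_t *pmd, unsigned long addr)
	{
		if (pmd_large(*pmd)) {
			/* Leaf at the PMD level: a huge mapping with no
			 * PTE table below it, so stop descending here. */
			note_leaf_entry(addr, PMD_SHIFT);
			return;
		}
		walk_pte_level(pmd, addr);	/* otherwise descend */
	}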

CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: linuxppc-dev@lists.ozlabs.org
CC: kvm-...@vger.kernel.org
Signed-off-by: Steven Price 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 ++--
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 581f91be9dd4..f6d1ac8b832e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -897,6 +897,12 @@ static inline int pud_present(pud_t pud)
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+#define pud_large  pud_large
+static inline int pud_large(pud_t pud)
+{
+   return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
+
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
 static inline pte_t pud_pte(pud_t pud)
@@ -940,6 +946,12 @@ static inline int pgd_present(pgd_t pgd)
return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+#define pgd_large  pgd_large
+static inline int pgd_large(pgd_t pgd)
+{
+   return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
+}
+
 static inline pte_t pgd_pte(pgd_t pgd)
 {
return __pte_raw(pgd_raw(pgd));
@@ -1093,6 +1105,15 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool 
write)
return pte_access_permitted(pmd_pte(pmd), write);
 }
 
+#define pmd_large  pmd_large
+/*
+ * returns true for pmd migration entries, THP, devmap, hugetlb
+ */
+static inline int pmd_large(pmd_t pmd)
+{
+   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
@@ -1119,15 +1140,6 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long 
addr, pmd_t *pmdp,
return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set);
 }
 
-/*
- * returns true for pmd migration entries, THP, devmap, hugetlb
- * But compile time dependent on THP config
- */
-static inline int pmd_large(pmd_t pmd)
-{
-   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
 {
return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index f55ef071883f..1b57b4e3f819 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -363,12 +363,6 @@ static void kvmppc_pte_free(pte_t *ptep)
kmem_cache_free(kvm_pte_cache, ptep);
 }
 
-/* Like pmd_huge() and pmd_large(), but works regardless of config options */
-static inline int pmd_is_leaf(pmd_t pmd)
-{
-   return !!(pmd_val(pmd) & _PAGE_PTE);
-}
-
 static pmd_t *kvmppc_pmd_alloc(void)
 {
return kmem_cache_alloc(kvm_pmd_cache, GFP_KERNEL);
@@ -460,7 +454,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t 
*pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
-   if (pmd_is_leaf(*p)) {
+   if (pmd_large(*p)) {
if (full) {
pmd_clear(p);
} else {
@@ -593,7 +587,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
 
-   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
+   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_large(*pmd)))
new_ptep = kvmppc_pte_alloc();
 
/* Check if we might have been invalidated; let the guest retry if so */
@@ -662,7 +656,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_large(*pmd)) {
unsigned long lgpa = gpa & PMD_MASK;
 
/* Check if we raced and someone else has set the same thing */
-- 
2.20.1



Re: [alsa-devel] [PATCH V4] ASoC: fsl_esai: fix channel swap issue when stream starts

2019-03-21 Thread Mark Brown
On Tue, Mar 19, 2019 at 02:08:20AM +, S.j. Wang wrote:
> Hi Mark
> 
> Can this patch be accepted?  Or need I do any update?

Please don't send content-free pings, and please allow a reasonable time
for review.  People get busy, go on holiday, attend conferences and so 
on, so unless there is some reason for urgency (like critical bug fixes)
please allow at least a couple of weeks for review.  If there have been
review comments then people may be waiting for those to be addressed.

Sending content-free pings adds to the mail volume (if they are seen at
all), which is often the problem, and since they can't be reviewed
directly if something has gone wrong you'll have to resend the patches
anyway, so sending again is generally a better approach - though there
are some other maintainers who like them; if in doubt look at how patches
for the subsystem are normally handled.


signature.asc
Description: PGP signature


[RFC PATCH 7/8] vfs: Convert spufs to fs_context

2019-03-21 Thread David Howells
Signed-off-by: David Howells 
cc: Jeremy Kerr 
cc: Arnd Bergmann 
cc: linuxppc-dev@lists.ozlabs.org
---

 arch/powerpc/platforms/cell/spufs/inode.c |  207 -
 1 file changed, 116 insertions(+), 91 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c 
b/arch/powerpc/platforms/cell/spufs/inode.c
index db329d4bf1c3..f951a7fe4e3c 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -23,6 +23,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -33,7 +35,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -43,7 +44,7 @@
 #include "spufs.h"
 
 struct spufs_sb_info {
-   int debug;
+   bool debug;
 };
 
 static struct kmem_cache *spufs_inode_cache;
@@ -593,16 +594,27 @@ long spufs_create(struct path *path, struct dentry 
*dentry,
 }
 
 /* File system initialization */
+struct spufs_fs_context {
+   kuid_t  uid;
+   kgid_t  gid;
+   umode_t mode;
+};
+
 enum {
-   Opt_uid, Opt_gid, Opt_mode, Opt_debug, Opt_err,
+   Opt_uid, Opt_gid, Opt_mode, Opt_debug,
+};
+
+static const struct fs_parameter_spec spufs_param_specs[] = {
+   fsparam_u32 ("gid", Opt_gid),
+   fsparam_u32oct  ("mode",Opt_mode),
+   fsparam_u32 ("uid", Opt_uid),
+   fsparam_flag("debug",   Opt_debug),
+   {}
 };
 
-static const match_table_t spufs_tokens = {
-   { Opt_uid,   "uid=%d" },
-   { Opt_gid,   "gid=%d" },
-   { Opt_mode,  "mode=%o" },
-   { Opt_debug, "debug" },
-   { Opt_err, NULL },
+static const struct fs_parameter_description spufs_fs_parameters = {
+   .name   = "spufs",
+   .specs  = spufs_param_specs,
 };
 
 static int spufs_show_options(struct seq_file *m, struct dentry *root)
@@ -623,47 +635,41 @@ static int spufs_show_options(struct seq_file *m, struct 
dentry *root)
return 0;
 }
 
-static int
-spufs_parse_options(struct super_block *sb, char *options, struct inode *root)
-{
-   char *p;
-   substring_t args[MAX_OPT_ARGS];
-
-   while ((p = strsep(&options, ",")) != NULL) {
-   int token, option;
-
-   if (!*p)
-   continue;
-
-   token = match_token(p, spufs_tokens, args);
-   switch (token) {
-   case Opt_uid:
-   if (match_int(&args[0], &option))
-   return 0;
-   root->i_uid = make_kuid(current_user_ns(), option);
-   if (!uid_valid(root->i_uid))
-   return 0;
-   break;
-   case Opt_gid:
-   if (match_int(&args[0], &option))
-   return 0;
-   root->i_gid = make_kgid(current_user_ns(), option);
-   if (!gid_valid(root->i_gid))
-   return 0;
-   break;
-   case Opt_mode:
-   if (match_octal(&args[0], &option))
-   return 0;
-   root->i_mode = option | S_IFDIR;
-   break;
-   case Opt_debug:
-   spufs_get_sb_info(sb)->debug = 1;
-   break;
-   default:
-   return 0;
-   }
+static int spufs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+   struct spufs_fs_context *ctx = fc->fs_private;
+   struct spufs_sb_info *sbi = fc->s_fs_info;
+   struct fs_parse_result result;
+   kuid_t uid;
+   kgid_t gid;
+   int opt;
+
+   opt = fs_parse(fc, &spufs_fs_parameters, param, &result);
+   if (opt < 0)
+   return opt;
+
+   switch (opt) {
+   case Opt_uid:
+   uid = make_kuid(current_user_ns(), result.uint_32);
+   if (!uid_valid(uid))
+   return invalf(fc, "Unknown uid");
+   ctx->uid = uid;
+   break;
+   case Opt_gid:
+   gid = make_kgid(current_user_ns(), result.uint_32);
+   if (!gid_valid(gid))
+   return invalf(fc, "Unknown gid");
+   ctx->gid = gid;
+   break;
+   case Opt_mode:
+   ctx->mode = result.uint_32 & S_IALLUGO;
+   break;
+   case Opt_debug:
+   sbi->debug = true;
+   break;
}
-   return 1;
+
+   return 0;
 }
 
 static void spufs_exit_isolated_loader(void)
@@ -697,79 +703,98 @@ spufs_init_isolated_loader(void)
printk(KERN_INFO "spufs: SPU isolation mode enabled\n");
 }
 
-static int
-spufs_create_root(struct super_block *sb, void *data)
+static int spufs_create_root(struct super_block *sb, struct fs_context *fc)
 {
+   struct spufs_fs_context *ctx = fc->fs_private;

[RFC PATCH 0/8] Convert mount_single-using filesystems to fs_context

2019-03-21 Thread David Howells


Hi Al,

Here's a set of patches that converts the mount_single()-using filesystems
to use the new fs_context struct.  There may be prerequisite commits in the
branch detailed below.

 (1) Add a new keying to vfs_get_super() that indicates that
     ->reconfigure() should be called instead of (*fill_super)() if the
     superblock already exists (a sketch of this keying follows the list).

 (2) Convert debugfs.

 (3) Convert tracefs.

 (4) Convert pstore.

 (5) Fix a bug in hypfs.

 (6) Convert hypfs.

 (7) Convert spufs.

 (8) Kill off mount_single().
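
[Editor's note: a hedged sketch of the keying described in (1); the enum
and its names are paraphrased for illustration, not quoted from the
patch. With the "single or reconfigure" keying, finding an existing
superblock invokes ->reconfigure() on the fs_context instead of calling
fill_super() a second time.]

enum vfs_get_super_keying {
	vfs_get_single_super,		/* only one such superblock may exist */
	vfs_get_single_reconf_super,	/* as above, but an existing sb is
					 * reconfigured rather than refilled */
	vfs_get_keyed_super,		/* superblocks keyed by s_fs_info */
	vfs_get_independent_super,	/* a new superblock every time */
};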

These can be found in the following branch:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=mount-api-viro

Thanks,
David
---
David Howells (8):
  vfs: Add a single-or-reconfig keying to vfs_get_super()
  vfs: Convert debugfs to fs_context
  vfs: Convert tracefs to fs_context
  vfs: Convert pstore to fs_context
  hypfs: Fix error number left in struct pointer member
  vfs: Convert hypfs to fs_context
  vfs: Convert spufs to fs_context
  vfs: Kill off mount_single()


 Documentation/filesystems/vfs.txt |4 -
 arch/powerpc/platforms/cell/spufs/inode.c |  207 -
 arch/s390/hypfs/inode.c   |  137 +++
 fs/debugfs/inode.c|  186 --
 fs/pstore/inode.c |  110 ++-
 fs/super.c|   73 ++
 fs/tracefs/inode.c|  180 -
 include/linux/fs.h|3 
 include/linux/fs_context.h|1 
 9 files changed, 446 insertions(+), 455 deletions(-)



Re: [PATCH v1 00/27] Reduce ifdef mess in hugetlbpage.c and slice.c

2019-03-21 Thread Christophe Leroy

This series went through a successful build test on kisskb:

http://kisskb.ellerman.id.au/kisskb/branch/chleroy/head/f9dc3b2203af4356e9eac2d901126a0dfc5b51f6/

Christophe

On 20/03/2019 at 11:06, Christophe Leroy wrote:

The main purpose of this series is to reduce the amount of #ifdefs in
hugetlbpage.c and slice.c

At the same time, it does some cleanup by reducing the number of BUG_ON()
and dropping unused functions.

It also removes the 64k-pages-related code in nohash/64, as 64k pages
can only be selected on book3s/64.

Christophe Leroy (27):
   powerpc/mm: Don't BUG() in hugepd_page()
   powerpc/mm: don't BUG in add_huge_page_size()
   powerpc/mm: don't BUG() in slice_mask_for_size()
   powerpc/book3e: drop mmu_get_tsize()
   powerpc/mm: drop slice_set_user_psize()
   powerpc/64: only book3s/64 supports CONFIG_PPC_64K_PAGES
   powerpc/book3e: hugetlbpage is only for CONFIG_PPC_FSL_BOOK3E
   powerpc/mm: move __find_linux_pte() out of hugetlbpage.c
   powerpc/mm: make hugetlbpage.c depend on CONFIG_HUGETLB_PAGE
   powerpc/mm: make gup_hugepte() static
   powerpc/mm: split asm/hugetlb.h into dedicated subarch files
   powerpc/mm: add a helper to populate hugepd
   powerpc/mm: define get_slice_psize() all the time
   powerpc/mm: no slice for nohash/64
   powerpc/mm: cleanup ifdef mess in add_huge_page_size()
   powerpc/mm: move hugetlb_disabled into asm/hugetlb.h
   powerpc/mm: cleanup HPAGE_SHIFT setup
   powerpc/mm: cleanup remaining ifdef mess in hugetlbpage.c
   powerpc/mm: drop slice DEBUG
   powerpc/mm: remove unnecessary #ifdef CONFIG_PPC64
   powerpc/mm: hand a context_t over to slice_mask_for_size() instead of
 mm_struct
   powerpc/mm: move slice_mask_for_size() into mmu.h
   powerpc/mm: remove a couple of #ifdef CONFIG_PPC_64K_PAGES in
 mm/slice.c
   powerpc: define subarch SLB_ADDR_LIMIT_DEFAULT
   powerpc/mm: flatten function __find_linux_pte()
   powerpc/mm: flatten function __find_linux_pte() step 2
   powerpc/mm: flatten function __find_linux_pte() step 3

  arch/powerpc/Kconfig |   3 +-
  arch/powerpc/include/asm/book3s/64/hugetlb.h |  73 +++
  arch/powerpc/include/asm/book3s/64/mmu.h |  22 +-
  arch/powerpc/include/asm/book3s/64/slice.h   |   7 +-
  arch/powerpc/include/asm/hugetlb.h   |  87 +---
  arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h |  45 +
  arch/powerpc/include/asm/nohash/32/mmu-8xx.h |  18 ++
  arch/powerpc/include/asm/nohash/32/slice.h   |   2 +
  arch/powerpc/include/asm/nohash/64/pgalloc.h |   3 -
  arch/powerpc/include/asm/nohash/64/pgtable.h |   4 -
  arch/powerpc/include/asm/nohash/64/slice.h   |  12 --
  arch/powerpc/include/asm/nohash/hugetlb-book3e.h |  45 +
  arch/powerpc/include/asm/nohash/pte-book3e.h |   5 -
  arch/powerpc/include/asm/page.h  |  12 +-
  arch/powerpc/include/asm/pgtable-be-types.h  |   7 +-
  arch/powerpc/include/asm/pgtable-types.h |   7 +-
  arch/powerpc/include/asm/pgtable.h   |   3 -
  arch/powerpc/include/asm/slice.h |   8 +-
  arch/powerpc/include/asm/task_size_64.h  |   2 +-
  arch/powerpc/kernel/fadump.c |   1 +
  arch/powerpc/kernel/setup-common.c   |   8 +-
  arch/powerpc/mm/Makefile |   4 +-
  arch/powerpc/mm/hash_utils_64.c  |   1 +
  arch/powerpc/mm/hugetlbpage-book3e.c |  52 ++---
  arch/powerpc/mm/hugetlbpage-hash64.c |  16 ++
  arch/powerpc/mm/hugetlbpage.c| 245 ---
  arch/powerpc/mm/pgtable.c| 114 +++
  arch/powerpc/mm/slice.c  | 132 ++--
  arch/powerpc/mm/tlb_low_64e.S|  31 ---
  arch/powerpc/mm/tlb_nohash.c |  13 --
  arch/powerpc/platforms/Kconfig.cputype   |   4 +
  31 files changed, 438 insertions(+), 548 deletions(-)
  create mode 100644 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
  delete mode 100644 arch/powerpc/include/asm/nohash/64/slice.h
  create mode 100644 arch/powerpc/include/asm/nohash/hugetlb-book3e.h



Re: [PATCH v1 11/27] powerpc/mm: split asm/hugetlb.h into dedicated subarch files

2019-03-21 Thread Christophe Leroy

snowpatch fails to apply this.

I usually base my patches on the merge branch. Shouldn't snowpatch use 
the merge branch as well instead of the next branch?


Christophe

On 20/03/2019 at 11:06, Christophe Leroy wrote:

Three subarches support hugepages:
- fsl book3e
- book3s/64
- 8xx

This patch splits asm/hugetlb.h to reduce the #ifdef mess.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/book3s/64/hugetlb.h | 41 +++
  arch/powerpc/include/asm/hugetlb.h   | 89 ++--
  arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 32 +
  arch/powerpc/include/asm/nohash/hugetlb-book3e.h | 31 +
  4 files changed, 108 insertions(+), 85 deletions(-)
  create mode 100644 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
  create mode 100644 arch/powerpc/include/asm/nohash/hugetlb-book3e.h

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index ec2a55a553c7..2f9cf2bc601c 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -62,4 +62,45 @@ extern pte_t huge_ptep_modify_prot_start(struct 
vm_area_struct *vma,
  extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t old_pte, pte_t new_pte);
+/*
+ * This should work for other subarchs too. But right now we use the
+ * new format only for 64bit book3s
+ */
+static inline pte_t *hugepd_page(hugepd_t hpd)
+{
+   if (WARN_ON(!hugepd_ok(hpd)))
+   return NULL;
+   /*
+* We have only four bits to encode, MMU page size
+*/
+   BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf);
+   return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK);
+}
+
+static inline unsigned int hugepd_mmu_psize(hugepd_t hpd)
+{
+   return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2;
+}
+
+static inline unsigned int hugepd_shift(hugepd_t hpd)
+{
+   return mmu_psize_to_shift(hugepd_mmu_psize(hpd));
+}
+static inline void flush_hugetlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+   if (radix_enabled())
+   return radix__flush_hugetlb_page(vma, vmaddr);
+}
+
+static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
+   unsigned int pdshift)
+{
+   unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> 
hugepd_shift(hpd);
+
+   return hugepd_page(hpd) + idx;
+}
+
+void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
+
  #endif
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 48c29686c78e..fd5c0873a57d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -6,85 +6,13 @@
  #include 
  
  #ifdef CONFIG_PPC_BOOK3S_64

-
  #include 
-/*
- * This should work for other subarchs too. But right now we use the
- * new format only for 64bit book3s
- */
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
-   if (WARN_ON(!hugepd_ok(hpd)))
-   return NULL;
-   /*
-* We have only four bits to encode, MMU page size
-*/
-   BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf);
-   return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK);
-}
-
-static inline unsigned int hugepd_mmu_psize(hugepd_t hpd)
-{
-   return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2;
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
-   return mmu_psize_to_shift(hugepd_mmu_psize(hpd));
-}
-static inline void flush_hugetlb_page(struct vm_area_struct *vma,
- unsigned long vmaddr)
-{
-   if (radix_enabled())
-   return radix__flush_hugetlb_page(vma, vmaddr);
-}
-
-#else
-
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
-   if (WARN_ON(!hugepd_ok(hpd)))
-   return NULL;
-#ifdef CONFIG_PPC_8xx
-   return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK);
-#else
-   return (pte_t *)((hpd_val(hpd) &
- ~HUGEPD_SHIFT_MASK) | PD_HUGE);
-#endif
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
-#ifdef CONFIG_PPC_8xx
-   return ((hpd_val(hpd) & _PMD_PAGE_MASK) >> 1) + 17;
-#else
-   return hpd_val(hpd) & HUGEPD_SHIFT_MASK;
-#endif
-}
-
+#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#include 
+#elif defined(CONFIG_PPC_8xx)
+#include 
  #endif /* CONFIG_PPC_BOOK3S_64 */
  
-

-static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
-   unsigned pdshift)
-{
-   /*
-* On FSL BookE, we have multiple higher-level table entries that
-* point to the same hugepte.  Just use the first one since they're all
-* identical.  So for that case, idx=0.
-*/
-   unsigned long idx = 0;
-
-   pte_t *dir = hugepd_page(hpd);
-#ifdef CONFIG_PPC_8xx
-   idx = (addr & ((1UL << pdshift) - 1)) 
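
[Editor's note: a worked example of the hugepte_offset() indexing shown
earlier in this patch, with assumed numbers: a hugepd covering 1G
(pdshift = 30) holding 16M huge pages (hugepd_shift(hpd) = 24) contains
1G / 16M = 64 hugeptes, so for addr = 0x456789abc the entry selected is]

	idx = (0x456789abcUL & ((1UL << 30) - 1)) >> 24;	/* = 0x16 */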

Re: [PATCH v5 05/10] powerpc: Add a framework for Kernel Userspace Access Protection

2019-03-21 Thread Christophe Leroy




On 20/03/2019 at 14:04, Christophe Leroy wrote:



On 20/03/2019 at 13:57, Michael Ellerman wrote:

Christophe Leroy  writes:

On 08/03/2019 at 02:16, Michael Ellerman wrote:

From: Christophe Leroy 

This patch implements a framework for Kernel Userspace Access
Protection.

Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and
allow/prevent_user_access().

Some platforms will need to know the area accessed and whether it is
accessed from read, write or both. Therefore source, destination and
size are handed over to the two functions.

mpe: Rename to allow/prevent rather than unlock/lock, and add
read/write wrappers. Drop the 32-bit code for now until we have an
implementation for it. Add kuap to pt_regs for 64-bit as well as
32-bit. Don't split strings, use pr_crit_ratelimited().

Signed-off-by: Christophe Leroy 
Signed-off-by: Russell Currey 
Signed-off-by: Michael Ellerman 
---
v5: Futex ops need read/write so use allow_user_access() there.
  Use #ifdef CONFIG_PPC64 in kup.h to fix build errors.
  Allow subarch to override allow_read/write_from/to_user().


Those little helpers that will just call allow_user_access() when
distinct read/write handling is not performed look like overkill to me.

Can't the subarch do it by itself based on the nullity of from/to?

static inline void allow_user_access(void __user *to, const void __user *from,
				     unsigned long size)
{
	if (to && from)
		set_kuap(0);
	else if (to)
		set_kuap(AMR_KUAP_BLOCK_READ);
	else if (from)
		set_kuap(AMR_KUAP_BLOCK_WRITE);
}


You could implement it that way, but it reads better at the call sites
if we have:

allow_write_to_user(uaddr, sizeof(*uaddr));
vs:
allow_user_access(uaddr, NULL, sizeof(*uaddr));

So I'm inclined to keep them. It should all end up inlined and generate
the same code at the end of the day.



I was not suggesting to completely remove allow_write_to_user(); I fully 
agree that it reads better at the call sites.


I was just thinking that allow_write_to_user() could remain generic and 
call the subarch-specific allow_user_access() instead of making multiple 
subarch allow_write_to_user() implementations.


Otherwise, could we do something like the following, so that subarches 
may implement it or not?


#ifndef allow_read_from_user
static inline void allow_read_from_user(const void __user *from,
					unsigned long size)
{
	allow_user_access(NULL, from, size);
}
#endif

#ifndef allow_write_to_user
static inline void allow_write_to_user(void __user *to, unsigned long size)
{
	allow_user_access(to, NULL, size);
}
#endif

Christophe



But both solutions are OK with me in the end.

Christophe


Re: [PATCH] powerpc/highmem: change BUG_ON() to WARN_ON()

2019-03-21 Thread Christophe Leroy




On 21/03/2019 at 11:07, Michael Ellerman wrote:

Christophe Leroy  writes:


On 21/03/2019 at 06:29, Michael Ellerman wrote:

Christophe Leroy  writes:

In arch/powerpc/mm/highmem.c, BUG_ON() is called only when
CONFIG_DEBUG_HIGHMEM is selected, which means the BUG_ON() is
not vital and can be replaced by a WARN_ON().

At the same time, use IS_ENABLED() instead of #ifdef to clean things up a bit.

Signed-off-by: Christophe Leroy 
---
   arch/powerpc/mm/highmem.c | 12 
   1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/highmem.c b/arch/powerpc/mm/highmem.c
index 82a0e37557a5..b68c9f20fbdf 100644
--- a/arch/powerpc/mm/highmem.c
+++ b/arch/powerpc/mm/highmem.c
@@ -56,7 +54,7 @@ EXPORT_SYMBOL(kmap_atomic_prot);
   void __kunmap_atomic(void *kvaddr)
   {
unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-   int type __maybe_unused;
+   int type;


Why don't we move type into the block below.


Yes you're right. When Mathieu introduced the __maybe_unused, I was
wrongly thinking that kmap_atomic_idx() was doing something important
that had to be done also when DEBUG was not selected, but in fact it does
nothing other than return the type.

I'll send a new patch.


I can just fix it up when applying.



Ok, thanks

Christophe



Re: [PATCH] powerpc/highmem: change BUG_ON() to WARN_ON()

2019-03-21 Thread Michael Ellerman
Christophe Leroy  writes:

> On 21/03/2019 at 06:29, Michael Ellerman wrote:
>> Christophe Leroy  writes:
>>> In arch/powerpc/mm/highmem.c, BUG_ON() is called only when
>>> CONFIG_DEBUG_HIGHMEM is selected, which means the BUG_ON() is
>>> not vital and can be replaced by a WARN_ON().
>>>
>>> At the same time, use IS_ENABLED() instead of #ifdef to clean things up a bit.
>>>
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>>   arch/powerpc/mm/highmem.c | 12 
>>>   1 file changed, 4 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/powerpc/mm/highmem.c b/arch/powerpc/mm/highmem.c
>>> index 82a0e37557a5..b68c9f20fbdf 100644
>>> --- a/arch/powerpc/mm/highmem.c
>>> +++ b/arch/powerpc/mm/highmem.c
>>> @@ -56,7 +54,7 @@ EXPORT_SYMBOL(kmap_atomic_prot);
>>>   void __kunmap_atomic(void *kvaddr)
>>>   {
>>> unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
>>> -   int type __maybe_unused;
>>> +   int type;
>> 
>> Why don't we move type into the block below.
>
> Yes you're right. When Mathieu introduced the __maybe_unused, I was 
> wrongly thinking that kmap_atomic_idx() was doing something important 
> that had to be done also when DEBUG was not selected, but in fact it does 
> nothing other than return the type.
>
> I'll send a new patch.

I can just fix it up when applying.

cheers


Re: [PATCH] powerpc/security: Fix spectre_v2 reporting

2019-03-21 Thread Diana Madalina Craciun
Reviewed-by: Diana Craciun 

On 3/21/2019 6:24 AM, Michael Ellerman wrote:
> When I updated the spectre_v2 reporting to handle software count cache
> flush I got the logic wrong when there's no software count cache
> enabled at all.
>
> The result is that on systems with the software count cache flush
> disabled we print:
>
>   Mitigation: Indirect branch cache disabled, Software count cache flush
>
> Which correctly indicates that the count cache is disabled, but
> incorrectly says the software count cache flush is enabled.
>
> The root of the problem is that we are trying to handle all
> combinations of options. But we know now that we only expect to see
> the software count cache flush enabled if the other options are false.
>
> So split the two cases, which simplifies the logic and fixes the bug.
> We were also missing a space before "(hardware accelerated)".
>
> The result is we see one of:
>
>   Mitigation: Indirect branch serialisation (kernel only)
>   Mitigation: Indirect branch cache disabled
>   Mitigation: Software count cache flush
>   Mitigation: Software count cache flush (hardware accelerated)
>
> Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache 
> flush")
> Cc: sta...@vger.kernel.org # v4.19+
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kernel/security.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
> index 9b8631533e02..b33bafb8fcea 100644
> --- a/arch/powerpc/kernel/security.c
> +++ b/arch/powerpc/kernel/security.c
> @@ -190,29 +190,22 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct 
> device_attribute *attr, c
>   bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED);
>   ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED);
>  
> - if (bcs || ccd || count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) {
> - bool comma = false;
> + if (bcs || ccd) {
>   seq_buf_printf(&s, "Mitigation: ");
>  
> - if (bcs) {
> + if (bcs)
>   seq_buf_printf(&s, "Indirect branch serialisation 
> (kernel only)");
> - comma = true;
> - }
>  
> - if (ccd) {
> - if (comma)
> - seq_buf_printf(&s, ", ");
> - seq_buf_printf(&s, "Indirect branch cache disabled");
> - comma = true;
> - }
> -
> - if (comma)
> + if (bcs && ccd)
>   seq_buf_printf(&s, ", ");
>  
> - seq_buf_printf(&s, "Software count cache flush");
> + if (ccd)
> + seq_buf_printf(&s, "Indirect branch cache disabled");
> + } else if (count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) {
> + seq_buf_printf(&s, "Mitigation: Software count cache flush");
>  
>   if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW)
> - seq_buf_printf(&s, "(hardware accelerated)");
> + seq_buf_printf(&s, " (hardware accelerated)");
>   } else if (btb_flush_enabled) {
>   seq_buf_printf(&s, "Mitigation: Branch predictor state flush");
>   } else {




Re: [PATCH v4 06/17] KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration

2019-03-21 Thread Cédric Le Goater
On 3/21/19 12:09 AM, David Gibson wrote:
> On Wed, Mar 20, 2019 at 09:37:40AM +0100, Cédric Le Goater wrote:
>> These controls will be used by the H_INT_SET_QUEUE_CONFIG and
>> H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying
>> Event Queue in the XIVE IC. They will also be used to restore the
>> configuration of the XIVE EQs and to capture the internal run-time
>> state of the EQs. Both 'get' and 'set' rely on an OPAL call to access
>> the EQ toggle bit and EQ index which are updated by the XIVE IC when
>> event notifications are enqueued in the EQ.
>>
>> The value of the guest physical address of the event queue is saved in
>> the XIVE internal xive_q structure for later use. That is when
>> migration needs to mark the EQ pages dirty to capture a consistent
>> memory state of the VM.
>>
>> Note that H_INT_SET_QUEUE_CONFIG does not require the extra OPAL
>> call that sets the EQ toggle bit and EQ index to configure the EQ,
>> but restoring the EQ state does.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>
>>  Changes since v3 :
>>
>>  - fix the test on the initial setting of the EQ toggle bit: 0 -> 1
>>  - renamed qsize to qshift
>>  - renamed qpage to qaddr
>>  - checked host page size
>>  - limited flags to KVM_XIVE_EQ_ALWAYS_NOTIFY to fit sPAPR specs
>>  
>>  Changes since v2 :
>>  
>>  - fixed comments on the KVM device attribute definitions
>>  - fixed check on supported EQ size to restrict to 64K pages
>>  - checked kvm_eq.flags that need to be zero
>>  - removed the OPAL call when EQ qtoggle bit and index are zero. 
>>
>>  arch/powerpc/include/asm/xive.h|   2 +
>>  arch/powerpc/include/uapi/asm/kvm.h|  19 ++
>>  arch/powerpc/kvm/book3s_xive.h |   2 +
>>  arch/powerpc/kvm/book3s_xive.c |  15 +-
>>  arch/powerpc/kvm/book3s_xive_native.c  | 242 +
>>  Documentation/virtual/kvm/devices/xive.txt |  34 +++
>>  6 files changed, 308 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/xive.h 
>> b/arch/powerpc/include/asm/xive.h
>> index b579a943407b..c4e88abd3b67 100644
>> --- a/arch/powerpc/include/asm/xive.h
>> +++ b/arch/powerpc/include/asm/xive.h
>> @@ -73,6 +73,8 @@ struct xive_q {
>>  u32 esc_irq;
>>  atomic_tcount;
>>  atomic_tpending_count;
>> +u64 guest_qaddr;
>> +u32 guest_qshift;
>>  };
>>  
>>  /* Global enable flags for the XIVE support */
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index e8161e21629b..85005400fd86 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -681,6 +681,7 @@ struct kvm_ppc_cpu_char {
>>  #define KVM_DEV_XIVE_GRP_CTRL   1
>>  #define KVM_DEV_XIVE_GRP_SOURCE 2   /* 64-bit source 
>> identifier */
>>  #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG  3   /* 64-bit source 
>> identifier */
>> +#define KVM_DEV_XIVE_GRP_EQ_CONFIG  4   /* 64-bit EQ identifier */
>>  
>>  /* Layout of 64-bit XIVE source attribute values */
>>  #define KVM_XIVE_LEVEL_SENSITIVE(1ULL << 0)
>> @@ -696,4 +697,22 @@ struct kvm_ppc_cpu_char {
>>  #define KVM_XIVE_SOURCE_EISN_SHIFT  33
>>  #define KVM_XIVE_SOURCE_EISN_MASK   0xfffeULL
>>  
>> +/* Layout of 64-bit EQ identifier */
>> +#define KVM_XIVE_EQ_PRIORITY_SHIFT  0
>> +#define KVM_XIVE_EQ_PRIORITY_MASK   0x7
>> +#define KVM_XIVE_EQ_SERVER_SHIFT3
>> +#define KVM_XIVE_EQ_SERVER_MASK 0xfff8ULL
>> +
>> +/* Layout of EQ configuration values (64 bytes) */
>> +struct kvm_ppc_xive_eq {
>> +__u32 flags;
>> +__u32 qshift;
>> +__u64 qaddr;
>> +__u32 qtoggle;
>> +__u32 qindex;
>> +__u8  pad[40];
>> +};
>> +
>> +#define KVM_XIVE_EQ_ALWAYS_NOTIFY   0x0001
>> +
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
>> index ae26fe653d98..622f594d93e1 100644
>> --- a/arch/powerpc/kvm/book3s_xive.h
>> +++ b/arch/powerpc/kvm/book3s_xive.h
>> @@ -272,6 +272,8 @@ struct kvmppc_xive_src_block 
>> *kvmppc_xive_create_src_block(
>>  struct kvmppc_xive *xive, int irq);
>>  void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
>>  int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
>> +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
>> +  bool single_escalation);
>>  
>>  #endif /* CONFIG_KVM_XICS */
>>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>> index e09f3addffe5..c1b7aa7dbc28 100644
>> --- a/arch/powerpc/kvm/book3s_xive.c
>> +++ b/arch/powerpc/kvm/book3s_xive.c
>> @@ -166,7 +166,8 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
>>  return IRQ_HANDLED;
>>  }
>>  
>> -static int xive_attach_escalation(struct kvm_vcpu *vcpu, 
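
[Editor's note: a hedged sketch of how the QEMU side might drive the new
control, using only the uapi shown above (KVM_DEV_XIVE_GRP_EQ_CONFIG,
the attr layout macros and struct kvm_ppc_xive_eq); the device fd and
the server/priority/queue values are placeholders.]

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

static int xive_set_eq_config(int xive_fd, uint32_t server, uint8_t prio,
			      uint64_t qaddr, uint32_t qshift)
{
	struct kvm_ppc_xive_eq eq;
	struct kvm_device_attr attr = {
		.group = KVM_DEV_XIVE_GRP_EQ_CONFIG,
		.attr  = ((uint64_t)server << KVM_XIVE_EQ_SERVER_SHIFT) |
			 (prio & KVM_XIVE_EQ_PRIORITY_MASK),
		.addr  = (uint64_t)(uintptr_t)&eq,
	};

	memset(&eq, 0, sizeof(eq));
	eq.flags  = KVM_XIVE_EQ_ALWAYS_NOTIFY;
	eq.qshift = qshift;	/* e.g. 16 for a 64K event queue */
	eq.qaddr  = qaddr;	/* guest physical address of the queue */

	return ioctl(xive_fd, KVM_SET_DEVICE_ATTR, &attr);
}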

Re: [PATCH] compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

2019-03-21 Thread Heiko Carstens
On Wed, Mar 20, 2019 at 03:20:27PM +0900, Masahiro Yamada wrote:
> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
> 
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
> 
> At this moment, I added "depends on !MIPS" because fixing the 0day bot
> reports for MIPS was too complex for me.
> 
> I tested this patch on my arm/arm64 boards.
> 
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
> 
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
> 
>   dec   file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after

Well, this will change, since people now (have to) start adding
__always_inline annotations on all architectures, most likely until
all have about the same number of annotations as x86. That will
reduce the benefit.

Not sure if it's really a win that we get the inline vs
__always_inline discussion now on all architectures.
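
[Editor's note: an illustration, not from the thread, of why some
helpers need the explicit annotation once CONFIG_OPTIMIZE_INLINING lets
the compiler ignore a plain "inline" hint; the SPR-read helper below is
made up, but the constraint is the classic one.]

/* The "i" constraint needs a compile-time constant: if the compiler
 * ever outlined this helper, "reg" would become a runtime value and
 * the asm would fail to build, hence __always_inline. */
static __always_inline unsigned long read_spr(const int reg)
{
	unsigned long val;

	asm volatile("mfspr %0, %1" : "=r" (val) : "i" (reg));
	return val;
}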



Re: [PATCH] powerpc/security: Fix spectre_v2 reporting

2019-03-21 Thread Michael Neuling
On Thu, 2019-03-21 at 15:24 +1100, Michael Ellerman wrote:
> When I updated the spectre_v2 reporting to handle software count cache
> flush I got the logic wrong when there's no software count cache
> enabled at all.
> 
> The result is that on systems with the software count cache flush
> disabled we print:
> 
>   Mitigation: Indirect branch cache disabled, Software count cache flush
> 
> Which correctly indicates that the count cache is disabled, but
> incorrectly says the software count cache flush is enabled.
> 
> The root of the problem is that we are trying to handle all
> combinations of options. But we know now that we only expect to see
> the software count cache flush enabled if the other options are false.
> 
> So split the two cases, which simplifies the logic and fixes the bug.
> We were also missing a space before "(hardware accelerated)".
> 
> The result is we see one of:
> 
>   Mitigation: Indirect branch serialisation (kernel only)
>   Mitigation: Indirect branch cache disabled
>   Mitigation: Software count cache flush
>   Mitigation: Software count cache flush (hardware accelerated)
> 
> Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache 
> flush")
> Cc: sta...@vger.kernel.org # v4.19+
> Signed-off-by: Michael Ellerman 

LGTM

Reviewed-by: Michael Neuling 

> ---
>  arch/powerpc/kernel/security.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
> index 9b8631533e02..b33bafb8fcea 100644
> --- a/arch/powerpc/kernel/security.c
> +++ b/arch/powerpc/kernel/security.c
> @@ -190,29 +190,22 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct 
> device_attribute *attr, c
>   bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED);
>   ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED);
>  
> - if (bcs || ccd || count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) {
> - bool comma = false;
> + if (bcs || ccd) {
>   seq_buf_printf(&s, "Mitigation: ");
>  
> - if (bcs) {
> + if (bcs)
>   seq_buf_printf(&s, "Indirect branch serialisation 
> (kernel only)");
> - comma = true;
> - }
>  
> - if (ccd) {
> - if (comma)
> - seq_buf_printf(&s, ", ");
> - seq_buf_printf(&s, "Indirect branch cache disabled");
> - comma = true;
> - }
> -
> - if (comma)
> + if (bcs && ccd)
>   seq_buf_printf(&s, ", ");
>  
> - seq_buf_printf(&s, "Software count cache flush");
> + if (ccd)
> + seq_buf_printf(&s, "Indirect branch cache disabled");
> + } else if (count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) {
> + seq_buf_printf(&s, "Mitigation: Software count cache flush");
>  
>   if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW)
> - seq_buf_printf(&s, "(hardware accelerated)");
> + seq_buf_printf(&s, " (hardware accelerated)");
>   } else if (btb_flush_enabled) {
>   seq_buf_printf(&s, "Mitigation: Branch predictor state flush");
>   } else {



Re: [PATCH] powerpc/highmem: change BUG_ON() to WARN_ON()

2019-03-21 Thread Christophe Leroy




On 21/03/2019 at 06:29, Michael Ellerman wrote:

Christophe Leroy  writes:

In arch/powerpc/mm/highmem.c, BUG_ON() is called only when
CONFIG_DEBUG_HIGHMEM is selected, which means the BUG_ON() is
not vital and can be replaced by a WARN_ON().

At the same time, use IS_ENABLED() instead of #ifdef to clean things up a bit.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/mm/highmem.c | 12 
  1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/highmem.c b/arch/powerpc/mm/highmem.c
index 82a0e37557a5..b68c9f20fbdf 100644
--- a/arch/powerpc/mm/highmem.c
+++ b/arch/powerpc/mm/highmem.c
@@ -56,7 +54,7 @@ EXPORT_SYMBOL(kmap_atomic_prot);
  void __kunmap_atomic(void *kvaddr)
  {
unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-   int type __maybe_unused;
+   int type;


Why don't we move type into the block below.


Yes you're right. When Mathieu introduced the __maybe_unused, I was
wrongly thinking that kmap_atomic_idx() was doing something important
that had to be done also when DEBUG was not selected, but in fact it does
nothing other than return the type.


I'll send a new patch.

Christophe



eg:


@@ -66,12 +64,11 @@ void __kunmap_atomic(void *kvaddr)
 
-	type = kmap_atomic_idx();
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-	{
+	if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM)) {
+		int type = kmap_atomic_idx();
 		unsigned int idx;
 
 		idx = type + KM_TYPE_NR * smp_processor_id();
-		BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
+		WARN_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));


cheers