Re: [ANN] init-kconfig - easy way to embrace Linux's kconfig

2018-10-05 Thread Ulf Magnusson
On Thu, Oct 4, 2018 at 10:03 PM Luis Chamberlain  wrote:
>
> Every now and then a project is born, and they decide to use Linux's
> kconfig to enable configuration of their project. As it stands we *know*
> kconfig is now used in at least 12 different projects [0]. I myself
> added kconfig to one as well years ago. Even research reveals that
> kconfig has become one of the leading industrial variability modeling
> languages [1] [2].
>
> What is often difficult to do though is to start off using kconfig and
> integrating it into a project. Or updating / syncing to the latest
> kconfig from upstream Linux.
>
> I had yet another need to use kconfig for another small project so
> decided to make a clean template others can use and help keep it in sync.
> This is a passive fork which aims to keep in sync with the Linux
> kernel's latest kconfig to make it easier to keep up to date and to
> enable new projects to use and embrace kconfig on their own.  The goal
> is *not* to fork kconfig and evolve it separately, but rather keep in
> sync with the evolution of kconfig on Linux to make it easier for
> projects to use kconfig and also update their own kconfig when needed.
>
> This may also be useful if folks want to test kconfig code on a smaller,
> compartmentalized codebase.
>
> If you find this useful and you'd like to help keep it in sync, send
> patches my way as the kernel's kconfig evolves. The code is up on
> gitlab [3].
>
> Do we want to document this option on Linux in case folks want to try
> and embrace kconfig on their own for other projects?
>
> [0] http://www.eng.uwaterloo.ca/~shshe/kconfig_semantics.pdf
> [1] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
> [2] http://gsd.uwaterloo.ca/sites/default/files/ase241-berger_0.pdf
> [3] https://gitlab.com/mcgrof/init-kconfig
>
>   Luis

Shameless self-plug:

There's also a Python Kconfig implementation that's starting to get
picked up by several projects: https://github.com/ulfalizer/kconfiglib

It has a terminal menuconfig interface with a lot more features than
mconf (a demonstration is available at
https://raw.githubusercontent.com/ulfalizer/Kconfiglib/screenshots/screenshots/menuconfig.gif),
and can also be used e.g. to generate cross-referenced Kconfig
documentation that includes propagated dependencies:
https://docs.zephyrproject.org/latest/reference/kconfig/index.html
(note: heavy page).

Kconfiglib is based around a library (an old version appears in e.g.
U-Boot and Yocto, and a newer version in e.g. Espressif). The
documentation generation is just a script
(https://github.com/zephyrproject-rtos/zephyr/blob/master/doc/scripts/genrest.py),
and the same goes for the menuconfig and the other tools. The core
library takes care of all the trickiness related to evaluating
symbols.
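
For a quick taste of the library API, loading and re-writing a
configuration takes only a few lines (a minimal sketch; the Kconfig and
.config paths, and the FOO symbol, are assumptions for illustration):

# Sketch: assumes a Kconfig file and a .config in the current directory.
from kconfiglib import Kconfig

kconf = Kconfig("Kconfig")          # parse the Kconfig tree
kconf.load_config(".config")        # load existing symbol values
print(kconf.syms["FOO"].str_value)  # inspect a (hypothetical) symbol
kconf.write_config(".config")       # write the configuration back out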

I realize there would probably be massive opposition to adding a
Python dependency to a core part of the kernel, so I'm not going for
that. For most other projects, I think it's a good fit though.

Cheers,
Ulf


Re: [PATCH 7/7] Documentation: devicetree: Add Xilinx R5 rproc binding

2018-10-05 Thread Bjorn Andersson
On Thu 16 Aug 00:06 PDT 2018, Wendy Liang wrote:

> Add device tree binding for Xilinx Cortex-r5 remoteproc.
> 
> Signed-off-by: Wendy Liang 
> ---
>  .../remoteproc/xlnx,zynqmp-r5-remoteproc.txt   | 81 
> ++
>  1 file changed, 81 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5-remoteproc.txt
> 
> diff --git 
> a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5-remoteproc.txt 
> b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5-remoteproc.txt
> new file mode 100644
> index 000..3940019
> --- /dev/null
> +++ 
> b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5-remoteproc.txt
> @@ -0,0 +1,81 @@
> +Xilinx ARM Cortex A53-R5 remoteproc driver
> +==
> +
> +ZynqMP family of devices use two Cortex R5 processors to help with various
> +low power / real time tasks.
> +
> +This driver requires specific ZynqMP hardware design.
> +
> +ZynqMP R5 RemoteProc Device Node:
> +=
> +A zynqmp_r5_remoteproc device node is used to represent a R5 IP instance
> +within ZynqMP SoC.
> +
> +Required properties:
> +
> + - compatible : Should be "xlnx,zynqmp-r5-remoteproc-1.0"

What is 1.0?

> + - reg : Address and length of the register set for the device. It
> +contains in the same order as described reg-names
> + - reg-names: Contain the register set names.
> +  "tcm_a" and "tcm_b" for TCM memories.
> +  If the user uses the remoteproc driver with the RPMsg kernel
> +  driver,"ipi" for the IPI register used to communicate with RPU
> +  is also required.
> +  Otherwise, if user only uses the remoteproc driver to boot RPU
> +  firmware, "ipi" is not required.
> + - tcm-pnode-id: TCM resources power nodes IDs which are used to request TCM
> + resources for the remoteproc driver to access.
> + - rpu-pnode-id : RPU power node id which is used by the remoteproc driver
> +  to start RPU or shut it down.
> +
> +Optional properties:
> +
> + - core_conf : R5 core configuration (valid string - split0 or split1 or
> +   lock-step), default is lock-step.
> + - memory-region: memories regions for RPU executable and DMA memory.
> + - interrupts : Interrupt mapping for remoteproc IPI. It is required if the
> +user uses the remoteproc driver with the RPMsg kernel driver.
> + - interrupt-parent : Phandle for the interrupt controller. It is required if
> +  the user uses the remoteproc driver with the RPMsg 
> kernel
> +  kernel driver.
> +
> +Example:
> +
> + reserved-memory {
> + #address-cells = <2>;
> + #size-cells = <2>;
> + ranges;
> + rproc_0_fw_reserved: rproc@3ed00 {
> + compatible = "rproc-prog-memory";
> + no-map;
> + reg = <0x0 0x3ed0 0x0 0x4>;
> + };
> + rproc_0_dma_reserved: rproc@3ed40 {
> + compatible = "shared-dma-pool";
> + no-map;
> + reg = <0x0 0x3ed4 0x0 0x8>;
> + };
> + };
> +
> + firmware {
> + zynqmp_firmware: zynqmp-firmware {
> + compatible = "xlnx,zynqmp-firmware";
> + method = "smc";
> + };
> + };
> +
> + zynqmp-r5-remoteproc@0 {

remoteproc@ffe0 {

> + compatible = "xlnx,zynqmp-r5-remoteproc-1.0";
> + reg = <0x0 0xFFE0 0x0 0x1>,
> + <0x0 0xFFE2 0x0 0x1>,
> + <0x0 0xff34 0x0 0x100>;

Make all addresses lowercase, rather than mixing case.

> + reg-names = "tcm_a", "tcm_b", "ipi";
> + dma-ranges;
> + core_conf = "split0";
> + memory-region = <&rproc_0_fw_reserved>,
> + <&rproc_0_dma_reserved>;
> + tcm-pnode-id = <0xf>, <0x10>;
> + rpu-pnode-id = <0x7>;
> + interrupt-parent = <>;
> + interrupts = <0 29 4>;

interrupts = ;

> + } ;
> -- 
> 2.7.4
> 
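
Pulling these suggestions together, the node might end up looking
roughly like this (a sketch only: the addresses mirror the truncated
example above and, like the &gic phandle and the GIC_SPI/IRQ_TYPE
specifier, are illustrative assumptions):

/* Sketch: generic node name, lowercase addresses, symbolic interrupts */
remoteproc@ffe00000 {
	compatible = "xlnx,zynqmp-r5-remoteproc-1.0";
	reg = <0x0 0xffe00000 0x0 0x10000>,
	      <0x0 0xffe20000 0x0 0x10000>,
	      <0x0 0xff340000 0x0 0x100>;
	reg-names = "tcm_a", "tcm_b", "ipi";
	dma-ranges;
	core_conf = "split0";
	memory-region = <&rproc_0_fw_reserved>, <&rproc_0_dma_reserved>;
	tcm-pnode-id = <0xf>, <0x10>;
	rpu-pnode-id = <0x7>;
	interrupt-parent = <&gic>;
	interrupts = <GIC_SPI 29 IRQ_TYPE_LEVEL_HIGH>;
};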

Regards,
Bjorn


Re: [PATCH 6/7] remoteproc: Add Xilinx ZynqMP R5 remoteproc

2018-10-05 Thread Bjorn Andersson
On Thu 16 Aug 00:06 PDT 2018, Wendy Liang wrote:
> diff --git a/drivers/remoteproc/zynqmp_r5_remoteproc.c 
> b/drivers/remoteproc/zynqmp_r5_remoteproc.c
> new file mode 100644
> index 000..7fc3718
> --- /dev/null
> +++ b/drivers/remoteproc/zynqmp_r5_remoteproc.c
> @@ -0,0 +1,692 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Zynq R5 Remote Processor driver
> + *
> + * Copyright (C) 2015 Xilinx, Inc.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "remoteproc_internal.h"
> +
> +/* IPI reg offsets */
> +#define TRIG_OFFSET  0x
> +#define OBS_OFFSET   0x0004
> +#define ISR_OFFSET   0x0010
> +#define IMR_OFFSET   0x0014
> +#define IER_OFFSET   0x0018
> +#define IDR_OFFSET   0x001C
> +#define IPI_ALL_MASK 0x0F0F0301
> +
> +/* RPU IPI mask */
> +#define RPU_IPI_INIT_MASK0x0100
> +#define RPU_IPI_MASK(n)  (RPU_IPI_INIT_MASK << (n))
> +#define RPU_0_IPI_MASK   RPU_IPI_MASK(0)
> +#define RPU_1_IPI_MASK   RPU_IPI_MASK(1)

Rather than using 2 levels of macros, just define RPU_0_IPI_MASK and
RPU_1_IPI_MASK as BIT(8) and BIT(9)
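
That is, a sketch of the suggested form (BIT() comes from linux/bits.h):

#define RPU_0_IPI_MASK	BIT(8)
#define RPU_1_IPI_MASK	BIT(9)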

> +
> +/* PM proc states */
> +#define PM_PROC_STATE_ACTIVE 1u

Unused

> +
> +/* Maximum TCM power nodes IDs */
> +#define MAX_TCM_PNODES 4
> +
> +/* Register access macros */
> +#define reg_read(base, reg) \
> + readl(((void __iomem *)(base)) + (reg))
> +#define reg_write(base, reg, val) \
> + writel((val), ((void __iomem *)(base)) + (reg))

Please drop these macros, using readl/writel directly rather than hiding
it behind a similar macro will make it easier to read the code.
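
For example, a call like reg_write(pdata->ipi_base, TRIG_OFFSET, mask)
would become the following (a sketch; the variable names are
illustrative):

/* Trigger the IPI, then read the observation register back. */
writel(mask, pdata->ipi_base + TRIG_OFFSET);
val = readl(pdata->ipi_base + OBS_OFFSET);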

> +
> +#define DEFAULT_FIRMWARE_NAME"rproc-rpu-fw"
> +
> +static bool autoboot __read_mostly;

A variable only read during probe() doesn't need hints.

> +
> +struct zynqmp_r5_rproc_pdata;

No need to forward declare this, as the very next statement is the
declaration of this struct.

> +
> +/**
> + * struct zynqmp_r5_rproc_pdata - zynqmp rpu remote processor instance state
> + * @rproc: rproc handle
> + * @workqueue: workqueue for the RPU remoteproc
> + * @ipi_base: virt ptr to IPI channel address registers for APU
> + * @rpu_mode: RPU core configuration
> + * @rpu_id: RPU CPU id
> + * @rpu_pnode_id: RPU CPU power domain id
> + * @mem_pools: list of gen_pool for firmware mmio_sram memory and their
> + * power domain IDs

mem_pools is not a member of the struct.

> + * @mems: list of rproc_mem_entries for firmware

Please reorder to match struct.

> + * @irq: IRQ number
> + * @ipi_dest_mask: IPI destination mask for the IPI channel
> + */
> +struct zynqmp_r5_rproc_pdata {
> + struct rproc *rproc;
> + struct work_struct workqueue;

This is the work object, not the work queue. Please update naming
("work" is a common choice to this).

> + void __iomem *ipi_base;
> + enum rpu_oper_mode rpu_mode;
> + struct list_head mems;

Consider renaming to mem_entries.

> + u32 ipi_dest_mask;
> + u32 rpu_id;
> + u32 rpu_pnode_id;
> + int irq;
> + u32 tcm_pnode_id[MAX_TCM_PNODES];
> +};
> +
> +/**
> + * r5_boot_addr_config - configure the boot address of R5

Add () on the function name in kerneldoc.

> + * @pdata: platform data
> + * @bootmem: boot from LOVEC or HIVEC
> + *
> + * This function will set the RPU boot address
> + */
> +static void r5_boot_addr_config(struct zynqmp_r5_rproc_pdata *pdata,
> + enum rpu_boot_mem bootmem)
> +{
> + const struct zynqmp_eemi_ops *eemi = zynqmp_pm_get_eemi_ops();

I presume this will return the same eemi as when it was called right
before in zynqmp_r5_rproc_start(). How about passing eemi from the
caller?

> +
> + pr_debug("%s: R5 ID: %d, boot_dev %d\n",
> +  __func__, pdata->rpu_id, bootmem);
> +
> + if (!eemi || !eemi->ioctl) {

If eemi is NULL, zynqmp_r5_rproc_start() already aborted. How about
making zynqmp_r5_rproc_start() also check that eemi->ioctl is
non-NULL, and then just skipping this check here?
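
The helper could then take the validated ops as a parameter and lose
the check entirely; a sketch (not the actual patch):

static void r5_boot_addr_config(struct zynqmp_r5_rproc_pdata *pdata,
				const struct zynqmp_eemi_ops *eemi,
				enum rpu_boot_mem bootmem)
{
	/* The caller has already verified eemi and eemi->ioctl. */
	eemi->ioctl(pdata->rpu_pnode_id, IOCTL_RPU_BOOT_ADDR_CONFIG,
		    bootmem, 0, NULL);
}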

> + pr_err("%s: no eemi ioctl operation.\n", __func__);
> + return;
> + }
> + eemi->ioctl(pdata->rpu_pnode_id, IOCTL_RPU_BOOT_ADDR_CONFIG,
> + bootmem, 0, NULL);
> +}
> +
> +/**
> + * r5_mode_config - configure R5 operation mode
> + * @pdata: platform data
> + *
> + * configure R5 to split mode or lockstep mode
> + * based on the platform data.
> + */
> +static void r5_mode_config(struct zynqmp_r5_rproc_pdata *pdata)
> +{
> + const struct zynqmp_eemi_ops *eemi = zynqmp_pm_get_eemi_ops();

Same comments as for r5_boot_addr_config()

> +
> + pr_debug("%s: mode: %d\n", __func__, pdata->rpu_mode);
> +
> + if (!eemi || !eemi->ioctl) {
> + 

Re: [PATCH v3 3/7] drivers: parisc: Avoids building driver if CONFIG_PARISC is disabled

2018-10-05 Thread Michael Schmitz




Am 05.10.2018 um 15:16 schrieb Leonardo Bras:

Well it's not really that persuasive.  Most people simply let the build
run to completion, but if you have a problem with a job control 3h
timelimit, then create a job that kills itself at 2:59 and then
resubmits itself.  That will produce a complete build in 3h chunks
without any need to call sub Makefiles.



Humm, I probably should have explained better how GitlabCI works.
It works by creating a docker container that has a limited lifespan of 3h max.
Once that time is over, the container ceases to exist, leaving no build
objects, only the console log. So there is no way of 'resuming' the build
from where it stopped. I used the 'job' term because that's what they call it,
and I understand it's easily confused with bash jobs.


All of our Makefiles are coded assuming the upper level can prevent
descent into the lower ones.  You're proposing to change that
assumption, requiring a fairly large patch set, which doesn't really
seem to provide a huge benefit.

James


I understand your viewpoint.
But what I propose is not to change that assumption, but instead to give some
Makefiles the additional ability to be called directly and still not build
anything that was not enabled in .config.

But, why these chosen Makefiles, and not all of them?
Granularity.
What I am trying to achieve with this patchset is the ability of building
smaller sets of drivers without accidentally building what is not enabled
on .config.
And, in my viewpoint, building a single drivers/DRIVERNAME is small enough to
be fast in most situations.


That already works, doesn't it? So all that you'd need is an offline 
tool to precompute what drivers to actually build with a given config.


'make -n' with some suitable output mangling might do the job.
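
Something along these lines, perhaps (a rough, untested sketch; the
grep pattern would need tuning on a real tree):

# Dry-run the build for the current .config and list the objects
# that would be compiled, without building anything.
make -n 2>/dev/null | grep -oE '[^ ]+\.o\b' | sort -u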

There may well be other ways to achieve your stated goal, without any 
need to make changes to the kernel build process (which is the result of 
many years of evolution and tuning, BTW).



This change is not supposed to interfere with the usual way of building the kernel, and


Enough people have voiced their concern to warrant that you should back 
up that claim, IMO. Have you verified that your patchset does not change 
current behaviour when building the entire set of default configurations 
for each supported architecture? Does it reduce or increase overall 
complexity of the build process?



it is not even supposed to add overhead to kernel compilation. And it would,
at least, solve my problem with the 3h limit, and enable the tool
I am building on GitlabCI to help other developers.


(Apropos of nothing: Am I the only one who thinks gitlab might take a 
rather dim view of your creativity in dealing with their limit?)



Thanks for reading,

Leonardo Bras


Thanks for listening!

Cheers,

Michael


Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hello,

On Thu, Oct 04, 2018 at 04:05:26PM -0700, David Rientjes wrote:
> The source of the problem needs to be addressed: memory compaction.  We 
> regress because we lose __GFP_NORETRY and pointlessly try reclaim, but 

I commented in detail about the __GFP_NORETRY topic in the other email
so I will skip the discussion about __GFP_NORETRY in the context of
this answer except for the comment at the end of the email to the
actual code that implements __GFP_NORETRY.

> But that's a memory compaction issue, not a thp gfp mask issue; the 
> reclaim issue is responded to below.

Actually memory compaction has no issues whatsoever with
__GFP_THISNODE regardless of __GFP_NORETRY.

> This patch causes an even worse regression if all system memory is 
> fragmented such that thp cannot be allocated because it tries to stress 
> compaction on remote nodes, which ends up unsuccessfully, not just the 
> local node.
> 
> On Haswell, when all memory is fragmented (not just the local node as I 
> obtained by 13.9% regression result), the patch results in a fault latency 
> regression of 40.9% for MADV_HUGEPAGE region of 8GB.  This is because it 
> is thrashing both nodes pointlessly instead of just failing for 
> __GFP_THISNODE.

There's no I/O involved at the very least on compaction, nor we drop
any cache or shrink any slab by mistake by just invoking compaction.
Even when you hit the worst case "all nodes are 100% fragmented"
scenario that generates the 40% increased allocation latency, all
other tasks running in the local node will keep running fine, and they
won't be pushed away forcefully into swap with all their kernel cache
depleted, which is a mlock/mbind privileged behavior that the app
using the MADV_HUGEPAGE lib should never be able to inflict on
other processes running in the node from different users (users as in
uid).

Furthermore, when you incur the worst case latency, after that the
compact-deferred logic skips compaction the next time around, if all
nodes were fragmented to the point of guaranteed failure. Whereas
there's nothing stopping reclaim from running every time COMPACT_SKIPPED
is returned, just because compaction keeps succeeding as reclaim keeps
pushing more 2M amounts into swap from the local nodes.

I don't doubt with 1024 nodes things can get pretty bad when they're
all 100% fragmented, __GFP_THISNODE would win in such a case, but then
what you're asking for is the __GFP_COMPACT_ONLY behavior. That will
solve it.

What we'd need probably regardless of how we solve this bug (because
not all compaction invocations are THP invocations... and we can't
keep making special cases and optimizations tailored for THP or we end
up in that same 40% higher latency for large skbs and other stuff) is
a more sophisticated COMPACT_DEFERRED logic where you can track when
remote compaction failed. Then you wait many more times before trying
a global compaction. It could be achieved with just a compact_deferred
counter in the zone/pgdat (wherever it fits best).
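
To make that concrete, a sketch of the idea, modeled on the existing
compaction_deferred() scheme (all field and function names here are
hypothetical, not existing kernel code):

/*
 * Sketch: skip compaction on remote nodes until enough attempts have
 * passed since the last remote failure, with capped exponential backoff.
 */
static bool remote_compaction_deferred(pg_data_t *pgdat)
{
	unsigned long limit = 1UL << pgdat->remote_compact_defer_shift;

	if (++pgdat->remote_compact_considered < limit)
		return true;		/* still backing off */
	pgdat->remote_compact_considered = 0;
	return false;			/* retry remote compaction */
}

static void remote_compaction_failed(pg_data_t *pgdat)
{
	if (pgdat->remote_compact_defer_shift < COMPACT_MAX_DEFER_SHIFT)
		pgdat->remote_compact_defer_shift++;
}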

Overall I don't think the bug we're dealing with and the slowdown of
compaction on the remote nodes are comparable, also considering the
latter will still happen regardless if you've large skbs or other
drivers allocating large amounts of memory as an optimization.

> So the end result is that the patch regresses access latency forever by 
> 13.9% when the local node is fragmented because it is accessing remote thp 
> vs local pages of the native page size, and regresses fault latency of 
> 40.9% when the system is fully fragmented.  The only time that fault 
> latency is improved is when remote memory is not fully fragmented, but 
> then you must incur the remote access latency.

You get THP however which will reduce the TLB miss cost and maximize
TLB usage, so it depends on the app whether that 13.9% cost is actually
offset by the THP benefit or not.

It entirely depends on whether a large part of the workload fits in the
in-socket CPU cache. The more the in-socket/node CPU cache pays off,
the more remote-THP also pays off. There would be definitely workloads
that would run faster, not slower, with the remote THP instead of
local PAGE_SIZEd memory. The benefit of THP is also larger for the
guest loads than for host loads, so it depends on that too.

We agree about the latency issue with a ton of RAM and thousands of
nodes, but again that can be mitigated with a NUMA-aware
COMPACT_DEFERRED logic. Even without such
NUMA-aware-compact_deferred logic improvement, the worst case of the
remote compaction behavior still doesn't look nearly as bad as this
bug by thinking about it. And it is only a concern for extremely large
NUMA systems (which may run the risk of running into other scalability
issues in other places if random workloads are applied to them and all
nodes are low on memory and fully fragmented which is far from common
scenario on those large systems), while the bug we fixed was hurting
badly all very common 2 nodes installs with workloads that 

Re: [PATCH v4.19-rc7] treewide: Replace more open-coded allocation size multiplications

2018-10-05 Thread Joel Fernandes
On Fri, Oct 05, 2018 at 05:22:35PM -0700, Greg KH wrote:
> On Fri, Oct 05, 2018 at 05:04:16PM -0700, Kees Cook wrote:
> > On Fri, Oct 5, 2018 at 4:51 PM, Greg KH  wrote:
> > > On Fri, Oct 05, 2018 at 04:35:59PM -0700, Kees Cook wrote:
> > >> As done treewide earlier, this catches several more open-coded
> > >> allocation size calculations that were added to the kernel during the
> > >> merge window. This performs the following mechanical transformations
> > >> using Coccinelle:
> > >>
> > >>   kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
> > >>   kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
> > >>   devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)
> > >>
> > >> Signed-off-by: Kees Cook 
> > >
> > > Has this had any testing in linux-next?
> > 
> > No; they're mechanical transformations (though I did build test them).
> > If you want I could add this to linux-next for a week?
> 
> That would be good, thanks.
> 
> > > And when was "earlier"?
> > 
> > v4.18, when all of these were originally eliminated:
> > 
> > 026f05079b00 treewide: Use array_size() in f2fs_kzalloc()
> > c86065938aab treewide: Use array_size() in f2fs_kmalloc()
> > 76e43e37a407 treewide: Use array_size() in sock_kmalloc()
> > 84ca176bf54a treewide: Use array_size() in kvzalloc_node()
> > fd7becedb1f0 treewide: Use array_size() in vzalloc_node()
> > fad953ce0b22 treewide: Use array_size() in vzalloc()
> > 42bc47b35320 treewide: Use array_size() in vmalloc()
> > a86854d0c599 treewide: devm_kzalloc() -> devm_kcalloc()
> > 3c4211ba8ad8 treewide: devm_kmalloc() -> devm_kmalloc_array()
> > 778e1cdd81bb treewide: kvzalloc() -> kvcalloc()
> > 344476e16acb treewide: kvmalloc() -> kvmalloc_array()
> > 590b5b7d8671 treewide: kzalloc_node() -> kcalloc_node()
> > 6396bb221514 treewide: kzalloc() -> kcalloc()
> > 6da2ec56059c treewide: kmalloc() -> kmalloc_array()
> > 
> > The new patch is catching new open-coded multiplications introduced in 
> > v4.19.
> 
> As this is getting smaller, why not just break it up and do it through
> all of the different subsystems instead of one large patch?
> 
> And do we have a way to add a rule to 0-day to catch these so that they
> get a warning when they are added again?

They could just be added to scripts/coccinelle and 0-day will report them?

For example, 0-day ran scripts/coccinelle/api/platform_no_drv_owner.cocci on
a recently submitted patch and reported it here:
https://lore.kernel.org/lkml/201808301856.vmnjerss%25fengguang...@intel.com/

But I'm not sure if 0-day runs make coccicheck on specific semantic patches,
or runs all of them (CC'd Fengguang).
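
For reference, the rule for one of these transformations is tiny in
SmPL; a sketch (not necessarily the exact script Kees used):

@@
expression A, B, FLAGS;
@@

- kvmalloc(A * B, FLAGS)
+ kvmalloc_array(A, B, FLAGS)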

thanks,

 - Joel



[PATCH v3 0/3] get_user_pages*() and RDMA: first steps

2018-10-05 Thread john . hubbard
From: John Hubbard 

Changes since v2:

-- Absorbed more dirty page handling logic into the put_user_page*(), and
   handled some page releasing loops in infiniband more thoroughly, as per
   Jason Gunthorpe's feedback.

-- Fixed a bug in the put_user_pages*() routines' loops (thanks to
   Ralph Campbell for spotting it).

Changes since v1:

-- Renamed release_user_pages*() to put_user_pages*(), from Jan's feedback.

-- Removed the goldfish.c changes, and instead, only included a single
   user (infiniband) of the new functions. That is because goldfish.c no
   longer has a name collision (it has a release_user_pages() routine), and
   also because infiniband exercises both the put_user_page() and
   put_user_pages*() paths.

-- Updated links to discussions and plans, so as to be sure to include
   bounce buffers, thanks to Jerome's feedback.

Also:

-- Dennis, thanks for your earlier review, and I have not yet added your
   Reviewed-by tag, because this revision changes the things that you had
   previously reviewed, thus potentially requiring another look.

This short series prepares for eventually fixing the problem described
in [1], and is following a plan listed in [2], [3], [4].

Patch 1, although not technically critical to do now, is still nice to
have, because it's already been reviewed by Jan, and it's just one more
thing on the long TODO list here, that is ready to be checked off.

Patch 2 is required in order to allow me (and others, if I'm lucky) to
start submitting changes to convert all of the callsites of
get_user_pages*() and put_page().  I think this will work a lot better
than trying to maintain a massive patchset and submitting all at once.

Patch 3 converts infiniband drivers: put_page() --> put_user_page(), and
also exercises put_user_pages_dirty_locked().

Once these are all in, then the floodgates can open up to convert the large
number of get_user_pages*() callsites.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

[2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubb...@nvidia.com
Proposed steps for fixing get_user_pages() + DMA problems.

[3] https://lkml.kernel.org/r/20180710082100.mkdwngdv5kkrc...@quack2.suse.cz
Bounce buffers (otherwise [2] is not really viable).

[4] https://lkml.kernel.org/r/20181003162115.gg24...@quack2.suse.cz
Follow-up discussions.

CC: Matthew Wilcox 
CC: Michal Hocko 
CC: Christopher Lameter 
CC: Jason Gunthorpe 
CC: Dan Williams 
CC: Jan Kara 
CC: Al Viro 
CC: Jerome Glisse 
CC: Christoph Hellwig 
CC: Ralph Campbell 

John Hubbard (3):
  mm: get_user_pages: consolidate error handling
  mm: introduce put_user_page*(), placeholder versions
  infiniband/mm: convert put_page() to put_user_page*()

 drivers/infiniband/core/umem.c  |  7 +--
 drivers/infiniband/core/umem_odp.c  |  2 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 11 ++---
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +--
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ++---
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  8 ++--
 drivers/infiniband/hw/usnic/usnic_uiom.c|  7 +--
 include/linux/mm.h  | 48 -
 mm/gup.c| 37 +---
 9 files changed, 92 insertions(+), 45 deletions(-)

-- 
2.19.0



[PATCH v3 3/3] infiniband/mm: convert put_page() to put_user_page*()

2018-10-05 Thread john . hubbard
From: John Hubbard 

For code that retains pages via get_user_pages*(),
release those pages via the new put_user_page(), or
put_user_pages*(), instead of put_page()

This prepares for eventually fixing the problem described
in [1], and is following a plan listed in [2], [3], [4].

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

[2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubb...@nvidia.com
Proposed steps for fixing get_user_pages() + DMA problems.

[3] https://lkml.kernel.org/r/20180710082100.mkdwngdv5kkrc...@quack2.suse.cz
Bounce buffers (otherwise [2] is not really viable).

[4] https://lkml.kernel.org/r/20181003162115.gg24...@quack2.suse.cz
Follow-up discussions.

CC: Doug Ledford 
CC: Jason Gunthorpe 
CC: Mike Marciniszyn 
CC: Dennis Dalessandro 
CC: Christian Benvenuti 

CC: linux-r...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux...@kvack.org
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c  |  7 ---
 drivers/infiniband/core/umem_odp.c  |  2 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  8 
 drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
 7 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a41792dbae1f..7ab7a3a35eb4 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -58,9 +58,10 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
 
page = sg_page(sg);
-   if (!PageDirty(page) && umem->writable && dirty)
-   set_page_dirty_lock(page);
-   put_page(page);
+   if (umem->writable && dirty)
+   put_user_pages_dirty_lock(&page, 1);
+   else
+   put_user_page(page);
}
 
sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/core/umem_odp.c 
b/drivers/infiniband/core/umem_odp.c
index 6ec748eccff7..6227b89cf05c 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -717,7 +717,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 
user_virt, u64 bcnt,
ret = -EFAULT;
break;
}
-   put_page(local_page_list[j]);
+   put_user_page(local_page_list[j]);
continue;
}
 
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index e341e6dcc388..99ccc0483711 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -121,13 +121,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, 
unsigned long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   size_t i;
-
-   for (i = 0; i < npages; i++) {
-   if (dirty)
-   set_page_dirty_lock(p[i]);
-   put_page(p[i]);
-   }
+   if (dirty)
+   put_user_pages_dirty_lock(p, npages);
+   else
+   put_user_pages(p, npages);
 
if (mm) { /* during close after signal, mm can be NULL */
down_write(>mmap_sem);
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c 
b/drivers/infiniband/hw/mthca/mthca_memfree.c
index cc9c0c8ccba3..b8b12effd009 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -481,7 +481,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct 
mthca_uar *uar,
 
ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
if (ret < 0) {
-   put_page(pages[0]);
+   put_user_page(pages[0]);
goto out;
}
 
@@ -489,7 +489,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct 
mthca_uar *uar,
 mthca_uarc_virt(dev, uar, i));
if (ret) {
pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, 
PCI_DMA_TODEVICE);
-   put_page(sg_page(&db_tab->page[i].mem));
+   put_user_page(sg_page(&db_tab->page[i].mem));
goto out;
}
 
@@ -555,7 +555,7 @@ void mthca_cleanup_user_db_tab(struct mthca_dev *dev, 
struct mthca_uar *uar,
if (db_tab->page[i].uvirt) {
mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1);
pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, 
PCI_DMA_TODEVICE);
-   

[PATCH v3 1/3] mm: get_user_pages: consolidate error handling

2018-10-05 Thread john . hubbard
From: John Hubbard 

An upcoming patch requires a way to operate on each page that
any of the get_user_pages_*() variants returns.

In preparation for that, consolidate the error handling for
__get_user_pages(). This provides a single location (the "out:" label)
for operating on the collected set of pages that are about to be returned.

As long as every use of the "ret" variable is being edited, rename
"ret" --> "err", so that its name matches its true role.
This also gets rid of two shadowed variable declarations, as a
tiny beneficial side effect.

Reviewed-by: Jan Kara 
Signed-off-by: John Hubbard 
---
 mm/gup.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1abc8b4afff6..05ee7c18e59a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -660,6 +660,7 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
struct vm_area_struct **vmas, int *nonblocking)
 {
long i = 0;
+   int err = 0;
unsigned int page_mask;
struct vm_area_struct *vma = NULL;
 
@@ -685,18 +686,19 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
if (!vma || start >= vma->vm_end) {
vma = find_extend_vma(mm, start);
if (!vma && in_gate_area(mm, start)) {
-   int ret;
-   ret = get_gate_page(mm, start & PAGE_MASK,
+   err = get_gate_page(mm, start & PAGE_MASK,
gup_flags, &vma,
pages ? &pages[i] : NULL);
-   if (ret)
-   return i ? : ret;
+   if (err)
+   goto out;
page_mask = 0;
goto next_page;
}
 
-   if (!vma || check_vma_flags(vma, gup_flags))
-   return i ? : -EFAULT;
+   if (!vma || check_vma_flags(vma, gup_flags)) {
+   err = -EFAULT;
+   goto out;
+   }
if (is_vm_hugetlb_page(vma)) {
i = follow_hugetlb_page(mm, vma, pages, vmas,
&start, &nr_pages, i,
@@ -709,23 +711,25 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
 * If we have a pending SIGKILL, don't keep faulting pages and
 * potentially allocating memory.
 */
-   if (unlikely(fatal_signal_pending(current)))
-   return i ? i : -ERESTARTSYS;
+   if (unlikely(fatal_signal_pending(current))) {
+   err = -ERESTARTSYS;
+   goto out;
+   }
cond_resched();
page = follow_page_mask(vma, start, foll_flags, &page_mask);
if (!page) {
-   int ret;
-   ret = faultin_page(tsk, vma, start, &foll_flags,
+   err = faultin_page(tsk, vma, start, &foll_flags,
nonblocking);
-   switch (ret) {
+   switch (err) {
case 0:
goto retry;
case -EFAULT:
case -ENOMEM:
case -EHWPOISON:
-   return i ? i : ret;
+   goto out;
case -EBUSY:
-   return i;
+   err = 0;
+   goto out;
case -ENOENT:
goto next_page;
}
@@ -737,7 +741,8 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
 */
goto next_page;
} else if (IS_ERR(page)) {
-   return i ? i : PTR_ERR(page);
+   err = PTR_ERR(page);
+   goto out;
}
if (pages) {
pages[i] = page;
@@ -757,7 +762,9 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
start += page_increm * PAGE_SIZE;
nr_pages -= page_increm;
} while (nr_pages);
-   return i;
+
+out:
+   return i ? i : err;
 }
 
 static bool vma_permits_fault(struct vm_area_struct *vma,
-- 
2.19.0
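
For readers who want the shape of the change in isolation, here is a minimal,
self-contained C sketch of the consolidated-exit pattern the patch converges
on (the function names here are hypothetical, for illustration only):

#include <stdio.h>

/* Hypothetical stand-in for the real per-item work. */
static int process_one(long n)
{
	return (n == 3) ? -1 : 0;	/* simulate a failure on the 4th item */
}

/*
 * Mirrors the patched __get_user_pages() contract: return the number of
 * items handled if any were handled, otherwise the error code. The single
 * "out:" label is the one place a follow-up patch can hook into to
 * post-process everything collected so far.
 */
static long process_many(long nr)
{
	long i = 0;
	int err = 0;

	while (nr--) {
		err = process_one(i);
		if (err)
			goto out;
		i++;
	}
out:
	return i ? i : err;
}

int main(void)
{
	printf("%ld\n", process_many(10));	/* prints 3: partial success wins */
	return 0;
}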



[PATCH v3 2/3] mm: introduce put_user_page*(), placeholder versions

2018-10-05 Thread john . hubbard
From: John Hubbard 

Introduces put_user_page(), which simply calls put_page().
This provides a way to update all get_user_pages*() callers,
so that they call put_user_page(), instead of put_page().

Also introduces put_user_pages(), and a few dirty/locked variations,
as a replacement for release_pages(), and also as a replacement
for open-coded loops that release multiple pages.
These may be used for subsequent performance improvements,
via batching of pages to be released.

This prepares for eventually fixing the problem described
in [1], and is following a plan listed in [2], [3], [4].

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

[2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubb...@nvidia.com
Proposed steps for fixing get_user_pages() + DMA problems.

[3]https://lkml.kernel.org/r/20180710082100.mkdwngdv5kkrc...@quack2.suse.cz
Bounce buffers (otherwise [2] is not really viable).

[4] https://lkml.kernel.org/r/20181003162115.gg24...@quack2.suse.cz
Follow-up discussions.

CC: Matthew Wilcox 
CC: Michal Hocko 
CC: Christopher Lameter 
CC: Jason Gunthorpe 
CC: Dan Williams 
CC: Jan Kara 
CC: Al Viro 
CC: Jerome Glisse 
CC: Christoph Hellwig 
CC: Ralph Campbell 
Signed-off-by: John Hubbard 
---
 include/linux/mm.h | 48 --
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0416a7204be3..305b206e6851 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -137,6 +137,8 @@ extern int overcommit_ratio_handler(struct ctl_table *, 
int, void __user *,
size_t *, loff_t *);
 extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
size_t *, loff_t *);
+int set_page_dirty(struct page *page);
+int set_page_dirty_lock(struct page *page);
 
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
@@ -943,6 +945,50 @@ static inline void put_page(struct page *page)
__put_page(page);
 }
 
+/* Pages that were pinned via get_user_pages*() should be released via
+ * either put_user_page(), or one of the put_user_pages*() routines
+ * below.
+ */
+static inline void put_user_page(struct page *page)
+{
+   put_page(page);
+}
+
+static inline void put_user_pages_dirty(struct page **pages,
+   unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++) {
+   if (!PageDirty(pages[index]))
+   set_page_dirty(pages[index]);
+
+   put_user_page(pages[index]);
+   }
+}
+
+static inline void put_user_pages_dirty_lock(struct page **pages,
+unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++) {
+   if (!PageDirty(pages[index]))
+   set_page_dirty_lock(pages[index]);
+
+   put_user_page(pages[index]);
+   }
+}
+
+static inline void put_user_pages(struct page **pages,
+ unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++)
+   put_user_page(pages[index]);
+}
+
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
 #endif
@@ -1534,8 +1580,6 @@ int redirty_page_for_writepage(struct writeback_control 
*wbc,
 void account_page_dirtied(struct page *page, struct address_space *mapping);
 void account_page_cleaned(struct page *page, struct address_space *mapping,
  struct bdi_writeback *wb);
-int set_page_dirty(struct page *page);
-int set_page_dirty_lock(struct page *page);
 void __cancel_dirty_page(struct page *page);
 static inline void cancel_dirty_page(struct page *page)
 {
-- 
2.19.0



Re: [PATCH] kvm/x86 : avoid shifting signed 32-bit value by 31 bits

2018-10-05 Thread Wei Yang
On Thu, Oct 04, 2018 at 01:47:18PM -0400, Peng Hao wrote:
>
>From: Peng Hao 
>
>  modify AVIC_LOGICAL_ID_ENTRY_VALID_MASK to unsigned
>
>Signed-off-by: Peng Hao 
>---
> arch/x86/kvm/svm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>index d96092b..bf1ded4 100644
>--- a/arch/x86/kvm/svm.c
>+++ b/arch/x86/kvm/svm.c
>@@ -262,7 +262,7 @@ struct amd_svm_iommu_ir {
> };
> 
> #define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK  (0xFF)
>-#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK  (1 << 31)
>+#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK  (1UL << 31)

It is reasonable to change this to unsigned, but is it really necessary to
use unsigned long?

> 
> #define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK  (0xFFULL)
> #define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK  (0xFFULL << 12)
>-- 
>1.8.3.1
>

-- 
Wei Yang
Help you, Help me
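
As an aside, a small user-space sketch of why (1 << 31) is flagged, and how
the 1U and 1UL variants behave (this example is mine, not from the patch):

#include <stdio.h>

int main(void)
{
	/*
	 * 1 << 31 shifts into the sign bit of a signed int: undefined
	 * behavior in C, and on LP64 the resulting negative int also
	 * sign-extends when widened to 64 bits.
	 */
	unsigned long a = (unsigned long)(1 << 31);
	unsigned long b = 1U << 31;	/* well-defined: 0x80000000 */
	unsigned long c = 1UL << 31;	/* well-defined: 0x80000000 */

	/* typically prints: ffffffff80000000 80000000 80000000 */
	printf("%lx %lx %lx\n", a, b, c);
	return 0;
}

So for a 32-bit register field, 1U already suffices; 1UL also works but is
wider than the field needs, which is presumably what the question is about.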


Re: [PATCH 2/3] namei: implement AT_THIS_ROOT chroot-like path resolution

2018-10-05 Thread Aleksa Sarai
On 2018-10-05, Jann Horn  wrote:
> > What if we took rename_lock (call it nd->r_seq) at the start of the
> > resolution, and then only tried the __d_path-style check
> >
> >   if (read_seqretry(&rename_lock, nd->r_seq) ||
> >   read_seqretry(&mount_lock, nd->m_seq))
> >   /* do the __d_path lookup. */
> >
> > That way you would only hit the slow path if there were concurrent
> > renames or mounts *and* you are doing a path resolution with
> > AT_THIS_ROOT or AT_BENEATH. I've attached a modified patch that does
> > this (and after some testing it also appears to work).
> 
> Yeah, I think that might do the job.

*phew* I was all out of other ideas. :P

> > ---
> >  fs/namei.c | 49 ++---
> >  1 file changed, 46 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 6f995e6de6b1..12c9be175cb4 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -493,7 +493,7 @@ struct nameidata {
> > struct path root;
> > struct inode    *inode; /* path.dentry.d_inode */
> > unsigned int    flags;
> > -   unsigned        seq, m_seq;
> > +   unsigned        seq, m_seq, r_seq;
> > int             last_type;
> > unsigned        depth;
> > int             total_link_count;
> > @@ -1375,6 +1375,27 @@ static int follow_dotdot_rcu(struct nameidata *nd)
> > return -EXDEV;
> > break;
> > }
> > +   if (unlikely((nd->flags & (LOOKUP_BENEATH | LOOKUP_CHROOT)) 
> > &&
> > +(read_seqretry(&rename_lock, nd->r_seq) ||
> > + read_seqretry(&mount_lock, nd->m_seq)))) {
> > +   char *pathbuf, *pathptr;
> > +
> > +   nd->r_seq = read_seqbegin(&rename_lock);
> > +   /* Cannot take m_seq here. */
> > +
> > +   pathbuf = kmalloc(PATH_MAX, GFP_ATOMIC);
> > +   if (!pathbuf)
> > +   return -ECHILD;
> > +   pathptr = __d_path(&nd->path, &nd->root, pathbuf, 
> > PATH_MAX);
> > +   kfree(pathbuf);
> 
> You're doing this check before actually looking up the parent, right?
> So as long as I don't trigger the "path_equal(>path, >root)"
> check that you do for O_BENEATH, escaping up by one level is possible,
> right? You should probably move this check so that it happens after
> following "..".

Yup, you're right. I'll do that.

> (Also: I assume that you're going to get rid of that memory allocation
> in a future version.)

Sure. Would you prefer adding some scratch space in nameidata, or that I
change __d_path so it accepts NULL as the buffer (and thus it doesn't
actually do any string operations)?

> > if (nd->path.dentry != nd->path.mnt->mnt_root) {
> > int ret = path_parent_directory(&nd->path);
> > if (ret)
> > @@ -2269,6 +2311,9 @@ static const char *path_init(struct nameidata *nd, 
> > unsigned flags)
> > nd->last_type = LAST_ROOT; /* if there are only slashes... */
> > nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
> > nd->depth = 0;
> > +   nd->m_seq = read_seqbegin(&mount_lock);
> > +   nd->r_seq = read_seqbegin(&rename_lock);
> 
> This means that now, attempting to perform a lookup while something is
> holding the rename_lock will spin on the lock. I don't know whether
> that's a problem in practice though. Does anyone on this thread know
> whether this is problematic?

I could make it so that we only take &rename_lock
  if (nd->flags & (FOLLOW_BENEATH | FOLLOW_CHROOT)),
since it's not used outside of that path.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature
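
For readers unfamiliar with the seqcount pattern this thread leans on, the
read side being discussed has roughly this shape (an illustrative fragment in
the style of fs/namei.c, not the actual patch code):

	unsigned r_seq = read_seqbegin(&rename_lock);	/* snapshot at walk start */

	/* ... lockless path walk ... */

	if (read_seqretry(&rename_lock, r_seq)) {
		/*
		 * A rename ran concurrently, so the cheap check is
		 * inconclusive; fall back to the slow __d_path()-style
		 * verification that the result is still below the root.
		 */
	}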


Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt
On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +#ifndef PARAMS
> +#define PARAMS(x...) x
> +#endif
> +
> +#ifndef ARGS
> +#define ARGS(x...) x
> +#endif
> +

This is also leftover from the first attempt and can be nuked.

Yeah, yeah, I should have reviewed my patches better before sending
them. But I was so excited that I got it working I just wanted to share
the joy!

-- Steve


linux-next: Signed-off-by missing for commit in the usb-gadget tree

2018-10-05 Thread Stephen Rothwell
Hi Felipe,

Commit

  89969a842e72 ("usb: gadget: uvc: configfs: Sort frame intervals upon writing")

is missing a Signed-off-by from its committer.

-- 
Cheers,
Stephen Rothwell


pgpfRrjX7qdXu.pgp
Description: OpenPGP digital signature


Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt
On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +#define arch_dynfunc_trampoline(name, def)   \
> + asm volatile (  \
> + ".globl dynfunc_" #name "; \n\t"\
> + "dynfunc_" #name ": \n\t"   \
> + "jmp " #def " \n\t" \
> + ".balign 8 \n \t"   \
> + : : : "memory" )
> +

Note, the assembler can easily put in a two byte jump here. The .balign
was supposed to also add some padding (nop) in case that happens. It's
fine, because we can just replace it with a 5 byte jump, as long as we
have 3 bytes afterward if it is a two byte jump.

-- Steve
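
For reference, the two x86 encodings at issue (a sketch, assuming the
trampoline label starts out 8-byte aligned):

	/*
	 *   eb rel8   - 2-byte short jmp, which the assembler may emit
	 *   e9 rel32  - 5-byte near jmp, which text_poke_bp() patches in
	 *
	 * With ".balign 8" after the jmp, a 2-byte short jmp leaves 6 bytes
	 * of padding, so rewriting it as a 5-byte near jmp always fits.
	 */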


Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt
On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (VMware)" 
> 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  include/asm-generic/vmlinux.lds.h |   4 +
>  include/linux/jump_function.h |  93 
>  kernel/Makefile   |   2 +-
>  kernel/jump_function.c| 368 ++
>  4 files changed, 466 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/jump_function.h
>  create mode 100644 kernel/jump_function.c
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 7b75ff6e2fce..0e205069ff36 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -257,6 +257,10 @@
>   __start___jump_table = .;   \
>   KEEP(*(__jump_table))   \
>   __stop___jump_table = .;\
> + . = ALIGN(8);   \
> + __start___dynfunc_table = .;\
> + KEEP(*(__dynfunc_table))\
> + __stop___dynfunc_table = .; \
>   . = ALIGN(8);   \
>   __start___verbose = .;  \
>   KEEP(*(__verbose))  \
>

BAH, this is leftover from my first attempt. It's not needed and can be
nuked.

-- Steve


linux-next: Signed-off-by missing for commit in the integrity tree

2018-10-05 Thread Stephen Rothwell
Hi all,

Commit

  3dcee2d9c069 ("ima: fix showing large 'violations' or 
'runtime_measurements_count'")

is missing a Signed-off-by from its committer.

-- 
Cheers,
Stephen Rothwell


pgp4cZbLWM5sN.pgp
Description: OpenPGP digital signature


[POC][RFC][PATCH 0/2] PROOF OF CONCEPT: Dynamic Functions (jump functions)

2018-10-05 Thread Steven Rostedt


This is just a Proof Of Concept (POC), as I have done some "no no"s like
having x86 asm code in generic code paths, and it also needs a way of
working when an arch does not support this feature. Not to mention, I didn't
add proper change logs (that will come later).

Background:

 During David Woodhouse's presentation on Spectre and Meltdown at Kernel
Recipes he talked about how retpolines are implemented. I haven't had time
to look at the details so I haven't given it much thought. But as he
demonstrated, retpolines have a measurable overhead on indirect calls, and I
realized how much this can affect tracepoints. Tracepoints are implemented with
indirect calls, where the code iterates over an array calling each callback
that has registered with the tracepoint.
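
In sketch form, the dispatch loop in question looks like this (simplified
from the __DO_TRACE() macro; with CONFIG_RETPOLINE=y each it_func invocation
goes through a retpoline thunk):

	struct tracepoint_func *it_func_ptr = rcu_dereference_raw(tp->funcs);

	if (it_func_ptr) {
		do {
			void *it_func = (it_func_ptr)->func;
			void *__data = (it_func_ptr)->data;
			((void (*)(void *))it_func)(__data);	/* indirect call */
		} while ((++it_func_ptr)->func);
	}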

I ran a test to see how much overhead this entails.

With RETPOLINE disabled (CONFIG_RETPOLINE=n):

# trace-cmd start -e all
# perf stat -r 10 /work/c/hackbench 50
Time: 29.369
Time: 28.998
Time: 28.816
Time: 28.734
Time: 29.034
Time: 28.631
Time: 28.594
Time: 28.762
Time: 28.915
Time: 28.741

 Performance counter stats for '/work/c/hackbench 50' (10 runs):

     232926.801609  task-clock (msec)         #    7.465 CPUs utilized            ( +-  0.26% )
         3,175,526  context-switches          #    0.014 M/sec                    ( +-  0.50% )
           394,920  cpu-migrations            #    0.002 M/sec                    ( +-  1.71% )
            44,273  page-faults               #    0.190 K/sec                    ( +-  1.06% )
   859,904,212,284  cycles                    #    3.692 GHz                      ( +-  0.26% )
   526,010,328,375  stalled-cycles-frontend   #   61.17% frontend cycles idle     ( +-  0.26% )
   799,414,387,443  instructions              #    0.93  insn per cycle
                                              #    0.66  stalled cycles per insn  ( +-  0.25% )
   157,516,396,866  branches                  #  676.248 M/sec                    ( +-  0.25% )
       445,888,666  branch-misses             #    0.28% of all branches          ( +-  0.19% )

      31.201263687 seconds time elapsed                                          ( +-  0.24% )

With RETPOLINE enabled (CONFIG_RETPOLINE=y)

# trace-cmd start -e all
# perf stat -r 10 /work/c/hackbench 50
Time: 31.087
Time: 31.180
Time: 31.250
Time: 30.905
Time: 31.024
Time: 32.056
Time: 31.312
Time: 31.409
Time: 31.451
Time: 31.275

 Performance counter stats for '/work/c/hackbench 50' (10 runs):

     252893.216212  task-clock (msec)         #    7.444 CPUs utilized            ( +-  0.31% )
         3,218,524  context-switches          #    0.013 M/sec                    ( +-  0.45% )
           427,129  cpu-migrations            #    0.002 M/sec                    ( +-  1.52% )
            43,666  page-faults               #    0.173 K/sec                    ( +-  0.92% )
   933,615,337,142  cycles                    #    3.692 GHz                      ( +-  0.31% )
   593,141,521,286  stalled-cycles-frontend   #   63.53% frontend cycles idle     ( +-  0.32% )
   806,848,677,318  instructions              #    0.86  insn per cycle
                                              #    0.74  stalled cycles per insn  ( +-  0.30% )
   161,289,933,342  branches                  #  637.779 M/sec                    ( +-  0.29% )
     2,070,719,044  branch-misses             #    1.28% of all branches          ( +-  0.25% )

      33.971942318 seconds time elapsed                                          ( +-  0.28% )


What the above shows is that running "hackbench 50" with all trace events
enabled went from 31.201263687 to 33.971942318 seconds, and
(33.971942318 - 31.201263687) / 31.201263687 is an 8.9% increase!

So I thought about how to solve this, and came up with "jump_functions".
These are similar to jump_labels, but instead of having a static branch, we
would have a dynamic function: a function "dynfunc_X()" that can be assigned
any other function, just as if it were a variable, and that then calls
whatever it was assigned (a usage sketch follows below). Talking with other
kernel developers at Kernel Recipes, I was told
that this feature would be useful for other subsystems in the kernel and not
just for tracing.
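
To make the intended API concrete, here is a hedged usage sketch based on the
DECLARE/DEFINE macros in patch 1/2 (my_func, my_default and my_other are
hypothetical names):

	/* In a header: declares dynfunc_my_func() and its assign helper. */
	DECLARE_DYNAMIC_FUNCTION(my_func, PARAMS(int x), ARGS(x));

	/* In a C file: the dynamic function starts out calling my_default(). */
	static int my_default(int x)
	{
		return x;
	}
	DEFINE_DYNAMIC_FUNCTION(my_func, my_default, PARAMS(int x));

	/* Call sites jump through the trampoline; no indirect call. */
	void caller(void)
	{
		int y = dynfunc_my_func(42);

		(void)y;
	}

	/* Retargeting rewrites the trampoline via text_poke_bp(). */
	static int my_other(int x)
	{
		return x + 1;
	}

	void switch_implementation(void)
	{
		assign_dynamic_function_my_func(my_other);
	}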

The first attempt created a call in inline assembly, and did macro tricks to
create the parameters, but this was overly complex, especially when one of
the trace events has 12 parameters!

Then I decided to simplify it to have the dynfunc_X() call a trampoline
that does a direct jump. It's similar to what a retpoline does, but a
retpoline does an indirect jump. A direct jump is much more efficient.

When changing what function a dynamic function should call, text_poke_bp()
is used to modify the trampoline to call the new function.

The first "no change log" patch implements the dynamic function (poorly, as
it's just a proof of concept), and the second "no change log" patch
implements a way that tracepoints can take advantage of it.

The tracepoints 

[POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Signed-off-by: Steven Rostedt (VMware) 
---
 include/asm-generic/vmlinux.lds.h |   4 +
 include/linux/jump_function.h |  93 
 kernel/Makefile   |   2 +-
 kernel/jump_function.c| 368 ++
 4 files changed, 466 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/jump_function.h
 create mode 100644 kernel/jump_function.c

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 7b75ff6e2fce..0e205069ff36 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -257,6 +257,10 @@
__start___jump_table = .;   \
KEEP(*(__jump_table))   \
__stop___jump_table = .;\
+   . = ALIGN(8);   \
+   __start___dynfunc_table = .;\
+   KEEP(*(__dynfunc_table))\
+   __stop___dynfunc_table = .; \
. = ALIGN(8);   \
__start___verbose = .;  \
KEEP(*(__verbose))  \
diff --git a/include/linux/jump_function.h b/include/linux/jump_function.h
new file mode 100644
index ..8c6b0bab5f10
--- /dev/null
+++ b/include/linux/jump_function.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_JUMP_FUNCTION_H
+#define _LINUX_JUMP_FUNCTION_H
+
+
+ This all should be in arch/x86/include/asm
+
+typedef long dynfunc_t;
+
+struct dynfunc_struct;
+
+#define arch_dynfunc_trampoline(name, def) \
+   asm volatile (  \
+   ".globl dynfunc_" #name "; \n\t"\
+   "dynfunc_" #name ": \n\t"   \
+   "jmp " #def " \n\t" \
+   ".balign 8 \n \t"   \
+   : : : "memory" )
+
+int arch_assign_dynamic_function(const struct dynfunc_struct *dynfunc, void 
*func);
+
+ The below should be in include/linux
+
+#ifndef PARAMS
+#define PARAMS(x...) x
+#endif
+
+#ifndef ARGS
+#define ARGS(x...) x
+#endif
+
+struct dynfunc_struct {
+   const void  *dynfunc;
+   void*func;
+};
+
+int assign_dynamic_function(const struct dynfunc_struct *dynfunc, void *func);
+
+/*
+ * DECLARE_DYNAMIC_FUNCTION - Declaration to create a dynamic function call
+ * @name: The name of the function call to create
+ * @proto: The proto-type of the function (up to 4 args)
+ * @args: The arguments used by @proto
+ *
+ * This macro creates the function that can be used to create a dynamic
+ * function call later. It also creates the function to modify what is
+ * called:
+ *
+ *   dynfunc_[name](args);
+ *
+ * This is placed in the code where the dynamic function should be called
+ * from.
+ *
+ *   assign_dynamic_function_[name](func);
+ *
+ * This is used to make the dynfunc_[name]() call a different function.
+ * It will then call (func) instead.
+ *
+ * This must be added in a header for users of the above two functions.
+ */
+#define DECLARE_DYNAMIC_FUNCTION(name, proto, args)\
+   extern struct dynfunc_struct ___dyn_func__##name;   \
+   static inline int assign_dynamic_function_##name(int(*func)(proto)) { \
+   return assign_dynamic_function(&___dyn_func__##name, func); \
+   }   \
+   extern int dynfunc_##name(proto)
+
+/*
+ * DEFINE_DYNAMIC_FUNCTION - Define the dynamic function and default
+ * @name: The name of the function call to create
+ * @def: The default function to call
+ * @proto: The proto-type of the function (up to 4 args)
+ *
+ * Must be placed in a C file.
+ *
+ * This sets up the dynamic function that other places may call
+ * dynfunc_[name]().
+ *
+ * It defines the default function that the dynamic function will start
+ * out calling at boot up.
+ */
+#define DEFINE_DYNAMIC_FUNCTION(name, def, proto)  \
+   static void __used __dyn_func_trampoline_##name(void)   \
+   {   \
+   arch_dynfunc_trampoline(name, def); \
+   unreachable();  \
+   }   \
+   struct dynfunc_struct ___dyn_func__##name __used = {\
+   .dynfunc= (void *)dynfunc_##name,   \
+   .func   = def,  \
+   }
+
+#endif /*  _LINUX_JUMP_FUNCTION_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 

[POC][RFC][PATCH 2/2] tracepoints: Implement it with dynamic functions

2018-10-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/tracepoint-defs.h |  3 ++
 include/linux/tracepoint.h  | 65 ++---
 include/trace/define_trace.h| 14 +++
 kernel/tracepoint.c | 29 +--
 4 files changed, 79 insertions(+), 32 deletions(-)

diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 22c5a46e9693..a9d267be98de 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+struct dynfunc_struct;
+
 struct trace_print_flags {
unsigned long   mask;
const char  *name;
@@ -30,6 +32,7 @@ struct tracepoint_func {
 struct tracepoint {
const char *name;   /* Tracepoint name */
struct static_key key;
+   struct dynfunc_struct *dynfunc;
int (*regfunc)(void);
void (*unregfunc)(void);
struct tracepoint_func __rcu *funcs;
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 041f7e56a289..800c1b025e1f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include <linux/jump_function.h>
 
 struct module;
 struct tracepoint;
@@ -94,7 +95,9 @@ extern int syscall_regfunc(void);
 extern void syscall_unregfunc(void);
 #endif /* CONFIG_HAVE_SYSCALL_TRACEPOINTS */
 
+#ifndef PARAMS
 #define PARAMS(args...) args
+#endif
 
 #define TRACE_DEFINE_ENUM(x)
 #define TRACE_DEFINE_SIZEOF(x)
@@ -138,12 +141,11 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcuidle) \
+#define __DO_TRACE(name, proto, args, cond, rcuidle)   \
do {\
struct tracepoint_func *it_func_ptr;\
-   void *it_func;  \
-   void *__data;   \
int __maybe_unused idx = 0; \
+   void *__data;   \
\
if (!(cond))\
return; \
@@ -163,14 +165,11 @@ extern void syscall_unregfunc(void);
rcu_irq_enter_irqson(); \
}   \
\
-   it_func_ptr = rcu_dereference_raw((tp)->funcs); \
-   \
+   it_func_ptr =   \
+   rcu_dereference_raw((&__tracepoint_##name)->funcs); \
if (it_func_ptr) {  \
-   do {\
-   it_func = (it_func_ptr)->func;  \
-   __data = (it_func_ptr)->data;   \
-   ((void(*)(proto))(it_func))(args);  \
-   } while ((++it_func_ptr)->func);\
+   __data = (it_func_ptr)->data;   \
+   dynfunc_tp_func_##name(args);   \
}   \
\
if (rcuidle) {  \
@@ -186,7 +185,7 @@ extern void syscall_unregfunc(void);
static inline void trace_##name##_rcuidle(proto)\
{   \
if (static_key_false(&__tracepoint_##name.key)) \
-   __DO_TRACE(&__tracepoint_##name,\
+   __DO_TRACE(name,\
TP_PROTO(data_proto),   \
TP_ARGS(data_args), \
TP_CONDITION(cond), 1); \
@@ -208,11 +207,13 @@ extern void syscall_unregfunc(void);
  * poking RCU a bit.
  */
 #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
+   DECLARE_DYNAMIC_FUNCTION(tp_func_##name, PARAMS(data_proto),\
+PARAMS(data_args));\
extern struct tracepoint __tracepoint_##name;

Re: [PATCH v9 04/10] x86: refcount: prevent gcc distortions

2018-10-05 Thread Rasmus Villemoes
On 2018-10-04 21:33, H. Peter Anvin wrote:

> Here is the horrible code I mentioned yesterday.  This is about
> implementing the immediate-patching framework that Linus and others have
> discussed (it helps both performance and kernel hardening):

Heh, I did a POC in userspace some years ago for loading an "eventually
constant" value into a register - the idea being to avoid a load in
cases like kmem_cache_alloc(foo_cachep) or kmem_cache_free(foo_cachep, p),
and I assume this is something along the same lines? I didn't do
anything with it since I had no idea if the performance gain would be
worth it, and at the time (before __ro_after_init) there was no good way
to know that the values would really be constant eventually. Also, I had
hoped to come up with a way to avoid having to annotate the loads.

I just tried expanding this to deal with some of the hash tables sized
at init time which I can see was also mentioned on LKML some time ago.
I'm probably missing something fundamental, but there's some sorta
working code at https://github.com/villemoes/rai which is not too
horrible (IMO). Attaching gcc at various times and doing disassembly
shows that the patching does take place. Can I get you to take a look at
raimacros.S and tell me why that won't work?

Thanks,
Rasmus


[PATCH] staging: iio: ad2s1210: fix 'assignment operator' style checks

2018-10-05 Thread Matheus Tavares Bernardino
This patch fixes all "Assignment operator '=' should be on the previous
line" checks found in ad2s1210.c by checkpatch.pl.

Signed-off-by: Matheus Tavares 
---
 drivers/staging/iio/resolver/ad2s1210.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/iio/resolver/ad2s1210.c
b/drivers/staging/iio/resolver/ad2s1210.c
index ac13b99bd9cb..d4b1c2c010f2 100644
--- a/drivers/staging/iio/resolver/ad2s1210.c
+++ b/drivers/staging/iio/resolver/ad2s1210.c
@@ -301,8 +301,8 @@ static ssize_t ad2s1210_store_control(struct device *dev,
 "ad2s1210: write control register fail\n");
 goto error_ret;
 }
-st->resolution
-= ad2s1210_resolution_value[data & AD2S1210_SET_RESOLUTION];
+st->resolution =
+ad2s1210_resolution_value[data & AD2S1210_SET_RESOLUTION];
 if (st->pdata->gpioin) {
 data = ad2s1210_read_resolution_pin(st);
 if (data != st->resolution)
@@ -363,8 +363,8 @@ static ssize_t ad2s1210_store_resolution(struct device *dev,
 dev_err(dev, "ad2s1210: setting resolution fail\n");
 goto error_ret;
 }
-st->resolution
-= ad2s1210_resolution_value[data & AD2S1210_SET_RESOLUTION];
+st->resolution =
+ad2s1210_resolution_value[data & AD2S1210_SET_RESOLUTION];
 if (st->pdata->gpioin) {
 data = ad2s1210_read_resolution_pin(st);
 if (data != st->resolution)
-- 
2.18.0


Re: [PATCH] staging/rtlwifi: Fixing formatting warnings from checkpatch.pl.

2018-10-05 Thread Scott Tracy
On Fri, Oct 5, 2018 at 6:37 PM Joe Perches  wrote:
>
> On Fri, 2018-10-05 at 16:58 -0600, Scott Tracy wrote:
> > Fixing formatting warnings in rtlwifi found by checkpatch.pl
> > Changes include breaking up functions calls into multi line calls.
> > No functional/logical changes.
>
> I believe the code is better before most of these changes.
>
> There are various tradeoffs to source code formatting.
>
> Here you are changing some alignment to open parentheses
> and converting to < 80 columns.
>
>

Joe,
Thanks for the criticism. Fair comments all. I was just working
through the KernelNewbies.org tutorial looking for low value changes
to get my feet wet and assumed that a "check" was better than a
"warning". A also didn't feel confident enough to refactor the code to
get under the 80 character limit. Maybe I will spend a little more
time refactoring something and then submitting that. Thanks again,

Scott Tracy


Re: [PATCH] staging/rtlwifi: Fixing formatting warnings from checkpatch.pl.

2018-10-05 Thread Joe Perches
On Fri, 2018-10-05 at 16:58 -0600, Scott Tracy wrote:
> Fixing formatting warnings in rtlwifi found by checkpatch.pl
> Changes include breaking up functions calls into multi line calls.
> No functional/logical changes. 

I believe the code is better before most of these changes.

There are various tradeoffs to source code formatting.

Here you are changing some alignment to open parentheses
and converting to < 80 columns.




Re: [PATCH] fs/cifs: fix uninitialised variable warnings

2018-10-05 Thread Steve French
merged into cifs-2.6.git for-next
On Thu, Oct 4, 2018 at 3:16 AM Aurélien Aptel  wrote:
>
> Reviewed-by: Aurelien Aptel 
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
> GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)



-- 
Thanks,

Steve


Re: [PATCH v4.19-rc7] treewide: Replace more open-coded allocation size multiplications

2018-10-05 Thread Greg KH
On Fri, Oct 05, 2018 at 05:04:16PM -0700, Kees Cook wrote:
> On Fri, Oct 5, 2018 at 4:51 PM, Greg KH  wrote:
> > On Fri, Oct 05, 2018 at 04:35:59PM -0700, Kees Cook wrote:
> >> As done treewide earlier, this catches several more open-coded
> >> allocation size calculations that were added to the kernel during the
> >> merge window. This performs the following mechanical transformations
> >> using Coccinelle:
> >>
> >>   kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
> >>   kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
> >>   devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)
> >>
> >> Signed-off-by: Kees Cook 
> >
> > Has this had any testing in linux-next?
> 
> No; they're mechanical transformations (though I did build test them).
> If you want I could add this to linux-next for a week?

That would be good, thanks.

> > And when was "earlier"?
> 
> v4.18, when all of these were originally eliminated:
> 
> 026f05079b00 treewide: Use array_size() in f2fs_kzalloc()
> c86065938aab treewide: Use array_size() in f2fs_kmalloc()
> 76e43e37a407 treewide: Use array_size() in sock_kmalloc()
> 84ca176bf54a treewide: Use array_size() in kvzalloc_node()
> fd7becedb1f0 treewide: Use array_size() in vzalloc_node()
> fad953ce0b22 treewide: Use array_size() in vzalloc()
> 42bc47b35320 treewide: Use array_size() in vmalloc()
> a86854d0c599 treewide: devm_kzalloc() -> devm_kcalloc()
> 3c4211ba8ad8 treewide: devm_kmalloc() -> devm_kmalloc_array()
> 778e1cdd81bb treewide: kvzalloc() -> kvcalloc()
> 344476e16acb treewide: kvmalloc() -> kvmalloc_array()
> 590b5b7d8671 treewide: kzalloc_node() -> kcalloc_node()
> 6396bb221514 treewide: kzalloc() -> kcalloc()
> 6da2ec56059c treewide: kmalloc() -> kmalloc_array()
> 
> The new patch is catching new open-coded multiplications introduced in v4.19.

As this is getting smaller, why not just break it up and do it through
all of the different subsystems instead of one large patch?

And do we have a way to add a rule to 0-day to catch these so that they
get a warning when they are added again?

thanks,

greg k-h


[PATCH v1 1/2] x86/cpufeature: Add facility to match microcode revisions

2018-10-05 Thread Andi Kleen
From: Andi Kleen 

For bug workarounds or checks it is useful to check for specific
microcode versions. Add a new table format to check for steppings
with min/max microcode revisions.

This does not change the existing x86_cpu_id because it's an ABI
shared with modutils, and also has quite different requirements:
no wildcards; everything has to be matched exactly.

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/cpu_device_id.h | 22 ++
 arch/x86/kernel/cpu/match.c  | 43 
 2 files changed, 65 insertions(+)

diff --git a/arch/x86/include/asm/cpu_device_id.h 
b/arch/x86/include/asm/cpu_device_id.h
index baeba0567126..bfd5438c 100644
--- a/arch/x86/include/asm/cpu_device_id.h
+++ b/arch/x86/include/asm/cpu_device_id.h
@@ -11,4 +11,26 @@
 
 extern const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match);
 
+/*
+ * Match specific microcodes or steppings.
+ *
+ * vendor/family/model/stepping must all be set.
+ * min_ucode/max_ucode/driver_data are optional and can be 0.
+ */
+
+struct x86_ucode_id {
+   __u16 vendor;
+   __u16 family;
+   __u16 model;
+   __u16 stepping;
+   __u32 min_ucode;
+   __u32 max_ucode;
+   kernel_ulong_t driver_data;
+};
+
+extern const struct x86_ucode_id *
+x86_match_ucode_cpu(int cpu, const struct x86_ucode_id *match);
+extern const struct x86_ucode_id *
+x86_match_ucode_all(const struct x86_ucode_id *match);
+
 #endif
diff --git a/arch/x86/kernel/cpu/match.c b/arch/x86/kernel/cpu/match.c
index 3fed38812eea..f29a21b2809c 100644
--- a/arch/x86/kernel/cpu/match.c
+++ b/arch/x86/kernel/cpu/match.c
@@ -48,3 +48,46 @@ const struct x86_cpu_id *x86_match_cpu(const struct 
x86_cpu_id *match)
return NULL;
 }
 EXPORT_SYMBOL(x86_match_cpu);
+
+const struct x86_ucode_id *x86_match_ucode_cpu(int cpu,
+  const struct x86_ucode_id *match)
+{
+   const struct x86_ucode_id *m;
+   struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+   for (m = match; m->vendor | m->family | m->model; m++) {
+   if (c->x86_vendor != m->vendor)
+   continue;
+   if (c->x86 != m->family)
+   continue;
+   if (c->x86_model != m->model)
+   continue;
+   if (c->x86_stepping != m->stepping)
+   continue;
+   if (m->min_ucode && c->microcode < m->min_ucode)
+   continue;
+   if (m->max_ucode && c->microcode > m->max_ucode)
+   continue;
+   return m;
+   }
+   return NULL;
+}
+
+/* Check all CPUs */
+const struct x86_ucode_id *x86_match_ucode_all(const struct x86_ucode_id 
*match)
+{
+   int cpu;
+   const struct x86_ucode_id *all_m = NULL;
+   bool first = true;
+
+   for_each_online_cpu(cpu) {
+   const struct x86_ucode_id *m = x86_match_ucode_cpu(cpu, match);
+
+   if (first)
+   all_m = m;
+   else if (m != all_m)
+   return NULL;
+   first = false;
+   }
+   return all_m;
+}
-- 
2.17.1

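A hedged usage sketch of the new table (not part of the patch; the
model, stepping, and microcode revisions below are made up):

#include <asm/cpu_device_id.h>

/* hypothetical: stepping 3 of this model is broken before rev 0xc6 */
static const struct x86_ucode_id bad_ucode[] = {
	{
		.vendor    = X86_VENDOR_INTEL,
		.family    = 6,
		.model     = 0x4e,
		.stepping  = 3,
		.min_ucode = 0,		/* no lower bound */
		.max_ucode = 0xc5,	/* fixed in revision 0xc6 */
	},
	{}	/* terminator: vendor/family/model all zero */
};

static bool needs_workaround(void)
{
	/* non-NULL only if every online CPU matches the same entry */
	return x86_match_ucode_all(bad_ucode) != NULL;
}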


Re: [PATCH v4.19-rc7] treewide: Replace more open-coded allocation size multiplications

2018-10-05 Thread Kees Cook
On Fri, Oct 5, 2018 at 4:51 PM, Greg KH  wrote:
> On Fri, Oct 05, 2018 at 04:35:59PM -0700, Kees Cook wrote:
>> As done treewide earlier, this catches several more open-coded
>> allocation size calculations that were added to the kernel during the
>> merge window. This performs the following mechanical transformations
>> using Coccinelle:
>>
>>   kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
>>   kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
>>   devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)
>>
>> Signed-off-by: Kees Cook 
>
> Has this had any testing in linux-next?

No; they're mechanical transformations (though I did build test them).
If you want I could add this to linux-next for a week?

> And when was "earlier"?

v4.18, when all of these were originally eliminated:

026f05079b00 treewide: Use array_size() in f2fs_kzalloc()
c86065938aab treewide: Use array_size() in f2fs_kmalloc()
76e43e37a407 treewide: Use array_size() in sock_kmalloc()
84ca176bf54a treewide: Use array_size() in kvzalloc_node()
fd7becedb1f0 treewide: Use array_size() in vzalloc_node()
fad953ce0b22 treewide: Use array_size() in vzalloc()
42bc47b35320 treewide: Use array_size() in vmalloc()
a86854d0c599 treewide: devm_kzalloc() -> devm_kcalloc()
3c4211ba8ad8 treewide: devm_kmalloc() -> devm_kmalloc_array()
778e1cdd81bb treewide: kvzalloc() -> kvcalloc()
344476e16acb treewide: kvmalloc() -> kvmalloc_array()
590b5b7d8671 treewide: kzalloc_node() -> kcalloc_node()
6396bb221514 treewide: kzalloc() -> kcalloc()
6da2ec56059c treewide: kmalloc() -> kmalloc_array()

The new patch is catching new open-coded multiplications introduced in v4.19.

-Kees

-- 
Kees Cook
Pixel Security

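For context, what the conversion buys, sketched (illustrative, not from
the patch): with an attacker-influenced count, an open-coded
multiplication can wrap and under-allocate, while the checked helpers
fail cleanly.

#include <linux/mm.h>

struct item { u64 payload[4]; };	/* illustrative */

static void *alloc_items(size_t count)
{
	/*
	 * Open-coded, count * sizeof(struct item) can wrap (notably on
	 * 32-bit), silently returning a buffer smaller than intended:
	 *
	 *	return kvmalloc(count * sizeof(struct item), GFP_KERNEL);
	 *
	 * kvmalloc_array() instead returns NULL if the multiplication
	 * would overflow.
	 */
	return kvmalloc_array(count, sizeof(struct item), GFP_KERNEL);
}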

Re: [PATCH v2 2/3] mm: introduce put_user_page[s](), placeholder versions

2018-10-05 Thread John Hubbard
On 10/5/18 2:48 PM, Jason Gunthorpe wrote:
> On Fri, Oct 05, 2018 at 12:49:06PM -0700, John Hubbard wrote:
>> On 10/5/18 8:17 AM, Jason Gunthorpe wrote:
>>> On Thu, Oct 04, 2018 at 09:02:24PM -0700, john.hubb...@gmail.com wrote:
 From: John Hubbard 

 Introduces put_user_page(), which simply calls put_page().
 This provides a way to update all get_user_pages*() callers,
 so that they call put_user_page(), instead of put_page().

 Also introduces put_user_pages(), and a few dirty/locked variations,
 as a replacement for release_pages(), for the same reasons.
 These may be used for subsequent performance improvements,
 via batching of pages to be released.

 This prepares for eventually fixing the problem described
 in [1], and is following a plan listed in [2], [3], [4].

 [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

 [2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubb...@nvidia.com
 Proposed steps for fixing get_user_pages() + DMA problems.

 [3]https://lkml.kernel.org/r/20180710082100.mkdwngdv5kkrc...@quack2.suse.cz
 Bounce buffers (otherwise [2] is not really viable).

 [4] https://lkml.kernel.org/r/20181003162115.gg24...@quack2.suse.cz
 Follow-up discussions.

>> [...]
  
 +/* Placeholder version, until all get_user_pages*() callers are updated. 
 */
 +static inline void put_user_page(struct page *page)
 +{
 +  put_page(page);
 +}
 +
 +/* For get_user_pages*()-pinned pages, use these variants instead of
 + * release_pages():
 + */
 +static inline void put_user_pages_dirty(struct page **pages,
 +  unsigned long npages)
 +{
 +  while (npages) {
 +  set_page_dirty(pages[npages]);
 +  put_user_page(pages[npages]);
 +  --npages;
 +  }
 +}
>>>
>>> Shouldn't these do the !PageDirty(page) thing?
>>>
>>
>> Well, not yet. This is the "placeholder" patch, in which I planned to keep
>> the behavior the same, while I go to all the get_user_pages call sites and 
>> change 
>> put_page() and release_pages() over to use these new routines.
> 
> Hmm.. Well, if it is the right thing to do here, why not include it and
> take it out of callers when doing the conversion?
> 
> If it is the wrong thing, then let us still take it out of callers
> when doing the conversion :)
> 
> Just seems like things will be in a better place to make future
> changes if all the call sites are de-duplicated and correct.
> 

OK, yes. Let me send out a v3 with that included, then.

thanks,
-- 
John Hubbard
NVIDIA

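For reference, a sketch of what folding the check in might look like
(illustrative; not necessarily what v3 will contain):

static inline void put_user_pages_dirty(struct page **pages,
					unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		struct page *page = pages[i];

		if (!PageDirty(page))
			set_page_dirty(page);
		put_user_page(page);
	}
}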

Re: [PATCH v4.19-rc7] treewide: Replace more open-coded allocation size multiplications

2018-10-05 Thread Greg KH
On Fri, Oct 05, 2018 at 04:35:59PM -0700, Kees Cook wrote:
> As done treewide earlier, this catches several more open-coded
> allocation size calculations that were added to the kernel during the
> merge window. This performs the following mechanical transformations
> using Coccinelle:
> 
>   kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
>   kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
>   devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)
> 
> Signed-off-by: Kees Cook 

Has this had any testing in linux-next?

And when was "earlier"?

thanks,

greg k-h


Re: [PATCH v2 05/11] arch/x86: Introduce a new config parameter PLATFORM_QOS

2018-10-05 Thread Fenghua Yu
On Fri, Oct 05, 2018 at 08:55:52PM +, Moger, Babu wrote:
> Introduces a new config parameter PLATFORM_QOS.
> 
> This will be used as a common config parameter for both Intel and AMD.
> Each vendor will have their own config parameter to enable RDT feature.
> One for Intel(INTEL_RDT) and one for AMD(AMD_QOS). It can be enabled or
> disabled separately. The new parameter PLATFORM_QOS will be dependent
> on INTEL_RDT or AMD_QOS.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/Kconfig | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 1a0be022f91d..7f2da780a327 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -458,6 +458,10 @@ config INTEL_RDT
>  
> Say N if unsure.
>  
> +config PLATFORM_QOS
> + def_bool y
> + depends on X86 && INTEL_RDT
> +

Can change "PLATFORM_QOS" to a more neutral name "RESCTRL"?

Thanks.

-Fenghua


Re: [PATCH] writeback: fix range_cyclic writeback vs writepages deadlock

2018-10-05 Thread Dave Chinner
On Fri, Oct 05, 2018 at 12:46:40PM -0700, Andrew Morton wrote:
> On Fri,  5 Oct 2018 15:45:26 +1000 Dave Chinner  wrote:
> 
> > From: Dave Chinner 
> > 
> > We've recently seen a workload on XFS filesystems with a repeatable
> > deadlock between background writeback and a multi-process
> > application doing concurrent writes and fsyncs to a small range of a
> > file.
> > 
> > ...
> > 
> > Signed-off-by: Dave Chinner 
> > Reviewed-by: Jan Kara 
> 
> Not a serious enough problem for a -stable backport?

Don't have enough evidence to say one way or another. The reported
incident was from a RHEL 7 kernel, so the bug has been there for
years in one form or another, but it's only ever been triggered by
this one-off custom workload.

I haven't done any analysis on older kernels, nor have I looked to see
if there are any gotchas that a stable backport might encounter. And I
tend not to change stuff in a path that is critical to data integrity
without at least doing enough due diligence to suggest a stable
backport would be fine.

You can mark it for stable backports if you want, but I'm not
prepared to because I haven't done the work necessary to ensure it's
safe to do so.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v4.19-rc7] treewide: Replace more open-coded allocation size multiplications

2018-10-05 Thread Kees Cook
As done treewide earlier, this catches several more open-coded
allocation size calculations that were added to the kernel during the
merge window. This performs the following mechanical transformations
using Coccinelle:

kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)

Signed-off-by: Kees Cook 
---
 drivers/bluetooth/hci_qca.c |  2 +-
 drivers/crypto/inside-secure/safexcel.c |  8 +---
 drivers/gpu/drm/mediatek/mtk_drm_crtc.c |  2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c |  4 ++--
 drivers/hwmon/npcm750-pwm-fan.c |  2 +-
 drivers/md/dm-integrity.c   |  3 ++-
 drivers/net/wireless/mediatek/mt76/usb.c| 10 +-
 drivers/pci/controller/pcie-cadence.c   |  4 ++--
 drivers/tty/serial/qcom_geni_serial.c   |  4 ++--
 net/sched/sch_cake.c|  2 +-
 10 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index e182f6019f68..2fee65886d50 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1322,7 +1322,7 @@ static int qca_init_regulators(struct qca_power *qca,
 {
int i;
 
-   qca->vreg_bulk = devm_kzalloc(qca->dev, num_vregs *
+   qca->vreg_bulk = devm_kcalloc(qca->dev, num_vregs,
  sizeof(struct regulator_bulk_data),
  GFP_KERNEL);
if (!qca->vreg_bulk)
diff --git a/drivers/crypto/inside-secure/safexcel.c 
b/drivers/crypto/inside-secure/safexcel.c
index 7e71043457a6..86c699c14f84 100644
--- a/drivers/crypto/inside-secure/safexcel.c
+++ b/drivers/crypto/inside-secure/safexcel.c
@@ -1044,7 +1044,8 @@ static int safexcel_probe(struct platform_device *pdev)
 
safexcel_configure(priv);
 
-   priv->ring = devm_kzalloc(dev, priv->config.rings * sizeof(*priv->ring),
+   priv->ring = devm_kcalloc(dev, priv->config.rings,
+ sizeof(*priv->ring),
  GFP_KERNEL);
if (!priv->ring) {
ret = -ENOMEM;
@@ -1063,8 +1064,9 @@ static int safexcel_probe(struct platform_device *pdev)
if (ret)
goto err_reg_clk;
 
-   priv->ring[i].rdr_req = devm_kzalloc(dev,
-   sizeof(priv->ring[i].rdr_req) * 
EIP197_DEFAULT_RING_SIZE,
+   priv->ring[i].rdr_req = devm_kcalloc(dev,
+   EIP197_DEFAULT_RING_SIZE,
+   sizeof(priv->ring[i].rdr_req),
GFP_KERNEL);
if (!priv->ring[i].rdr_req) {
ret = -ENOMEM;
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c 
b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
index 0b976dfd04df..92ecb9bf982c 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
@@ -600,7 +600,7 @@ int mtk_drm_crtc_create(struct drm_device *drm_dev,
}
 
mtk_crtc->layer_nr = mtk_ddp_comp_layer_nr(mtk_crtc->ddp_comp[0]);
-   mtk_crtc->planes = devm_kzalloc(dev, mtk_crtc->layer_nr *
+   mtk_crtc->planes = devm_kcalloc(dev, mtk_crtc->layer_nr,
sizeof(struct drm_plane),
GFP_KERNEL);
 
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c
index 790d39f816dc..b557687b1964 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c
@@ -153,8 +153,8 @@ int msm_dss_parse_clock(struct platform_device *pdev,
return 0;
}
 
-   mp->clk_config = devm_kzalloc(&pdev->dev,
- sizeof(struct dss_clk) * num_clk,
+   mp->clk_config = devm_kcalloc(&pdev->dev,
+ num_clk, sizeof(struct dss_clk),
  GFP_KERNEL);
if (!mp->clk_config)
return -ENOMEM;
diff --git a/drivers/hwmon/npcm750-pwm-fan.c b/drivers/hwmon/npcm750-pwm-fan.c
index 8474d601aa63..b998f9fbed41 100644
--- a/drivers/hwmon/npcm750-pwm-fan.c
+++ b/drivers/hwmon/npcm750-pwm-fan.c
@@ -908,7 +908,7 @@ static int npcm7xx_en_pwm_fan(struct device *dev,
if (fan_cnt < 1)
return -EINVAL;
 
-   fan_ch = devm_kzalloc(dev, sizeof(*fan_ch) * fan_cnt, GFP_KERNEL);
+   fan_ch = devm_kcalloc(dev, fan_cnt, sizeof(*fan_ch), GFP_KERNEL);
if (!fan_ch)
return -ENOMEM;
 
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 89ccb64342de..e1fa6baf4e8e 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -3462,7 +3462,8 @@ static int dm_integrity_ctr(struct dm_target *ti, 
unsigned argc, char **argv)
r = -ENOMEM;
  

Re: [GIT PULL] PCI fixes for v4.19

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 03:47:30PM -0500, Bjorn Helgaas wrote:
> PCI fixes:
> 
>   - Reprogram bridge prefetch registers to fix NVIDIA and Radeon issues
> after suspend/resume (Daniel Drake)
> 
>   - Fix mvebu I/O mapping creation sequence (Thomas Petazzoni)
> 
>   - Fix minor MAINTAINERS file match issue (Bjorn Helgaas)
> 

Now merged, thanks.

greg k-h


Re: [GIT PULL] GPIO fix for v4.19

2018-10-05 Thread Greg KH
On Fri, Oct 05, 2018 at 01:05:59PM +0200, Linus Walleij wrote:
> Hi Greg,
> 
> here is a single and hopefully final GPIO fix for the v4.19 series.
> Details in the signed tag.
> 
> Please pull it in!

Now merged, thanks.

greg k-h


Re: [GIT PULL] Power management fix for v4.19-rc7

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 11:59:09AM +0200, Rafael J. Wysocki wrote:
> Hi Greg,
> 
> Please pull from the tag
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
>  pm-4.19-rc7

Now merged, thanks.

greg k-h


Re: [GIT PULL] perf fixes

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 11:55:24AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar  wrote:
> 
> > Linus,
> 
> ... and Greg as well!! ;-)

Heh, not a big deal :)

Now merged, thanks.

greg k-h


Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hi,

On Fri, Oct 05, 2018 at 01:35:15PM -0700, David Rientjes wrote:
> Why is it ever appropriate to do heavy reclaim and swap activity to 
> allocate a transparent hugepage?  This is exactly what the __GFP_NORETRY 
> check for high-order allocations is attempting to avoid, and it explicitly 
> states that it is for thp faults.  The fact that we lost __GFP_NORETRY for 
> thp allocations for all settings, including the default setting, other 
> than yours (setting of "always") is what I'm focusing on.  There is no 
> guarantee that this activity will free an entire pageblock or that it is 
> even worthwhile.

I tried to add just __GFP_NORETRY but it changes nothing. Try it
yourself if you think that can resolve the swap storm and excessive
reclaim CPU overhead... and see if it works. I didn't intend to
reinvent the wheel with __GFP_COMPACT_ONLY, if __GFP_NORETRY would
have worked. I tried adding __GFP_NORETRY first of course.

Reason why it doesn't help is: compaction fails because not enough
free RAM, reclaim is invoked, compaction succeeds, THP is allocated to
your lib user, compaction fails because not enough free RAM, reclaim
is invoked etc.. compact_result is not COMPACT_DEFERRED, but
COMPACT_SKIPPED.

See the part "reclaim is invoked": with __GFP_THISNODE that is enough to
still create the same heavy swap storm and unfairly penalize all apps
with memory allocated in the local node, as if your library actually had
the kernel privilege to run mbind or mlock, which is not ok.

Only __GFP_COMPACT_ONLY can truly avoid reclaim; the moment reclaim
can run with __GFP_THISNODE set, all bets are off and we're back to
square one, with no difference (at best a marginal difference) from
just setting __GFP_NORETRY.

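Schematically (illustrative pseudo-C with made-up helper names, not the
actual page_alloc.c code):

	for (;;) {
		page = try_compaction(order, gfp);	/* __GFP_THISNODE set */
		if (page)
			break;		/* THP handed to the lib user */
		if (compact_result == COMPACT_DEFERRED)
			break;		/* the only case __GFP_NORETRY bails on */
		/* COMPACT_SKIPPED: not enough free RAM, so... */
		reclaim_local_node();	/* ...swap out the local node's apps */
	}
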
> That aside, removing __GFP_THISNODE can make the fault latency much worse 
> if remote nodes are fragmented and/or reclaim has the inability to free 
> contiguous memory, which it likely cannot.  This is where I measured over 
> 40% fault latency regression from Linus's tree with this patch on a 
> fragmented system where order-9 memory is available from neither node 0
> nor node 1 on Haswell.

Discussing the drawbacks of removing __GFP_THISNODE is an orthogonal
topic. __GFP_COMPACT_ONLY approach didn't have any of those drawbacks
about the remote latency because __GFP_THISNODE was still set at all
times, just as you like it. You seem to think __GFP_NORETRY will work
as well as __GFP_COMPACT_ONLY but it doesn't.

Calling compaction (and only compaction!) with __GFP_THISNODE set
doesn't break anything and that was what __GFP_COMPACT_ONLY was about.

> The behavior that MADV_HUGEPAGE specifies is certainly not clearly 
> defined, unfortunately.  The way that an application writer may read it, 
> as we have, is that it will make a stronger attempt at allocating a 
> hugepage at fault.  This actually works quite well when the allocation 
> correctly has __GFP_NORETRY, as it's supposed to, and compaction is 
> MIGRATE_ASYNC.

Like Mel said, your app just happens to fit in a local node; if the
user of the lib is slightly different and allocates 16G on a system
where each node is 4G, the post-fix MADV_HUGEPAGE will perform far
better for the lib user as well.

And you know, if the lib user fits in one node, it can use mbind and
it won't hit OOM... and you'd need some capability giving the app
privilege anyway to keep MADV_HUGEPAGE as deep and unfair to the rest
of the processes running the local node (like mbind and mlock require
too).

Could you just run a test with the special lib and allocate 4 times
the size of a node, and see how the lib performs with upstream and
upstream+fix? Feel free to add __GFP_NORETRY anywhere you like in the
test of the upstream without fix.

The only constraint I would ask for the test (if the app using the lib
is not a massively multithreaded app, like qemu is, and you just
intend to run malloc(SIZEOFNODE*4); memset) is to run the app-lib
under "taskset -c 0". Otherwise NUMA balancing could move the the CPU
next to the last memory touched, which couldn't be done if each thread
accesses all ram at random from all 4 nodes at the same time (which is
a totally legitimate workload too and must not hit the "pathological
THP allocation performance").

> removed in a thp allocation.  I don't think anybody in this thread wants 
> 14% remote access latency regression if we allocate remotely or 40% fault 
> latency regression when remote nodes are fragmented as well.

Did you try the __GFP_COMPACT_ONLY patch? That won't have the 40%
fault latency already.

Also, you're underestimating a bit the benefit of THP given from
remote nodes for virt: the 40% fault latency is not an issue when the
allocation is long lived, which is what MADV_HUGEPAGE is telling the
kernel, and the benefit of THP for the guest is multiplied. It's more a
feature than a bug that 40% fault latency with MADV_HUGEPAGE set at
least for all long lived allocations (but if the allocations aren't
long lived, why should MADV_HUGEPAGE 

Re: [GIT PULL] scheduler fixes

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 11:50:17AM +0200, Ingo Molnar wrote:
> Greg,
> 
> Please pull the latest sched-urgent-for-linus git tree from:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> sched-urgent-for-linus

Now merged, thanks.

greg k-h


Re: [GIT PULL] x86 fixes

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 11:53:54AM +0200, Ingo Molnar wrote:
> Greg,
> 
> Please pull the latest x86-urgent-for-linus git tree from:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> x86-urgent-for-linus

Now merged, thanks.

greg k-h


Re: [GIT PULL] sound fixes for 4.19-rc7

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 11:51:17AM +0200, Takashi Iwai wrote:
> Greg,
> 
> please pull sound fixes for v4.19-rc7 from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git 
> tags/sound-4.19-rc7

Now merged, thanks.

greg k-h



Re: [GIT PULL] locking fixes

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 11:36:47AM +0200, Ingo Molnar wrote:
> Greg,
> 
> Please pull the latest locking-urgent-for-linus git tree from:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> locking-urgent-for-linus

Now merged, thanks.

greg k-h



[PATCH] staging/rtlwifi: Fixing formatting warnings from checkpatch.pl.

2018-10-05 Thread Scott Tracy
Fixing formatting warnings in rtlwifi found by checkpatch.pl
Changes include breaking up functions calls into multi line calls.
No functional/logical changes. 

Signed-off-by: Scott Tracy 
---
 drivers/staging/rtlwifi/core.c  | 5 +++--
 drivers/staging/rtlwifi/efuse.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtlwifi/core.c b/drivers/staging/rtlwifi/core.c
index ca37f7511c4d..a36cb44a5388 100644
--- a/drivers/staging/rtlwifi/core.c
+++ b/drivers/staging/rtlwifi/core.c
@@ -1109,7 +1109,7 @@ static void rtl_op_bss_info_changed(struct ieee80211_hw *hw,
         if (rtlpriv->dm.supp_phymode_switch) {
                 if (sta->ht_cap.ht_supported)
                         rtl_send_smps_action(hw, sta,
-                                        IEEE80211_SMPS_STATIC);
+                                             IEEE80211_SMPS_STATIC);
         }
 
         if (rtlhal->current_bandtype == BAND_ON_5G) {
@@ -1882,7 +1882,8 @@ bool rtl_hal_pwrseqcmdparsing(struct rtl_priv *rtlpriv, u8 cut_version,
                         return true;
                 default:
                         WARN_ONCE(true,
-                                  "rtlwifi: %s(): Unknown CMD!!\n", __func__);
+                                  "rtlwifi: %s(): Unknown CMD!!\n",
+                                  __func__);
                         break;
                 }
         }
diff --git a/drivers/staging/rtlwifi/efuse.c b/drivers/staging/rtlwifi/efuse.c
index 1dc71455f270..5b8afdb3e0fe 100644
--- a/drivers/staging/rtlwifi/efuse.c
+++ b/drivers/staging/rtlwifi/efuse.c
@@ -245,7 +245,8 @@ void read_efuse(struct ieee80211_hw *hw, u16 _offset, u16 _size_byte, u8 *pbuf)
         if (!efuse_word)
                 goto out;
         for (i = 0; i < EFUSE_MAX_WORD_UNIT; i++) {
-                efuse_word[i] = kcalloc(efuse_max_section, sizeof(u16), GFP_ATOMIC);
+                efuse_word[i] = kcalloc(efuse_max_section,
+                                        sizeof(u16), GFP_ATOMIC);
                 if (!efuse_word[i])
                         goto done;
         }
@@ -375,7 +376,7 @@ bool efuse_shadow_update_chk(struct ieee80211_hw *hw)
         for (i = 0; i < 8; i = i + 2) {
                 if ((rtlefuse->efuse_map[EFUSE_INIT_MAP][base + i] !=
                      rtlefuse->efuse_map[EFUSE_MODIFY_MAP][base + i]) ||
-                    (rtlefuse->efuse_map[EFUSE_INIT_MAP][base + i + 1] !=
+                   (rtlefuse->efuse_map[EFUSE_INIT_MAP][base + i + 1] !=
                      rtlefuse->efuse_map[EFUSE_MODIFY_MAP][base + i + 1])) {
                         words_need++;
-- 
2.17.1



Re: [PATCH RFC] mm: Add an fs-write seal to memfd

2018-10-05 Thread Joel Fernandes
On Fri, Oct 5, 2018 at 3:28 PM, Greg KH  wrote:
> On Fri, Oct 05, 2018 at 02:10:58PM -0700, Joel Fernandes wrote:
>> On Fri, Oct 05, 2018 at 12:53:39PM -0700, Andrew Morton wrote:
>> > On Fri,  5 Oct 2018 12:27:27 -0700 "Joel Fernandes (Google)" 
>> >  wrote:
>> >
>> > > To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
>> > > prevents any future mmap and write syscalls from succeeding while
>> > > keeping the existing mmap active. The following program shows the seal
>> > > working in action:
>> >
>> > Please be prepared to create a manpage patch for this one.
>>
>> Sure, I will do that. thanks,
>
> And a test case to the in-kernel memfd tests would be appreciated.


Sure, I will add to those self-tests.

thanks,

 - Joel
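
(The demonstration program itself is not reproduced in this archive. As
a rough illustration of the semantics described above, here is a minimal
sketch; F_SEAL_FS_WRITE is the constant proposed by this RFC, it is not
in released uapi headers, and the value below is only a placeholder.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FS_WRITE
#define F_SEAL_FS_WRITE 0x0010  /* placeholder; must match the RFC patch */
#endif

int main(void)
{
        int fd = memfd_create("demo", MFD_ALLOW_SEALING);
        char *map;

        if (fd < 0 || ftruncate(fd, 4096) < 0)
                return 1;

        /* Take a writable mapping *before* sealing. */
        map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED)
                return 1;

        /* After this, new write() and mmap() calls should fail ... */
        if (fcntl(fd, F_ADD_SEALS, F_SEAL_FS_WRITE) < 0)
                perror("F_ADD_SEALS");

        /* ... but the pre-existing mapping stays writable, per the RFC. */
        strcpy(map, "still writable through the old mapping");

        if (write(fd, "x", 1) < 0)
                perror("write after seal");  /* expected to fail */

        return 0;
}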


Re: [PATCH] staging/rtlwifi: Fixing formatting warnings.

2018-10-05 Thread Greg Kroah-Hartman
On Fri, Oct 05, 2018 at 02:58:15PM -0600, Scott Tracy wrote:
> Signed-off-by: Scott Tracy 
> ---
>  drivers/staging/rtlwifi/core.c  | 5 +++--
>  drivers/staging/rtlwifi/efuse.c | 5 +++--
>  2 files changed, 6 insertions(+), 4 deletions(-)

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch did many different things all at once, making it difficult
  to review.  All Linux kernel patches need to only do one thing at a
  time.  If you need to do multiple things (such as clean up all coding
  style issues in a file/driver), do it in a sequence of patches, each
  one doing only one thing.  This will make it easier to review the
  patches to ensure that they are correct, and to help alleviate any
  merge issues that larger patches can cause.

- You did not specify a description of why the patch is needed, or
  possibly, any description at all, in the email body.  Please read the
  section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what is needed in order to
  properly describe the change.

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot


Re: [PATCH RFC] mm: Add an fs-write seal to memfd

2018-10-05 Thread Greg KH
On Fri, Oct 05, 2018 at 02:10:58PM -0700, Joel Fernandes wrote:
> On Fri, Oct 05, 2018 at 12:53:39PM -0700, Andrew Morton wrote:
> > On Fri,  5 Oct 2018 12:27:27 -0700 "Joel Fernandes (Google)" 
> >  wrote:
> > 
> > > To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> > > prevents any future mmap and write syscalls from succeeding while
> > > keeping the existing mmap active. The following program shows the seal
> > > working in action:
> > 
> > Please be prepared to create a manpage patch for this one.
> 
> Sure, I will do that. thanks,

And a test case to the in-kernel memfd tests would be appreciated.

thanks,

greg k-h
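
(A rough sketch of what such an addition to the memfd selftests might
look like, using raw syscalls rather than the suite's actual helpers;
the seal constant is the RFC's proposal and its value here is only a
placeholder.)

#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif
#ifndef F_SEAL_FS_WRITE
#define F_SEAL_FS_WRITE 0x0010  /* placeholder; must match the final patch */
#endif

/* Hypothetical test: once sealed, new writable mmap()s and write()s on
 * the memfd must be refused, per the RFC's description. */
static void test_seal_fs_write(void)
{
        int fd = syscall(SYS_memfd_create, "kselftest", MFD_ALLOW_SEALING);

        assert(fd >= 0);
        assert(ftruncate(fd, 4096) == 0);
        assert(fcntl(fd, F_ADD_SEALS, F_SEAL_FS_WRITE) == 0);

        assert(mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0) == MAP_FAILED);
        assert(write(fd, "x", 1) < 0);

        close(fd);
}

int main(void)
{
        test_seal_fs_write();
        return 0;
}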


Re: [PATCH] KVM: X86: Add missing KVM_AMD dependency

2018-10-05 Thread Paolo Bonzini
On 06/10/2018 00:03, Guenter Roeck wrote:
>> This should be handled by
>>
>> config KVM_AMD_SEV
>> def_bool y
>> bool "AMD Secure Encrypted Virtualization (SEV) support"
>> depends on KVM_AMD && X86_64
>> depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
>> ---help---
>> Provides support for launching Encrypted VMs on AMD processors.
>>
> Unfortunately it doesn't. It disables KVM_AMD_SEV, but that doesn't prevent
> the calls.

Yes, exactly - that's why I mentioned the sev_guest patch that should
cull all the SEV code from a !KVM_AMD_SEV build.

>> Maybe this works as well?  I haven't tested it yet:
>>
> I am sure there are many possible solutions. I would personally prefer one
> that enforces KVM_AMD=m with CRYPTO_DEV_CCP_DD=m, but that is just me.

Well, KVM_AMD=y is a relatively unusual choice to begin with.  The
question is whether you then want to disable this choice completely when
CRYPTO_DEV_CCP_DD=m, or just disable SEV.

My patch is a good idea anyway, if I may say so :), because it culls a
lot of code from a !KVM_AMD_SEV build.  But if it is not enough, we
certainly have to do something else about the failure you're reporting.

Paolo


Re: [RFC] x86/cpu_entry_area: move part of it back to fixmap

2018-10-05 Thread Nadav Amit
at 3:10 PM, Andy Lutomirski  wrote:

> On Fri, Oct 5, 2018 at 3:08 PM Nadav Amit  wrote:
>> at 10:02 AM, Andy Lutomirski  wrote:
>> 
>>> On Thu, Oct 4, 2018 at 9:31 AM Nadav Amit  wrote:
>>>> at 7:11 AM, Andy Lutomirski  wrote:
>>>> 
>>>>> On Oct 3, 2018, at 9:59 PM, Nadav Amit  wrote:
>>>>> 
>>>>>> This RFC proposes to return part of the entry-area back to the fixmap to
>>>>>> improve system-call performance. Currently, since the entry-area is
>>>>>> mapped far (more than 2GB) away from the kernel text, an indirect branch
>>>>>> is needed to jump from the trampoline into the kernel. Due to Spectre
>>>>>> v2, vulnerable CPUs need to use a retpoline, which introduces an
>>>>>> overhead of >20 cycles.
>>>>> 
>>>>> That retpoline is gone in -tip. Can you see how your code stacks up 
>>>>> against -tip?  If it’s enough of a win to justify the added complexity, 
>>>>> we can try it.
>>>>> 
>>>>> You can see some pros and cons in the changelog:
>>>>> 
>>>>> https://git.kernel.org/tip/bf904d2762ee6fc1e4acfcb0772bbfb4a27ad8a6
>>>> 
>>>> Err.. That’s what I get for not following lkml. Very nice discussion.
>>>> Based on it, I may be able to do an additional micro-optimization or
>>>> two. Let me give it a try.
>>> 
>>> I think you should at least try to benchmark your code against mine,
>>> since you more or less implemented the alternative I suggested. :)
>> 
>> That’s what I meant. So I made a couple of tweaks in my implementation to
>> make it as performant as possible. Eventually, there is a 2ns benefit for the
>> trampoline over the unified entry-path on average on my Haswell VM (254ns vs
>> 256ns), yet there is some variance (1.2 & 1.5ns stdev respectively).
>> 
>> I don’t know whether such a difference should make one option preferred
>> over the other. I think it boils down to whether:
>> 
>> 1. KASLR is needed.
> 
> Why?  KASLR is basically worthless on any existing CPU against
> attackers who can run local code.
> 
>> 2. Can you specialize the code-paths of trampoline/non-trampoline to gain
>> better performance. For example, by removing the ALTERNATIVE from
>> SWITCH_TO_KERNEL_CR3 and not reload CR3 on the non-trampoline path, you can
>> avoid an unconditional jmp on machines which are not vulnerable to Meltdown.
>> 
>> So I can guess what you’d prefer. Let’s see if I’m right.
> 
> 2 ns isn't bad, at least on a non-PTI system.  Which, I suppose, means
> that you should benchmark on AMD :)
> 
> If the code is reasonably clean, I could get on board.

Fair enough. I’ll clean it and resend.

Thanks,
Nadav
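
(Background on the indirect-branch point in the RFC text above: x86-64
direct jmp/call instructions encode a signed 32-bit displacement, so a
branch target must lie within +/-2 GiB of the branch site to be reached
directly. A small self-contained sketch follows; the addresses are
illustrative examples, not the exact memory layout.)

#include <stdint.h>
#include <stdio.h>

/* A target is reachable by a direct (rel32) jmp/call only if the
 * displacement fits in a signed 32-bit integer. */
static int reachable_with_rel32(uint64_t from, uint64_t to)
{
        int64_t disp = (int64_t)(to - from);

        return disp >= INT32_MIN && disp <= INT32_MAX;
}

int main(void)
{
        uint64_t kernel_text = 0xffffffff81000000ULL;  /* example address */
        uint64_t entry_area  = 0xfffffe0000001000ULL;  /* example address */

        /* Prints 0: the entry area is too far from the kernel text, so
         * the trampoline must jump through a register, which Spectre v2
         * mitigation turns into a retpoline on vulnerable CPUs. */
        printf("direct jmp reachable: %d\n",
               reachable_with_rel32(entry_area, kernel_text));
        return 0;
}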



Re: [RFC] x86/cpu_entry_area: move part of it back to fixmap

2018-10-05 Thread Andy Lutomirski
On Fri, Oct 5, 2018 at 3:08 PM Nadav Amit  wrote:
>
> at 10:02 AM, Andy Lutomirski  wrote:
>
> > On Thu, Oct 4, 2018 at 9:31 AM Nadav Amit  wrote:
> >> at 7:11 AM, Andy Lutomirski  wrote:
> >>
> >>> On Oct 3, 2018, at 9:59 PM, Nadav Amit  wrote:
> >>>
> >>>> This RFC proposes to return part of the entry-area back to the fixmap to
> >>>> improve system-call performance. Currently, since the entry-area is
> >>>> mapped far (more than 2GB) away from the kernel text, an indirect branch
> >>>> is needed to jump from the trampoline into the kernel. Due to Spectre
> >>>> v2, vulnerable CPUs need to use a retpoline, which introduces an
> >>>> overhead of >20 cycles.
> >>>
> >>> That retpoline is gone in -tip. Can you see how your code stacks up 
> >>> against -tip?  If it’s enough of a win to justify the added complexity, 
> >>> we can try it.
> >>>
> >>> You can see some pros and cons in the changelog:
> >>>
> >>> https://git.kernel.org/tip/bf904d2762ee6fc1e4acfcb0772bbfb4a27ad8a6
> >>
> >> Err.. That’s what I get for not following lkml. Very nice discussion.
> >> Based on it, I may be able to do an additional micro-optimization or
> >> two. Let me give it a try.
> >
> > I think you should at least try to benchmark your code against mine,
> > since you more or less implemented the alternative I suggested. :)
>
> That’s what I meant. So I made a couple of tweaks in my implementation to
> make it as performant as possible. Eventually, there is a 2ns benefit for the
> trampoline over the unified entry-path on average on my Haswell VM (254ns vs
> 256ns), yet there is some variance (1.2 & 1.5ns stdev respectively).
>
> I don’t know whether such a difference should make one option preferred
> over the other. I think it boils down to whether:
>
> 1. KASLR is needed.

Why?  KASLR is basically worthless on any existing CPU against
attackers who can run local code.

>
> 2. Can you specialize the code-paths of trampoline/non-trampoline to gain
> better performance. For example, by removing the ALTERNATIVE from
> SWITCH_TO_KERNEL_CR3 and not reload CR3 on the non-trampoline path, you can
> avoid an unconditional jmp on machines which are not vulnerable to Meltdown.
>
> So I can guess what you’d prefer. Let’s see if I’m right.
>

2 ns isn't bad, at least on a non-PTI system.  Which, I suppose, means
that you should benchmark on AMD :)

If the code is reasonably clean, I could get on board.


Re: [PATCH] KVM: X86: Add missing KVM_AMD dependency

2018-10-05 Thread Guenter Roeck
On Fri, Oct 05, 2018 at 10:41:55PM +0200, Paolo Bonzini wrote:
> On 05/10/2018 20:46, Guenter Roeck wrote:
> > Analysis shows that commit 59414c9892208 ("KVM: SVM: Add support for
> > KVM_SEV_LAUNCH_START command") added a dependency of KVM_AMD on
> > CRYPTO_DEV_CCP_DD if CRYPTO_DEV_SP_PSP is enabled: If CRYPTO_DEV_CCP_DD
> > is built as module, KVM_AMD must be built as module as well.
> > 
> > Fixes: 59414c9892208 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START 
> > command")
> > Cc: Brijesh Singh 
> > Cc: Borislav Petkov 
> > Signed-off-by: Guenter Roeck 
> 
> This should be handled by
> 
> config KVM_AMD_SEV
> def_bool y
> bool "AMD Secure Encrypted Virtualization (SEV) support"
> depends on KVM_AMD && X86_64
> depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
> ---help---
> Provides support for launching Encrypted VMs on AMD processors.
> 

Unfortunately it doesn't. It disables KVM_AMD_SEV, but that doesn't prevent
the calls.

> Maybe this works as well?  I haven't tested it yet:
> 

I am sure there are many possible solutions. I would personally prefer one
that enforces KVM_AMD=m with CRYPTO_DEV_CCP_DD=m, but that is just me.

Thanks,
Guenter

> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 89c4c5aa15f1..55f10b17d044 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -441,9 +441,13 @@ static inline bool svm_sev_enabled(void)
> 
>  static inline bool sev_guest(struct kvm *kvm)
>  {
> +#ifdef CONFIG_KVM_AMD_SEV
>   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> 
>   return sev->active;
> +#else
> + return false;
> +#endif
>  }
> 
>  static inline int sev_get_asid(struct kvm *kvm)
> 
> Thanks,
> 
> Paolo
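
(For readers following the thread: the reason the sev_guest() stub above
helps is that with CONFIG_KVM_AMD_SEV disabled it collapses to a
compile-time 'false', so the compiler can discard every call guarded by
it, together with the references into the PSP driver that force the
KVM_AMD=y / CRYPTO_DEV_CCP_DD=m link failure. A sketch of the pattern;
psp_do_launch_start() is a made-up name for illustration.)

static inline bool sev_guest(struct kvm *kvm)
{
#ifdef CONFIG_KVM_AMD_SEV
        struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

        return sev->active;
#else
        /* Constant false: calls guarded by it become dead code. */
        return false;
#endif
}

static int svm_launch_start(struct kvm *kvm)
{
        if (!sev_guest(kvm))
                return 0;
        /* This reference is dropped when sev_guest() is constant false,
         * so a built-in KVM_AMD no longer needs the modular PSP symbol. */
        return psp_do_launch_start(kvm);
}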

