linux-next: Tree for Mar 30

2017-03-29 Thread Stephen Rothwell
Hi all,

Changes since 20170329:

Undropped tree: xen-tip

The vfs tree gained a conflict against Linus' tree.

The drm tree gained conflicts against the drm-intel-fixes tree.

The mailbox tree lost its build failure.

The phy-next tree gained a build failure, so I used the version from
next-20170329.

The vhost tree gained a build failure, so I used the version from
next-20170329.

Non-merge commits (relative to Linus' tree): 5728
 6122 files changed, 441716 insertions(+), 108564 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 255 trees (counting Linus' and 37 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (806276b7f07a Merge branch 'for-linus' of 
git://git.kernel.dk/linux-block)
Merging fixes/master (97da3854c526 Linux 4.11-rc3)
Merging kbuild-current/fixes (9be3213b14d4 gconfig: remove misleading 
parentheses around a condition)
Merging arc-current/for-curr (ae9955aeb8e4 ARC: vdk: Fix support of UIO)
Merging arm-current/fixes (a1016e94cce9 ARM: wire up statx syscall)
Merging m68k-current/for-linus (e3b1ebd67387 m68k: Wire up statx)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define 
atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (cc638a488a52 gcc-plugins: update architecture list 
in documentation)
Merging sparc/master (0ae2d26ffe70 arch/sparc: Avoid DCTI Couples)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (8f1f7eeb22c1 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf)
Merging ipsec/master (72ef9c4125c7 dccp: fix memory leak during tear-down of 
unsuccessful connection request)
Merging netfilter/master (77c1c03c5b8e netfilter: nfnetlink_queue: fix secctx 
memory leak)
Merging ipvs/master (5371bbf4b295 net: bcmgenet: Do not suspend PHY if 
Wake-on-LAN is enabled)
Merging wireless-drivers/master (6be3b6cce1e2 ath10k: fix incorrect 
wlan_mac_base in qca6174_regs)
Merging mac80211/master (7d65f82954da mac80211: unconditionally start new 
netdev queues with iTXQ support)
Merging sound-current/for-linus (2d7d54002e39 ALSA: seq: Fix race during FIFO 
resize)
Merging pci-current/for-linus (9abb27c7594a PCI: thunder-pem: Add legacy 
firmware support for Cavium ThunderX host controller)
Merging driver-core.current/driver-core-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging tty.current/tty-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging usb.current/usb-linus (a7f12a21f6b3 usb: phy: isp1301: Fix build 
warning when CONFIG_OF is disabled)
Merging usb-gadget-fixes/fixes (25cd9721c2b1 usb: gadget: f_hid: fix: Don't 
access hidg->req without spinlock held)
Merging usb-serial-fixes/usb-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move 
the lock initialization to core file)
Merging phy/fixes (1a09b6a7c10e phy: qcom-usb-hs: Add depends on EXTCON)
Merging staging.current/staging-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging char-misc.current/char-misc-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging input-current/for-linus (5659495a7a14 uapi: add missing install of 
userio.h)
Merging crypto-current/master (9df0eb180c20 crypto: xts,lrw - fix out-of-bounds 
write after kmalloc failure)
Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to 
palm_bk3710_port_info)
Merging vfio-fixes/for-linus (65b1adeb

linux-next: Tree for Mar 30

2017-03-29 Thread Stephen Rothwell
Hi all,

Changes since 20170329:

Undropped tree: xen-tip

The vfs tree gained a conflict against Linus' tree.

The drm tree gained conflicts against the drm-intel-fixes tree.

The mailbox tree lost its build failure.

The phy-next tree gained a build failure, so I used the version from
next-20170329.

The vhost tree gained a build failure, so I used the version from
next-20170329.

Non-merge commits (relative to Linus' tree): 5728
 6122 files changed, 441716 insertions(+), 108564 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 255 trees (counting Linus' and 37 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (806276b7f07a Merge branch 'for-linus' of 
git://git.kernel.dk/linux-block)
Merging fixes/master (97da3854c526 Linux 4.11-rc3)
Merging kbuild-current/fixes (9be3213b14d4 gconfig: remove misleading 
parentheses around a condition)
Merging arc-current/for-curr (ae9955aeb8e4 ARC: vdk: Fix support of UIO)
Merging arm-current/fixes (a1016e94cce9 ARM: wire up statx syscall)
Merging m68k-current/for-linus (e3b1ebd67387 m68k: Wire up statx)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define 
atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (cc638a488a52 gcc-plugins: update architecture list 
in documentation)
Merging sparc/master (0ae2d26ffe70 arch/sparc: Avoid DCTI Couples)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (8f1f7eeb22c1 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf)
Merging ipsec/master (72ef9c4125c7 dccp: fix memory leak during tear-down of 
unsuccessful connection request)
Merging netfilter/master (77c1c03c5b8e netfilter: nfnetlink_queue: fix secctx 
memory leak)
Merging ipvs/master (5371bbf4b295 net: bcmgenet: Do not suspend PHY if 
Wake-on-LAN is enabled)
Merging wireless-drivers/master (6be3b6cce1e2 ath10k: fix incorrect 
wlan_mac_base in qca6174_regs)
Merging mac80211/master (7d65f82954da mac80211: unconditionally start new 
netdev queues with iTXQ support)
Merging sound-current/for-linus (2d7d54002e39 ALSA: seq: Fix race during FIFO 
resize)
Merging pci-current/for-linus (9abb27c7594a PCI: thunder-pem: Add legacy 
firmware support for Cavium ThunderX host controller)
Merging driver-core.current/driver-core-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging tty.current/tty-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging usb.current/usb-linus (a7f12a21f6b3 usb: phy: isp1301: Fix build 
warning when CONFIG_OF is disabled)
Merging usb-gadget-fixes/fixes (25cd9721c2b1 usb: gadget: f_hid: fix: Don't 
access hidg->req without spinlock held)
Merging usb-serial-fixes/usb-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move 
the lock initialization to core file)
Merging phy/fixes (1a09b6a7c10e phy: qcom-usb-hs: Add depends on EXTCON)
Merging staging.current/staging-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging char-misc.current/char-misc-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging input-current/for-linus (5659495a7a14 uapi: add missing install of 
userio.h)
Merging crypto-current/master (9df0eb180c20 crypto: xts,lrw - fix out-of-bounds 
write after kmalloc failure)
Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to 
palm_bk3710_port_info)
Merging vfio-fixes/for-linus (65b1adeb

Re: [PATCH v8 3/3] printk: fix double printing with earlycon

2017-03-29 Thread Sergey Senozhatsky
On (03/28/17 14:56), Petr Mladek wrote:
[..]
> > > Is it better?  If not, I will send a version with console_cmdline_last.
> > 
> > personally I'm fine with the nested loop. the latest version
> > "for (last = MAX_CMDLINECONSOLES - 1; last >= 0;..."
> > 
> > is even easier to read.
> 
> The number of elements is bumped on a single location, so there
> is not much to synchronize. The old approach was fine because
> the for cycles were needed anyway, they started on the 0th element,
> and NULL ended arrays are rather common practice.
> 
> But we are searching the array from the end now. Also we use the
> for cycle just to get the number here. This is not a common
> practice and it makes the code more complicated and strange from
> my point of view.

I'm fine with either way :)

[..]
> > neither add_preferred_console() nor __add_preferred_console() have any
> > serialization. and I assume that we can call add_preferred_console()
> > concurrently, can't we?
[..]
> If I did not miss anything, it would seem that
> __add_preferred_console() are called synchronously
> and only during boot by design.

thanks. I think you are right. it's console_initcall or __init.

-ss


Re: [PATCH v8 3/3] printk: fix double printing with earlycon

2017-03-29 Thread Sergey Senozhatsky
On (03/28/17 14:56), Petr Mladek wrote:
[..]
> > > Is it better?  If not, I will send a version with console_cmdline_last.
> > 
> > personally I'm fine with the nested loop. the latest version
> > "for (last = MAX_CMDLINECONSOLES - 1; last >= 0;..."
> > 
> > is even easier to read.
> 
> The number of elements is bumped on a single location, so there
> is not much to synchronize. The old approach was fine because
> the for cycles were needed anyway, they started on the 0th element,
> and NULL ended arrays are rather common practice.
> 
> But we are searching the array from the end now. Also we use the
> for cycle just to get the number here. This is not a common
> practice and it makes the code more complicated and strange from
> my point of view.

I'm fine with either way :)

[..]
> > neither add_preferred_console() nor __add_preferred_console() have any
> > serialization. and I assume that we can call add_preferred_console()
> > concurrently, can't we?
[..]
> If I did not miss anything, it would seem that
> __add_preferred_console() are called synchronously
> and only during boot by design.

thanks. I think you are right. it's console_initcall or __init.

-ss


Re: [PATCH v4 19/23] drivers/fsi: Add GPIO based FSI master

2017-03-29 Thread Joel Stanley
On Thu, Mar 30, 2017 at 4:13 AM, Christopher Bostic
 wrote:
> From: Chris Bostic 
>
> Implement a FSI master using GPIO.  Will generate FSI protocol for
> read and write commands to particular addresses.  Sends master command
> and waits for and decodes a slave response.
>
> Includes changes from Edward A. James  and Jeremy
> Kerr .
>
> Signed-off-by: Edward A. James 
> Signed-off-by: Jeremy Kerr 
> Signed-off-by: Chris Bostic 
> Signed-off-by: Joel Stanley 
> ---
>  arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts |  10 +
>  arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts  |  12 +
>  drivers/fsi/Kconfig   |  11 +
>  drivers/fsi/Makefile  |   1 +
>  drivers/fsi/fsi-master-gpio.c | 614 
> ++
>  5 files changed, 648 insertions(+)
>  create mode 100644 drivers/fsi/fsi-master-gpio.c
>
> diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts 
> b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts
> index 1d2fc1e..cf7d7d7 100644
> --- a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts
> +++ b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts
> @@ -29,6 +29,16 @@
> reg = <0x5f00 0x0100>; /* 16M */
> };
> };
> +
> +   gpio-fsi {
> +   compatible = "fsi-master-gpio", "fsi-master";
> +
> +   clock-gpios = < ASPEED_GPIO(A, 4) GPIO_ACTIVE_HIGH>;
> +   data-gpios = < ASPEED_GPIO(A, 5) GPIO_ACTIVE_HIGH>;
> +   mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>;
> +   enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>;
> +   trans-gpios = < ASPEED_GPIO(H, 6) GPIO_ACTIVE_HIGH>;
> +   };
>  };
>
>   {
> diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts 
> b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts
> index 7a3b2b5..2fd7db7 100644
> --- a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts
> +++ b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts
> @@ -29,6 +29,18 @@
> reg = <0xbf00 0x0100>; /* 16M */
> };
> };
> +
> +   gpio-fsi {
> +   compatible = "fsi-master-gpio", "fsi-master";
> +
> +   status = "okay";
> +
> +   clock-gpios = < ASPEED_GPIO(AA, 0) GPIO_ACTIVE_HIGH>;
> +   data-gpios = < ASPEED_GPIO(AA, 2) GPIO_ACTIVE_HIGH>;
> +   mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>;
> +   enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>;
> +   trans-gpios = < ASPEED_GPIO(R, 2) GPIO_ACTIVE_HIGH>;
> +   };
>  };

I'm not sure what happened here. The changes to the device trees
should not be in this patch.

Cheers,

Joel

>
>   {
> diff --git a/drivers/fsi/Kconfig b/drivers/fsi/Kconfig
> index 04c1a0e..9cf8345 100644
> --- a/drivers/fsi/Kconfig
> +++ b/drivers/fsi/Kconfig
> @@ -9,4 +9,15 @@ config FSI
> ---help---
>   FSI - the FRU Support Interface - is a simple bus for low-level
>   access to POWER-based hardware.
> +
> +if FSI
> +
> +config FSI_MASTER_GPIO
> +   tristate "GPIO-based FSI master"
> +   depends on FSI && GPIOLIB
> +   ---help---
> +   This option enables a FSI master driver using GPIO lines.
> +
> +endif
> +
>  endmenu
> diff --git a/drivers/fsi/Makefile b/drivers/fsi/Makefile
> index db0e5e7..ed28ac0 100644
> --- a/drivers/fsi/Makefile
> +++ b/drivers/fsi/Makefile
> @@ -1,2 +1,3 @@
>
>  obj-$(CONFIG_FSI) += fsi-core.o
> +obj-$(CONFIG_FSI_MASTER_GPIO) += fsi-master-gpio.o
> diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c
> new file mode 100644
> index 000..0bf5caa
> --- /dev/null
> +++ b/drivers/fsi/fsi-master-gpio.c
> @@ -0,0 +1,614 @@
> +/*
> + * A FSI master controller, using a simple GPIO bit-banging interface
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "fsi-master.h"
> +
> +#defineFSI_GPIO_STD_DLY1   /* Standard pin delay in nS */
> +#defineFSI_ECHO_DELAY_CLOCKS   16  /* Number clocks for echo 
> delay */
> +#defineFSI_PRE_BREAK_CLOCKS50  /* Number clocks to prep for 
> break */
> +#defineFSI_BREAK_CLOCKS256 /* Number of clocks to issue 
> break */
> +#defineFSI_POST_BREAK_CLOCKS   16000   /* Number clocks to set up 
> cfam */
> +#defineFSI_INIT_CLOCKS 5000/* Clock out any old data */
> +#defineFSI_GPIO_STD_DELAY  10  /* Standard GPIO delay in nS 
> */
> +   /* todo: adjust down as low as */
> +   /* possible or eliminate */
> +#defineFSI_GPIO_CMD_DPOLL  0x2
> +#defineFSI_GPIO_CMD_TERM   0x3f
> +#define 

Re: [PATCH v4 19/23] drivers/fsi: Add GPIO based FSI master

2017-03-29 Thread Joel Stanley
On Thu, Mar 30, 2017 at 4:13 AM, Christopher Bostic
 wrote:
> From: Chris Bostic 
>
> Implement a FSI master using GPIO.  Will generate FSI protocol for
> read and write commands to particular addresses.  Sends master command
> and waits for and decodes a slave response.
>
> Includes changes from Edward A. James  and Jeremy
> Kerr .
>
> Signed-off-by: Edward A. James 
> Signed-off-by: Jeremy Kerr 
> Signed-off-by: Chris Bostic 
> Signed-off-by: Joel Stanley 
> ---
>  arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts |  10 +
>  arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts  |  12 +
>  drivers/fsi/Kconfig   |  11 +
>  drivers/fsi/Makefile  |   1 +
>  drivers/fsi/fsi-master-gpio.c | 614 
> ++
>  5 files changed, 648 insertions(+)
>  create mode 100644 drivers/fsi/fsi-master-gpio.c
>
> diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts 
> b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts
> index 1d2fc1e..cf7d7d7 100644
> --- a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts
> +++ b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts
> @@ -29,6 +29,16 @@
> reg = <0x5f00 0x0100>; /* 16M */
> };
> };
> +
> +   gpio-fsi {
> +   compatible = "fsi-master-gpio", "fsi-master";
> +
> +   clock-gpios = < ASPEED_GPIO(A, 4) GPIO_ACTIVE_HIGH>;
> +   data-gpios = < ASPEED_GPIO(A, 5) GPIO_ACTIVE_HIGH>;
> +   mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>;
> +   enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>;
> +   trans-gpios = < ASPEED_GPIO(H, 6) GPIO_ACTIVE_HIGH>;
> +   };
>  };
>
>   {
> diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts 
> b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts
> index 7a3b2b5..2fd7db7 100644
> --- a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts
> +++ b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts
> @@ -29,6 +29,18 @@
> reg = <0xbf00 0x0100>; /* 16M */
> };
> };
> +
> +   gpio-fsi {
> +   compatible = "fsi-master-gpio", "fsi-master";
> +
> +   status = "okay";
> +
> +   clock-gpios = < ASPEED_GPIO(AA, 0) GPIO_ACTIVE_HIGH>;
> +   data-gpios = < ASPEED_GPIO(AA, 2) GPIO_ACTIVE_HIGH>;
> +   mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>;
> +   enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>;
> +   trans-gpios = < ASPEED_GPIO(R, 2) GPIO_ACTIVE_HIGH>;
> +   };
>  };

I'm not sure what happened here. The changes to the device trees
should not be in this patch.

Cheers,

Joel

>
>   {
> diff --git a/drivers/fsi/Kconfig b/drivers/fsi/Kconfig
> index 04c1a0e..9cf8345 100644
> --- a/drivers/fsi/Kconfig
> +++ b/drivers/fsi/Kconfig
> @@ -9,4 +9,15 @@ config FSI
> ---help---
>   FSI - the FRU Support Interface - is a simple bus for low-level
>   access to POWER-based hardware.
> +
> +if FSI
> +
> +config FSI_MASTER_GPIO
> +   tristate "GPIO-based FSI master"
> +   depends on FSI && GPIOLIB
> +   ---help---
> +   This option enables a FSI master driver using GPIO lines.
> +
> +endif
> +
>  endmenu
> diff --git a/drivers/fsi/Makefile b/drivers/fsi/Makefile
> index db0e5e7..ed28ac0 100644
> --- a/drivers/fsi/Makefile
> +++ b/drivers/fsi/Makefile
> @@ -1,2 +1,3 @@
>
>  obj-$(CONFIG_FSI) += fsi-core.o
> +obj-$(CONFIG_FSI_MASTER_GPIO) += fsi-master-gpio.o
> diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c
> new file mode 100644
> index 000..0bf5caa
> --- /dev/null
> +++ b/drivers/fsi/fsi-master-gpio.c
> @@ -0,0 +1,614 @@
> +/*
> + * A FSI master controller, using a simple GPIO bit-banging interface
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "fsi-master.h"
> +
> +#defineFSI_GPIO_STD_DLY1   /* Standard pin delay in nS */
> +#defineFSI_ECHO_DELAY_CLOCKS   16  /* Number clocks for echo 
> delay */
> +#defineFSI_PRE_BREAK_CLOCKS50  /* Number clocks to prep for 
> break */
> +#defineFSI_BREAK_CLOCKS256 /* Number of clocks to issue 
> break */
> +#defineFSI_POST_BREAK_CLOCKS   16000   /* Number clocks to set up 
> cfam */
> +#defineFSI_INIT_CLOCKS 5000/* Clock out any old data */
> +#defineFSI_GPIO_STD_DELAY  10  /* Standard GPIO delay in nS 
> */
> +   /* todo: adjust down as low as */
> +   /* possible or eliminate */
> +#defineFSI_GPIO_CMD_DPOLL  0x2
> +#defineFSI_GPIO_CMD_TERM   0x3f
> +#define FSI_GPIO_CMD_ABS_AR0x4
> +
> +#defineFSI_GPIO_DPOLL_CLOCKS   100  /* < 21 will cause slave to 
> hang */
> +
> +/* Bus errors */
> +#define

Re: [REGRESSION 4.11] Commit d8514d8edb5b ("ovl: copy up regular file using O_TMPFILE") breaks ubifs

2017-03-29 Thread Ralph Sennhauser
Hi Richard,

On Thu, 30 Mar 2017 00:15:31 +0200
Richard Weinberger  wrote:

> Ralph,
> 
> Am 29.03.2017 um 23:26 schrieb Ralph Sennhauser:
> >> # create and link a tmpfile - then remove it
> >> sudo rm -rf foo; sudo xfs_io -T -c "flink foo" . ; ls -l foo; sudo
> >> rm foo  
> > 
> > next-20170328, obviously without your patch:
> > 
> >   # xfs_io -T -c "flink foo" .
> >   .: Not supported  
> 
> -T tells xfs_io to create a tmpfile but you have it disabled in UBIFS.

Bash history says I booted the other partition than I flashed (uname
just happened to match), the downside of the dual boot layout. :)

  # rm -rf foo
  # xfs_io -T -c "flink foo" .
  # ls -l foo
  -rw---1 root root 0 Mar 30 02:00 foo
  # rm foo
  [  305.001436] UBIFS error (ubi0:0 pid 2493): ubifs_add_orphan: orphaned twice

However, unlike with the overlay setup the filesystem can be mounted
just fine without imediatly visible side effects.

> 
> Anyway, can you please test the attached patch?
> Amir was right, UBIFS misses a corner case in ubifs_link(). ;-\
> 
> I think I understand also why we never noticed this in xfstests,
> UBIFS prints only a non-fatal error while umounting the filesystem in
> this case. But will test tomorrow in more detail, need some sleep now.
> 
> diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
> index 30825d882aa9..95e348a2da29 100644
> --- a/fs/ubifs/dir.c
> +++ b/fs/ubifs/dir.c
> @@ -748,6 +748,9 @@ static int ubifs_link(struct dentry *old_dentry,
> struct inode *dir, goto out_fname;
> 
>   lock_2_inodes(dir, inode);
> + if (inode->i_nlink == 0)
> + ubifs_delete_orphan(c, inode->i_ino);
> +
>   inc_nlink(inode);
>   ihold(inode);
>   inode->i_ctime = ubifs_current_time(inode);
> 
> Thanks,
> //richard

With this patch I'm no longer able to reproduce _this_ issue, however,
rename no longer works either:

  # mv /etc/config/wireless /etc/config/wireless.back
  mv: can't rename '/etc/config/wireless': Invalid argument
  # ls /etc/config/wireless
  /etc/config/wireless
  # rm /etc/config/wireless
  # ls /etc/config/wireless
  ls: /etc/config/wireless: No such file or directory


Thanks
Ralph


Re: [REGRESSION 4.11] Commit d8514d8edb5b ("ovl: copy up regular file using O_TMPFILE") breaks ubifs

2017-03-29 Thread Ralph Sennhauser
Hi Richard,

On Thu, 30 Mar 2017 00:15:31 +0200
Richard Weinberger  wrote:

> Ralph,
> 
> Am 29.03.2017 um 23:26 schrieb Ralph Sennhauser:
> >> # create and link a tmpfile - then remove it
> >> sudo rm -rf foo; sudo xfs_io -T -c "flink foo" . ; ls -l foo; sudo
> >> rm foo  
> > 
> > next-20170328, obviously without your patch:
> > 
> >   # xfs_io -T -c "flink foo" .
> >   .: Not supported  
> 
> -T tells xfs_io to create a tmpfile but you have it disabled in UBIFS.

Bash history says I booted the other partition than I flashed (uname
just happened to match), the downside of the dual boot layout. :)

  # rm -rf foo
  # xfs_io -T -c "flink foo" .
  # ls -l foo
  -rw---1 root root 0 Mar 30 02:00 foo
  # rm foo
  [  305.001436] UBIFS error (ubi0:0 pid 2493): ubifs_add_orphan: orphaned twice

However, unlike with the overlay setup the filesystem can be mounted
just fine without imediatly visible side effects.

> 
> Anyway, can you please test the attached patch?
> Amir was right, UBIFS misses a corner case in ubifs_link(). ;-\
> 
> I think I understand also why we never noticed this in xfstests,
> UBIFS prints only a non-fatal error while umounting the filesystem in
> this case. But will test tomorrow in more detail, need some sleep now.
> 
> diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
> index 30825d882aa9..95e348a2da29 100644
> --- a/fs/ubifs/dir.c
> +++ b/fs/ubifs/dir.c
> @@ -748,6 +748,9 @@ static int ubifs_link(struct dentry *old_dentry,
> struct inode *dir, goto out_fname;
> 
>   lock_2_inodes(dir, inode);
> + if (inode->i_nlink == 0)
> + ubifs_delete_orphan(c, inode->i_ino);
> +
>   inc_nlink(inode);
>   ihold(inode);
>   inode->i_ctime = ubifs_current_time(inode);
> 
> Thanks,
> //richard

With this patch I'm no longer able to reproduce _this_ issue, however,
rename no longer works either:

  # mv /etc/config/wireless /etc/config/wireless.back
  mv: can't rename '/etc/config/wireless': Invalid argument
  # ls /etc/config/wireless
  /etc/config/wireless
  # rm /etc/config/wireless
  # ls /etc/config/wireless
  ls: /etc/config/wireless: No such file or directory


Thanks
Ralph


[PATCH v2 3/4] zram: make deduplication feature optional

2017-03-29 Thread js1304
From: Joonsoo Kim 

Benefit of deduplication is dependent on the workload so it's not
preferable to always enable. Therefore, make it optional in Kconfig
and device param. Default is 'off'. This option will be beneficial
for users who use the zram as blockdev and stores build output to it.

Signed-off-by: Joonsoo Kim 
---
 Documentation/ABI/testing/sysfs-block-zram | 10 +
 Documentation/blockdev/zram.txt|  1 +
 drivers/block/zram/Kconfig | 14 +++
 drivers/block/zram/Makefile|  5 ++-
 drivers/block/zram/zram_dedup.c| 31 ++-
 drivers/block/zram/zram_dedup.h| 33 +++-
 drivers/block/zram/zram_drv.c  | 62 ++
 drivers/block/zram/zram_drv.h  |  1 +
 8 files changed, 146 insertions(+), 11 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram 
b/Documentation/ABI/testing/sysfs-block-zram
index 451b6d8..3c1945f 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -90,3 +90,13 @@ Description:
device's debugging info useful for kernel developers. Its
format is not documented intentionally and may change
anytime without any notice.
+
+What:  /sys/block/zram/use_dedup
+Date:  March 2017
+Contact:   Joonsoo Kim 
+Description:
+   The use_dedup file is read-write and specifies deduplication
+   feature is used or not. If enabled, duplicated data is
+   managed by reference count and will not be stored in memory
+   twice. Benefit of this feature largely depends on the workload
+   so keep attention when use.
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 2cdc303..cbbe39b 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -168,6 +168,7 @@ max_comp_streams  RWthe number of possible concurrent 
compress operations
 comp_algorithmRWshow and change the compression algorithm
 compact   WOtrigger memory compaction
 debug_statROthis file is used for zram debugging purposes
+use_dedupRWshow and set deduplication feature
 
 
 User space is advised to use the following files to read the device statistics.
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index b8ecba6..2f3dd1f 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -13,3 +13,17 @@ config ZRAM
  disks and maybe many more.
 
  See zram.txt for more information.
+
+config ZRAM_DEDUP
+   bool "Deduplication support for ZRAM data"
+   depends on ZRAM
+   default n
+   help
+ Deduplicate ZRAM data to reduce amount of memory consumption.
+ Advantage largely depends on the workload. In some cases, this
+ option reduces memory usage to the half. However, if there is no
+ duplicated data, the amount of memory consumption would be
+ increased due to additional metadata usage. And, there is
+ computation time trade-off. Please check the benefit before
+ enabling this option. Experiment shows the positive effect when
+ the zram is used as blockdev and is used to store build output.
diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile
index 29cb008..1f6fecd 100644
--- a/drivers/block/zram/Makefile
+++ b/drivers/block/zram/Makefile
@@ -1,3 +1,4 @@
-zram-y :=  zcomp.o zram_drv.o zram_dedup.o
+zram-y :=  zcomp.o zram_drv.o
 
-obj-$(CONFIG_ZRAM) +=  zram.o
+obj-$(CONFIG_ZRAM) +=  zram.o
+obj-$(CONFIG_ZRAM_DEDUP)   +=  zram_dedup.o
diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c
index d313fc8..1df1ce1 100644
--- a/drivers/block/zram/zram_dedup.c
+++ b/drivers/block/zram/zram_dedup.c
@@ -27,6 +27,19 @@ u64 zram_dedup_meta_size(struct zram *zram)
return (u64)atomic64_read(>stats.meta_data_size);
 }
 
+static inline bool zram_dedup_enabled(struct zram_meta *meta)
+{
+   return meta->hash;
+}
+
+unsigned long zram_dedup_handle(struct zram *zram, struct zram_entry *entry)
+{
+   if (!zram_dedup_enabled(zram->meta))
+   return (unsigned long)entry;
+
+   return entry->handle;
+}
+
 static u32 zram_dedup_checksum(unsigned char *mem)
 {
return jhash(mem, PAGE_SIZE, 0);
@@ -41,6 +54,9 @@ void zram_dedup_insert(struct zram *zram, struct zram_entry 
*new,
struct rb_node **rb_node, *parent = NULL;
struct zram_entry *entry;
 
+   if (!zram_dedup_enabled(zram->meta))
+   return;
+
new->checksum = checksum;
hash = >hash[checksum % meta->hash_size];
rb_root = >rb_root;
@@ -149,6 +165,9 @@ static struct zram_entry *zram_dedup_get(struct zram *zram,
 

[PATCH v2 4/4] zram: compare all the entries with same checksum for deduplication

2017-03-29 Thread js1304
From: Joonsoo Kim 

Until now, we compare just one entry with same checksum when
checking duplication since it is the simplest way to implement.
However, for the completeness, checking all the entries is better
so this patch implement to compare all the entries with same checksum.
Since this event would be rare so there would be no performance loss.

Signed-off-by: Joonsoo Kim 
---
 drivers/block/zram/zram_dedup.c | 59 +
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c
index 1df1ce1..c4dfd21 100644
--- a/drivers/block/zram/zram_dedup.c
+++ b/drivers/block/zram/zram_dedup.c
@@ -125,6 +125,51 @@ static unsigned long zram_dedup_put(struct zram *zram, 
struct zram_meta *meta,
return refcount;
 }
 
+static struct zram_entry *__zram_dedup_get(struct zram *zram,
+   struct zram_hash *hash, unsigned char *mem,
+   struct zram_entry *entry)
+{
+   struct zram_entry *tmp, *prev = NULL;
+   struct rb_node *rb_node;
+
+   /* find left-most entry with same checksum */
+   while ((rb_node = rb_prev(>rb_node))) {
+   tmp = rb_entry(rb_node, struct zram_entry, rb_node);
+   if (tmp->checksum != entry->checksum)
+   break;
+
+   entry = tmp;
+   }
+
+again:
+   entry->refcount++;
+   atomic64_add(entry->len, >stats.dup_data_size);
+   spin_unlock(>lock);
+
+   if (prev)
+   zram_entry_free(zram, zram->meta, prev);
+
+   if (zram_dedup_match(zram, entry, mem))
+   return entry;
+
+   spin_lock(>lock);
+   tmp = NULL;
+   rb_node = rb_next(>rb_node);
+   if (rb_node)
+   tmp = rb_entry(rb_node, struct zram_entry, rb_node);
+
+   if (tmp && (tmp->checksum == entry->checksum)) {
+   prev = entry;
+   entry = tmp;
+   goto again;
+   }
+
+   spin_unlock(>lock);
+   zram_entry_free(zram, zram->meta, entry);
+
+   return NULL;
+}
+
 static struct zram_entry *zram_dedup_get(struct zram *zram,
unsigned char *mem, u32 checksum)
 {
@@ -139,18 +184,10 @@ static struct zram_entry *zram_dedup_get(struct zram 
*zram,
rb_node = hash->rb_root.rb_node;
while (rb_node) {
entry = rb_entry(rb_node, struct zram_entry, rb_node);
-   if (checksum == entry->checksum) {
-   entry->refcount++;
-   atomic64_add(entry->len, >stats.dup_data_size);
-   spin_unlock(>lock);
-
-   if (zram_dedup_match(zram, entry, mem))
-   return entry;
-
-   zram_entry_free(zram, meta, entry);
 
-   return NULL;
-   }
+   /* lock will be released in the following function */
+   if (checksum == entry->checksum)
+   return __zram_dedup_get(zram, hash, mem, entry);
 
if (checksum < entry->checksum)
rb_node = rb_node->rb_left;
-- 
2.7.4



Re: [PATCH net-next] virtio_net: don't reset twice on XDP on/off

2017-03-29 Thread Jason Wang



On 2017年03月30日 04:14, Michael S. Tsirkin wrote:

We already do a reset once in remove_vq_common -
there appears to be no point in doing another one
when we add/remove XDP.

Signed-off-by: Michael S. Tsirkin 
---
  drivers/net/virtio_net.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index de42e9a..ed8f548 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1810,7 +1810,6 @@ static int virtnet_reset(struct virtnet_info *vi, int 
curr_qp, int xdp_qp)
virtnet_freeze_down(dev);
_remove_vq_common(vi);
  
-	dev->config->reset(dev);

virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
  


Acked-by: Jason Wang 


[PATCH v2 3/4] zram: make deduplication feature optional

2017-03-29 Thread js1304
From: Joonsoo Kim 

Benefit of deduplication is dependent on the workload so it's not
preferable to always enable. Therefore, make it optional in Kconfig
and device param. Default is 'off'. This option will be beneficial
for users who use the zram as blockdev and stores build output to it.

Signed-off-by: Joonsoo Kim 
---
 Documentation/ABI/testing/sysfs-block-zram | 10 +
 Documentation/blockdev/zram.txt|  1 +
 drivers/block/zram/Kconfig | 14 +++
 drivers/block/zram/Makefile|  5 ++-
 drivers/block/zram/zram_dedup.c| 31 ++-
 drivers/block/zram/zram_dedup.h| 33 +++-
 drivers/block/zram/zram_drv.c  | 62 ++
 drivers/block/zram/zram_drv.h  |  1 +
 8 files changed, 146 insertions(+), 11 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram 
b/Documentation/ABI/testing/sysfs-block-zram
index 451b6d8..3c1945f 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -90,3 +90,13 @@ Description:
device's debugging info useful for kernel developers. Its
format is not documented intentionally and may change
anytime without any notice.
+
+What:  /sys/block/zram/use_dedup
+Date:  March 2017
+Contact:   Joonsoo Kim 
+Description:
+   The use_dedup file is read-write and specifies deduplication
+   feature is used or not. If enabled, duplicated data is
+   managed by reference count and will not be stored in memory
+   twice. Benefit of this feature largely depends on the workload
+   so keep attention when use.
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 2cdc303..cbbe39b 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -168,6 +168,7 @@ max_comp_streams  RWthe number of possible concurrent 
compress operations
 comp_algorithmRWshow and change the compression algorithm
 compact   WOtrigger memory compaction
 debug_statROthis file is used for zram debugging purposes
+use_dedupRWshow and set deduplication feature
 
 
 User space is advised to use the following files to read the device statistics.
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index b8ecba6..2f3dd1f 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -13,3 +13,17 @@ config ZRAM
  disks and maybe many more.
 
  See zram.txt for more information.
+
+config ZRAM_DEDUP
+   bool "Deduplication support for ZRAM data"
+   depends on ZRAM
+   default n
+   help
+ Deduplicate ZRAM data to reduce amount of memory consumption.
+ Advantage largely depends on the workload. In some cases, this
+ option reduces memory usage to the half. However, if there is no
+ duplicated data, the amount of memory consumption would be
+ increased due to additional metadata usage. And, there is
+ computation time trade-off. Please check the benefit before
+ enabling this option. Experiment shows the positive effect when
+ the zram is used as blockdev and is used to store build output.
diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile
index 29cb008..1f6fecd 100644
--- a/drivers/block/zram/Makefile
+++ b/drivers/block/zram/Makefile
@@ -1,3 +1,4 @@
-zram-y :=  zcomp.o zram_drv.o zram_dedup.o
+zram-y :=  zcomp.o zram_drv.o
 
-obj-$(CONFIG_ZRAM) +=  zram.o
+obj-$(CONFIG_ZRAM) +=  zram.o
+obj-$(CONFIG_ZRAM_DEDUP)   +=  zram_dedup.o
diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c
index d313fc8..1df1ce1 100644
--- a/drivers/block/zram/zram_dedup.c
+++ b/drivers/block/zram/zram_dedup.c
@@ -27,6 +27,19 @@ u64 zram_dedup_meta_size(struct zram *zram)
return (u64)atomic64_read(>stats.meta_data_size);
 }
 
+static inline bool zram_dedup_enabled(struct zram_meta *meta)
+{
+   return meta->hash;
+}
+
+unsigned long zram_dedup_handle(struct zram *zram, struct zram_entry *entry)
+{
+   if (!zram_dedup_enabled(zram->meta))
+   return (unsigned long)entry;
+
+   return entry->handle;
+}
+
 static u32 zram_dedup_checksum(unsigned char *mem)
 {
return jhash(mem, PAGE_SIZE, 0);
@@ -41,6 +54,9 @@ void zram_dedup_insert(struct zram *zram, struct zram_entry 
*new,
struct rb_node **rb_node, *parent = NULL;
struct zram_entry *entry;
 
+   if (!zram_dedup_enabled(zram->meta))
+   return;
+
new->checksum = checksum;
hash = >hash[checksum % meta->hash_size];
rb_root = >rb_root;
@@ -149,6 +165,9 @@ static struct zram_entry *zram_dedup_get(struct zram *zram,
 struct zram_entry *zram_dedup_find(struct zram *zram, unsigned char *mem,
  

[PATCH v2 4/4] zram: compare all the entries with same checksum for deduplication

2017-03-29 Thread js1304
From: Joonsoo Kim 

Until now, we compare just one entry with same checksum when
checking duplication since it is the simplest way to implement.
However, for the completeness, checking all the entries is better
so this patch implement to compare all the entries with same checksum.
Since this event would be rare so there would be no performance loss.

Signed-off-by: Joonsoo Kim 
---
 drivers/block/zram/zram_dedup.c | 59 +
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c
index 1df1ce1..c4dfd21 100644
--- a/drivers/block/zram/zram_dedup.c
+++ b/drivers/block/zram/zram_dedup.c
@@ -125,6 +125,51 @@ static unsigned long zram_dedup_put(struct zram *zram, 
struct zram_meta *meta,
return refcount;
 }
 
+static struct zram_entry *__zram_dedup_get(struct zram *zram,
+   struct zram_hash *hash, unsigned char *mem,
+   struct zram_entry *entry)
+{
+   struct zram_entry *tmp, *prev = NULL;
+   struct rb_node *rb_node;
+
+   /* find left-most entry with same checksum */
+   while ((rb_node = rb_prev(>rb_node))) {
+   tmp = rb_entry(rb_node, struct zram_entry, rb_node);
+   if (tmp->checksum != entry->checksum)
+   break;
+
+   entry = tmp;
+   }
+
+again:
+   entry->refcount++;
+   atomic64_add(entry->len, >stats.dup_data_size);
+   spin_unlock(>lock);
+
+   if (prev)
+   zram_entry_free(zram, zram->meta, prev);
+
+   if (zram_dedup_match(zram, entry, mem))
+   return entry;
+
+   spin_lock(>lock);
+   tmp = NULL;
+   rb_node = rb_next(>rb_node);
+   if (rb_node)
+   tmp = rb_entry(rb_node, struct zram_entry, rb_node);
+
+   if (tmp && (tmp->checksum == entry->checksum)) {
+   prev = entry;
+   entry = tmp;
+   goto again;
+   }
+
+   spin_unlock(>lock);
+   zram_entry_free(zram, zram->meta, entry);
+
+   return NULL;
+}
+
 static struct zram_entry *zram_dedup_get(struct zram *zram,
unsigned char *mem, u32 checksum)
 {
@@ -139,18 +184,10 @@ static struct zram_entry *zram_dedup_get(struct zram 
*zram,
rb_node = hash->rb_root.rb_node;
while (rb_node) {
entry = rb_entry(rb_node, struct zram_entry, rb_node);
-   if (checksum == entry->checksum) {
-   entry->refcount++;
-   atomic64_add(entry->len, >stats.dup_data_size);
-   spin_unlock(>lock);
-
-   if (zram_dedup_match(zram, entry, mem))
-   return entry;
-
-   zram_entry_free(zram, meta, entry);
 
-   return NULL;
-   }
+   /* lock will be released in the following function */
+   if (checksum == entry->checksum)
+   return __zram_dedup_get(zram, hash, mem, entry);
 
if (checksum < entry->checksum)
rb_node = rb_node->rb_left;
-- 
2.7.4



Re: [PATCH net-next] virtio_net: don't reset twice on XDP on/off

2017-03-29 Thread Jason Wang



On 2017年03月30日 04:14, Michael S. Tsirkin wrote:

We already do a reset once in remove_vq_common -
there appears to be no point in doing another one
when we add/remove XDP.

Signed-off-by: Michael S. Tsirkin 
---
  drivers/net/virtio_net.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index de42e9a..ed8f548 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1810,7 +1810,6 @@ static int virtnet_reset(struct virtnet_info *vi, int 
curr_qp, int xdp_qp)
virtnet_freeze_down(dev);
_remove_vq_common(vi);
  
-	dev->config->reset(dev);

virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
  


Acked-by: Jason Wang 


[PATCH v2 1/4] zram: introduce zram_entry to prepare dedup functionality

2017-03-29 Thread js1304
From: Joonsoo Kim 

Following patch will implement deduplication functionality
in the zram and it requires an indirection layer to manage
the life cycle of zsmalloc handle. To prepare that, this patch
introduces zram_entry which can be used to manage the life-cycle
of zsmalloc handle. Many lines are changed due to rename but
core change is just simple introduction about newly data structure.

Signed-off-by: Joonsoo Kim 
---
 drivers/block/zram/zram_drv.c | 82 +--
 drivers/block/zram/zram_drv.h |  6 +++-
 2 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0194441..f3949da 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -419,6 +419,32 @@ static DEVICE_ATTR_RO(io_stat);
 static DEVICE_ATTR_RO(mm_stat);
 static DEVICE_ATTR_RO(debug_stat);
 
+static struct zram_entry *zram_entry_alloc(struct zram *zram,
+   unsigned int len, gfp_t flags)
+{
+   struct zram_meta *meta = zram->meta;
+   struct zram_entry *entry;
+
+   entry = kzalloc(sizeof(*entry), flags);
+   if (!entry)
+   return NULL;
+
+   entry->handle = zs_malloc(meta->mem_pool, len, flags);
+   if (!entry->handle) {
+   kfree(entry);
+   return NULL;
+   }
+
+   return entry;
+}
+
+static inline void zram_entry_free(struct zram_meta *meta,
+   struct zram_entry *entry)
+{
+   zs_free(meta->mem_pool, entry->handle);
+   kfree(entry);
+}
+
 static void zram_meta_free(struct zram_meta *meta, u64 disksize)
 {
size_t num_pages = disksize >> PAGE_SHIFT;
@@ -426,15 +452,15 @@ static void zram_meta_free(struct zram_meta *meta, u64 
disksize)
 
/* Free all pages that are still in this zram device */
for (index = 0; index < num_pages; index++) {
-   unsigned long handle = meta->table[index].handle;
+   struct zram_entry *entry = meta->table[index].entry;
/*
 * No memory is allocated for same element filled pages.
 * Simply clear same page flag.
 */
-   if (!handle || zram_test_flag(meta, index, ZRAM_SAME))
+   if (!entry || zram_test_flag(meta, index, ZRAM_SAME))
continue;
 
-   zs_free(meta->mem_pool, handle);
+   zram_entry_free(meta, entry);
}
 
zs_destroy_pool(meta->mem_pool);
@@ -479,7 +505,7 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, 
u64 disksize)
 static void zram_free_page(struct zram *zram, size_t index)
 {
struct zram_meta *meta = zram->meta;
-   unsigned long handle = meta->table[index].handle;
+   struct zram_entry *entry = meta->table[index].entry;
 
/*
 * No memory is allocated for same element filled pages.
@@ -492,16 +518,16 @@ static void zram_free_page(struct zram *zram, size_t 
index)
return;
}
 
-   if (!handle)
+   if (!entry)
return;
 
-   zs_free(meta->mem_pool, handle);
+   zram_entry_free(meta, entry);
 
atomic64_sub(zram_get_obj_size(meta, index),
>stats.compr_data_size);
atomic64_dec(>stats.pages_stored);
 
-   meta->table[index].handle = 0;
+   meta->table[index].entry = NULL;
zram_set_obj_size(meta, index, 0);
 }
 
@@ -510,20 +536,20 @@ static int zram_decompress_page(struct zram *zram, char 
*mem, u32 index)
int ret = 0;
unsigned char *cmem;
struct zram_meta *meta = zram->meta;
-   unsigned long handle;
+   struct zram_entry *entry;
unsigned int size;
 
bit_spin_lock(ZRAM_ACCESS, >table[index].value);
-   handle = meta->table[index].handle;
+   entry = meta->table[index].entry;
size = zram_get_obj_size(meta, index);
 
-   if (!handle || zram_test_flag(meta, index, ZRAM_SAME)) {
+   if (!entry || zram_test_flag(meta, index, ZRAM_SAME)) {
bit_spin_unlock(ZRAM_ACCESS, >table[index].value);
zram_fill_page(mem, PAGE_SIZE, meta->table[index].element);
return 0;
}
 
-   cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
+   cmem = zs_map_object(meta->mem_pool, entry->handle, ZS_MM_RO);
if (size == PAGE_SIZE) {
copy_page(mem, cmem);
} else {
@@ -532,7 +558,7 @@ static int zram_decompress_page(struct zram *zram, char 
*mem, u32 index)
ret = zcomp_decompress(zstrm, cmem, size, mem);
zcomp_stream_put(zram->comp);
}
-   zs_unmap_object(meta->mem_pool, handle);
+   zs_unmap_object(meta->mem_pool, entry->handle);
bit_spin_unlock(ZRAM_ACCESS, >table[index].value);
 
/* Should NEVER happen. Return bio error if it does. */
@@ -554,7 

[PATCH v2 2/4] zram: implement deduplication in zram

2017-03-29 Thread js1304
From: Joonsoo Kim 

This patch implements deduplication feature in zram. The purpose
of this work is naturally to save amount of memory usage by zram.

Android is one of the biggest users to use zram as swap and it's
really important to save amount of memory usage. There is a paper
that reports that duplication ratio of Android's memory content is
rather high [1]. And, there is a similar work on zswap that also
reports that experiments has shown that around 10-15% of pages
stored in zswp are duplicates and deduplicate them provides some
benefits [2].

Also, there is a different kind of workload that uses zram as blockdev
and store build outputs into it to reduce wear-out problem of real
blockdev. In this workload, deduplication hit is very high due to
temporary files and intermediate object files. Detailed analysis is
on the bottom of this description.

Anyway, if we can detect duplicated content and avoid to store duplicated
content at different memory space, we can save memory. This patch
tries to do that.

Implementation is almost simple and intuitive but I should note
one thing about implementation detail.

To check duplication, this patch uses checksum of the page and
collision of this checksum could be possible. There would be
many choices to handle this situation but this patch chooses
to allow entry with duplicated checksum to be added to the hash,
but, not to compare all entries with duplicated checksum
when checking duplication. I guess that checksum collision is quite
rare event and we don't need to pay any attention to such a case.
Therefore, I decided the most simplest way to implement the feature.
If there is a different opinion, I can accept and go that way.

Following is the result of this patch.

Test result #1 (Swap):
Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18

orig_data_size: 145297408
compr_data_size: 32408125
mem_used_total: 32276480
dup_data_size: 3188134
meta_data_size: 1444272

Last two metrics added to mm_stat are related to this work.
First one, dup_data_size, is amount of saved memory by avoiding
to store duplicated page. Later one, meta_data_size, is the amount of
data structure to support deduplication. If dup > meta, we can judge
that the patch improves memory usage.

In Adnroid, we can save 5% of memory usage by this work.

Test result #2 (Blockdev):
build the kernel and store output to ext4 FS on zram


Elapsed time: 249 s
mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0


Elapsed time: 250 s
mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038  3945792

There is no performance degradation and save 23% memory.

Test result #3 (Blockdev):
copy android build output dir(out/host) to ext4 FS on zram


Elapsed time: out/host: 88 s
mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0


Elapsed time: out/host: 100 s
mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 
80880336

It shows performance degradation roughly 13% and save 24% memory. Maybe,
it is due to overhead of calculating checksum and comparison.

Test result #4 (Blockdev):
copy android build output dir(out/target/common) to ext4 FS on zram


Elapsed time: out/host: 203 s
mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0


Elapsed time: out/host: 201 s
mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336

Memory is saved by 42% and performance is the same. Even if there is overhead
of calculating checksum and comparison, large hit ratio compensate it since
hit leads to less compression attempt.

I checked the detailed reason of savings on kernel build workload and
there are some cases that deduplication happens.

1) *.cmd
Build command is usually similar in one directory so content of these file
are very similar. In my system, more than 789 lines in fs/ext4/.namei.o.cmd
and fs/ext4/.inode.o.cmd are the same in 944 and 938 lines of the file,
respectively.

2) intermediate object files
built-in.o and temporary object file have the similar contents. More than
50% of fs/ext4/ext4.o is the same with fs/ext4/built-in.o.

3) vmlinux
.tmp_vmlinux1 and .tmp_vmlinux2 and arch/x86/boo/compressed/vmlinux.bin
have the similar contents.

Android test has similar case that some of object files(.class
and .so) are similar with another ones.
(./host/linux-x86/lib/libartd.so and
./host/linux-x86-lib/libartd-comiler.so)

Anyway, benefit seems to be largely dependent on the workload so
following patch will make this feature optional. However, this feature
can help some usecases so is deserved to be merged.

[1]: MemScope: Analyzing Memory Duplication on Android Systems,
dl.acm.org/citation.cfm?id=2797023
[2]: zswap: Optimize compressed pool memory utilization,
lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2

Signed-off-by: Joonsoo Kim 
---
 Documentation/blockdev/zram.txt |   2 +
 drivers/block/zram/Makefile |   2 +-
 

[PATCH v2 1/4] zram: introduce zram_entry to prepare dedup functionality

2017-03-29 Thread js1304
From: Joonsoo Kim 

Following patch will implement deduplication functionality
in the zram and it requires an indirection layer to manage
the life cycle of zsmalloc handle. To prepare that, this patch
introduces zram_entry which can be used to manage the life-cycle
of zsmalloc handle. Many lines are changed due to rename but
core change is just simple introduction about newly data structure.

Signed-off-by: Joonsoo Kim 
---
 drivers/block/zram/zram_drv.c | 82 +--
 drivers/block/zram/zram_drv.h |  6 +++-
 2 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0194441..f3949da 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -419,6 +419,32 @@ static DEVICE_ATTR_RO(io_stat);
 static DEVICE_ATTR_RO(mm_stat);
 static DEVICE_ATTR_RO(debug_stat);
 
+static struct zram_entry *zram_entry_alloc(struct zram *zram,
+   unsigned int len, gfp_t flags)
+{
+   struct zram_meta *meta = zram->meta;
+   struct zram_entry *entry;
+
+   entry = kzalloc(sizeof(*entry), flags);
+   if (!entry)
+   return NULL;
+
+   entry->handle = zs_malloc(meta->mem_pool, len, flags);
+   if (!entry->handle) {
+   kfree(entry);
+   return NULL;
+   }
+
+   return entry;
+}
+
+static inline void zram_entry_free(struct zram_meta *meta,
+   struct zram_entry *entry)
+{
+   zs_free(meta->mem_pool, entry->handle);
+   kfree(entry);
+}
+
 static void zram_meta_free(struct zram_meta *meta, u64 disksize)
 {
size_t num_pages = disksize >> PAGE_SHIFT;
@@ -426,15 +452,15 @@ static void zram_meta_free(struct zram_meta *meta, u64 
disksize)
 
/* Free all pages that are still in this zram device */
for (index = 0; index < num_pages; index++) {
-   unsigned long handle = meta->table[index].handle;
+   struct zram_entry *entry = meta->table[index].entry;
/*
 * No memory is allocated for same element filled pages.
 * Simply clear same page flag.
 */
-   if (!handle || zram_test_flag(meta, index, ZRAM_SAME))
+   if (!entry || zram_test_flag(meta, index, ZRAM_SAME))
continue;
 
-   zs_free(meta->mem_pool, handle);
+   zram_entry_free(meta, entry);
}
 
zs_destroy_pool(meta->mem_pool);
@@ -479,7 +505,7 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, 
u64 disksize)
 static void zram_free_page(struct zram *zram, size_t index)
 {
struct zram_meta *meta = zram->meta;
-   unsigned long handle = meta->table[index].handle;
+   struct zram_entry *entry = meta->table[index].entry;
 
/*
 * No memory is allocated for same element filled pages.
@@ -492,16 +518,16 @@ static void zram_free_page(struct zram *zram, size_t 
index)
return;
}
 
-   if (!handle)
+   if (!entry)
return;
 
-   zs_free(meta->mem_pool, handle);
+   zram_entry_free(meta, entry);
 
atomic64_sub(zram_get_obj_size(meta, index),
>stats.compr_data_size);
atomic64_dec(>stats.pages_stored);
 
-   meta->table[index].handle = 0;
+   meta->table[index].entry = NULL;
zram_set_obj_size(meta, index, 0);
 }
 
@@ -510,20 +536,20 @@ static int zram_decompress_page(struct zram *zram, char 
*mem, u32 index)
int ret = 0;
unsigned char *cmem;
struct zram_meta *meta = zram->meta;
-   unsigned long handle;
+   struct zram_entry *entry;
unsigned int size;
 
bit_spin_lock(ZRAM_ACCESS, >table[index].value);
-   handle = meta->table[index].handle;
+   entry = meta->table[index].entry;
size = zram_get_obj_size(meta, index);
 
-   if (!handle || zram_test_flag(meta, index, ZRAM_SAME)) {
+   if (!entry || zram_test_flag(meta, index, ZRAM_SAME)) {
bit_spin_unlock(ZRAM_ACCESS, >table[index].value);
zram_fill_page(mem, PAGE_SIZE, meta->table[index].element);
return 0;
}
 
-   cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
+   cmem = zs_map_object(meta->mem_pool, entry->handle, ZS_MM_RO);
if (size == PAGE_SIZE) {
copy_page(mem, cmem);
} else {
@@ -532,7 +558,7 @@ static int zram_decompress_page(struct zram *zram, char 
*mem, u32 index)
ret = zcomp_decompress(zstrm, cmem, size, mem);
zcomp_stream_put(zram->comp);
}
-   zs_unmap_object(meta->mem_pool, handle);
+   zs_unmap_object(meta->mem_pool, entry->handle);
bit_spin_unlock(ZRAM_ACCESS, >table[index].value);
 
/* Should NEVER happen. Return bio error if it does. */
@@ -554,7 +580,7 @@ static int zram_bvec_read(struct zram 

[PATCH v2 2/4] zram: implement deduplication in zram

2017-03-29 Thread js1304
From: Joonsoo Kim 

This patch implements deduplication feature in zram. The purpose
of this work is naturally to save amount of memory usage by zram.

Android is one of the biggest users to use zram as swap and it's
really important to save amount of memory usage. There is a paper
that reports that duplication ratio of Android's memory content is
rather high [1]. And, there is a similar work on zswap that also
reports that experiments has shown that around 10-15% of pages
stored in zswp are duplicates and deduplicate them provides some
benefits [2].

Also, there is a different kind of workload that uses zram as blockdev
and store build outputs into it to reduce wear-out problem of real
blockdev. In this workload, deduplication hit is very high due to
temporary files and intermediate object files. Detailed analysis is
on the bottom of this description.

Anyway, if we can detect duplicated content and avoid to store duplicated
content at different memory space, we can save memory. This patch
tries to do that.

Implementation is almost simple and intuitive but I should note
one thing about implementation detail.

To check duplication, this patch uses checksum of the page and
collision of this checksum could be possible. There would be
many choices to handle this situation but this patch chooses
to allow entry with duplicated checksum to be added to the hash,
but, not to compare all entries with duplicated checksum
when checking duplication. I guess that checksum collision is quite
rare event and we don't need to pay any attention to such a case.
Therefore, I decided the most simplest way to implement the feature.
If there is a different opinion, I can accept and go that way.

Following is the result of this patch.

Test result #1 (Swap):
Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18

orig_data_size: 145297408
compr_data_size: 32408125
mem_used_total: 32276480
dup_data_size: 3188134
meta_data_size: 1444272

Last two metrics added to mm_stat are related to this work.
First one, dup_data_size, is amount of saved memory by avoiding
to store duplicated page. Later one, meta_data_size, is the amount of
data structure to support deduplication. If dup > meta, we can judge
that the patch improves memory usage.

In Adnroid, we can save 5% of memory usage by this work.

Test result #2 (Blockdev):
build the kernel and store output to ext4 FS on zram


Elapsed time: 249 s
mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0


Elapsed time: 250 s
mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038  3945792

There is no performance degradation and save 23% memory.

Test result #3 (Blockdev):
copy android build output dir(out/host) to ext4 FS on zram


Elapsed time: out/host: 88 s
mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0


Elapsed time: out/host: 100 s
mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 
80880336

It shows performance degradation roughly 13% and save 24% memory. Maybe,
it is due to overhead of calculating checksum and comparison.

Test result #4 (Blockdev):
copy android build output dir(out/target/common) to ext4 FS on zram


Elapsed time: out/host: 203 s
mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0


Elapsed time: out/host: 201 s
mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336

Memory is saved by 42% and performance is the same. Even if there is overhead
of calculating checksum and comparison, large hit ratio compensate it since
hit leads to less compression attempt.

I checked the detailed reason of savings on kernel build workload and
there are some cases that deduplication happens.

1) *.cmd
Build command is usually similar in one directory so content of these file
are very similar. In my system, more than 789 lines in fs/ext4/.namei.o.cmd
and fs/ext4/.inode.o.cmd are the same in 944 and 938 lines of the file,
respectively.

2) intermediate object files
built-in.o and temporary object file have the similar contents. More than
50% of fs/ext4/ext4.o is the same with fs/ext4/built-in.o.

3) vmlinux
.tmp_vmlinux1 and .tmp_vmlinux2 and arch/x86/boo/compressed/vmlinux.bin
have the similar contents.

Android test has similar case that some of object files(.class
and .so) are similar with another ones.
(./host/linux-x86/lib/libartd.so and
./host/linux-x86-lib/libartd-comiler.so)

Anyway, benefit seems to be largely dependent on the workload so
following patch will make this feature optional. However, this feature
can help some usecases so is deserved to be merged.

[1]: MemScope: Analyzing Memory Duplication on Android Systems,
dl.acm.org/citation.cfm?id=2797023
[2]: zswap: Optimize compressed pool memory utilization,
lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2

Signed-off-by: Joonsoo Kim 
---
 Documentation/blockdev/zram.txt |   2 +
 drivers/block/zram/Makefile |   2 +-
 drivers/block/zram/zram_dedup.c | 222 

[PATCH v2 0/4] zram: implement deduplication in zram

2017-03-29 Thread js1304
From: Joonsoo Kim 

Changes from v1
o reogranize dedup specific functions
o support Kconfig on/off
o update zram documents
o compare all the entries with same checksum (patch #4)

This patchset implements deduplication feature in zram. Motivation
is to save memory usage by zram. There are detailed description
about motivation and experimental results on patch #2 so please
refer it.

Thanks.

Joonsoo Kim (4):
  zram: introduce zram_entry to prepare dedup functionality
  zram: implement deduplication in zram
  zram: make deduplication feature optional
  zram: compare all the entries with same checksum for deduplication

 Documentation/ABI/testing/sysfs-block-zram |  10 +
 Documentation/blockdev/zram.txt|   3 +
 drivers/block/zram/Kconfig |  14 ++
 drivers/block/zram/Makefile|   3 +-
 drivers/block/zram/zram_dedup.c| 288 +
 drivers/block/zram/zram_dedup.h|  56 ++
 drivers/block/zram/zram_drv.c  | 170 +
 drivers/block/zram/zram_drv.h  |  25 ++-
 8 files changed, 535 insertions(+), 34 deletions(-)
 create mode 100644 drivers/block/zram/zram_dedup.c
 create mode 100644 drivers/block/zram/zram_dedup.h

-- 
2.7.4



[PATCH v2 0/4] zram: implement deduplication in zram

2017-03-29 Thread js1304
From: Joonsoo Kim 

Changes from v1
o reogranize dedup specific functions
o support Kconfig on/off
o update zram documents
o compare all the entries with same checksum (patch #4)

This patchset implements deduplication feature in zram. Motivation
is to save memory usage by zram. There are detailed description
about motivation and experimental results on patch #2 so please
refer it.

Thanks.

Joonsoo Kim (4):
  zram: introduce zram_entry to prepare dedup functionality
  zram: implement deduplication in zram
  zram: make deduplication feature optional
  zram: compare all the entries with same checksum for deduplication

 Documentation/ABI/testing/sysfs-block-zram |  10 +
 Documentation/blockdev/zram.txt|   3 +
 drivers/block/zram/Kconfig |  14 ++
 drivers/block/zram/Makefile|   3 +-
 drivers/block/zram/zram_dedup.c| 288 +
 drivers/block/zram/zram_dedup.h|  56 ++
 drivers/block/zram/zram_drv.c  | 170 +
 drivers/block/zram/zram_drv.h  |  25 ++-
 8 files changed, 535 insertions(+), 34 deletions(-)
 create mode 100644 drivers/block/zram/zram_dedup.c
 create mode 100644 drivers/block/zram/zram_dedup.h

-- 
2.7.4



Re: linux-next: build failure after merge of the phy-next tree

2017-03-29 Thread Kishon Vijay Abraham I
Hi,

On Thursday 30 March 2017 09:17 AM, Stephen Rothwell wrote:
> Hi Kishon,
> 
> After merging the phy-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field 
> 'phy_tuning' specified in initializer
>.phy_tuning = rk3328_usb2phy_tuning,
>^
> drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' 
> undeclared here (not in a function)
>.phy_tuning = rk3328_usb2phy_tuning,
>  ^
> 
> Caused by commit
> 
>   ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328")
> 
> I have used the phy-next tree from next-20170329 for today.

Thanks for reporting this. Will fix it in my tree.

-Kishon


Re: linux-next: build failure after merge of the phy-next tree

2017-03-29 Thread Kishon Vijay Abraham I
Hi,

On Thursday 30 March 2017 09:17 AM, Stephen Rothwell wrote:
> Hi Kishon,
> 
> After merging the phy-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field 
> 'phy_tuning' specified in initializer
>.phy_tuning = rk3328_usb2phy_tuning,
>^
> drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' 
> undeclared here (not in a function)
>.phy_tuning = rk3328_usb2phy_tuning,
>  ^
> 
> Caused by commit
> 
>   ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328")
> 
> I have used the phy-next tree from next-20170329 for today.

Thanks for reporting this. Will fix it in my tree.

-Kishon


Re: [PATCH] phy: rockchip-inno-usb2: fix spelling mistake: "connecetd" -> "connected"

2017-03-29 Thread Kishon Vijay Abraham I


On Thursday 23 February 2017 05:00 AM, Colin King wrote:
> From: Colin Ian King 
> 
> trivial fix to spelling mistake in dev_dbg message, also rejoin
> lines to clean up checkpatch warning
> 
> Signed-off-by: Colin Ian King 

merged, thanks.

-Kishon
> ---
>  drivers/phy/phy-rockchip-inno-usb2.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/phy/phy-rockchip-inno-usb2.c 
> b/drivers/phy/phy-rockchip-inno-usb2.c
> index 4ea95c2..761a921 100644
> --- a/drivers/phy/phy-rockchip-inno-usb2.c
> +++ b/drivers/phy/phy-rockchip-inno-usb2.c
> @@ -554,8 +554,7 @@ static void rockchip_usb2phy_otg_sm_work(struct 
> work_struct *work)
>   case USB_CHG_STATE_DETECTED:
>   switch (rphy->chg_type) {
>   case POWER_SUPPLY_TYPE_USB:
> - dev_dbg(>phy->dev,
> - "sdp cable is connecetd\n");
> + dev_dbg(>phy->dev, "sdp cable is 
> connected\n");
>   rockchip_usb2phy_power_on(rport->phy);
>   rport->state = OTG_STATE_B_PERIPHERAL;
>   notify_charger = true;
> @@ -563,16 +562,14 @@ static void rockchip_usb2phy_otg_sm_work(struct 
> work_struct *work)
>   cable = EXTCON_CHG_USB_SDP;
>   break;
>   case POWER_SUPPLY_TYPE_USB_DCP:
> - dev_dbg(>phy->dev,
> - "dcp cable is connecetd\n");
> + dev_dbg(>phy->dev, "dcp cable is 
> connected\n");
>   rockchip_usb2phy_power_off(rport->phy);
>   notify_charger = true;
>   sch_work = true;
>   cable = EXTCON_CHG_USB_DCP;
>   break;
>   case POWER_SUPPLY_TYPE_USB_CDP:
> - dev_dbg(>phy->dev,
> - "cdp cable is connecetd\n");
> + dev_dbg(>phy->dev, "cdp cable is 
> connected\n");
>   rockchip_usb2phy_power_on(rport->phy);
>   rport->state = OTG_STATE_B_PERIPHERAL;
>   notify_charger = true;
> 


Re: [PATCH] phy: rockchip-inno-usb2: fix spelling mistake: "connecetd" -> "connected"

2017-03-29 Thread Kishon Vijay Abraham I


On Thursday 23 February 2017 05:00 AM, Colin King wrote:
> From: Colin Ian King 
> 
> trivial fix to spelling mistake in dev_dbg message, also rejoin
> lines to clean up checkpatch warning
> 
> Signed-off-by: Colin Ian King 

merged, thanks.

-Kishon
> ---
>  drivers/phy/phy-rockchip-inno-usb2.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/phy/phy-rockchip-inno-usb2.c 
> b/drivers/phy/phy-rockchip-inno-usb2.c
> index 4ea95c2..761a921 100644
> --- a/drivers/phy/phy-rockchip-inno-usb2.c
> +++ b/drivers/phy/phy-rockchip-inno-usb2.c
> @@ -554,8 +554,7 @@ static void rockchip_usb2phy_otg_sm_work(struct 
> work_struct *work)
>   case USB_CHG_STATE_DETECTED:
>   switch (rphy->chg_type) {
>   case POWER_SUPPLY_TYPE_USB:
> - dev_dbg(>phy->dev,
> - "sdp cable is connecetd\n");
> + dev_dbg(>phy->dev, "sdp cable is 
> connected\n");
>   rockchip_usb2phy_power_on(rport->phy);
>   rport->state = OTG_STATE_B_PERIPHERAL;
>   notify_charger = true;
> @@ -563,16 +562,14 @@ static void rockchip_usb2phy_otg_sm_work(struct 
> work_struct *work)
>   cable = EXTCON_CHG_USB_SDP;
>   break;
>   case POWER_SUPPLY_TYPE_USB_DCP:
> - dev_dbg(>phy->dev,
> - "dcp cable is connecetd\n");
> + dev_dbg(>phy->dev, "dcp cable is 
> connected\n");
>   rockchip_usb2phy_power_off(rport->phy);
>   notify_charger = true;
>   sch_work = true;
>   cable = EXTCON_CHG_USB_DCP;
>   break;
>   case POWER_SUPPLY_TYPE_USB_CDP:
> - dev_dbg(>phy->dev,
> - "cdp cable is connecetd\n");
> + dev_dbg(>phy->dev, "cdp cable is 
> connected\n");
>   rockchip_usb2phy_power_on(rport->phy);
>   rport->state = OTG_STATE_B_PERIPHERAL;
>   notify_charger = true;
> 


Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86

2017-03-29 Thread kbuild test robot
Hi Hans,

[auto build test WARNING on ljones-mfd/for-mfd-next]
[also build test WARNING on v4.11-rc4 next-20170329]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next
config: i386-randconfig-x074-201713 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

warning: (MFD_AXP20X_I2C) selects I2C_DESIGNWARE_BAYTRAIL which has unmet 
direct dependencies (I2C && HAS_IOMEM && ACPI && (I2C_DESIGNWARE_PLATFORM=m && 
IOSF_MBI || I2C_DESIGNWARE_PLATFORM=y && IOSF_MBI=y))

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86

2017-03-29 Thread kbuild test robot
Hi Hans,

[auto build test WARNING on ljones-mfd/for-mfd-next]
[also build test WARNING on v4.11-rc4 next-20170329]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next
config: i386-randconfig-x074-201713 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

warning: (MFD_AXP20X_I2C) selects I2C_DESIGNWARE_BAYTRAIL which has unmet 
direct dependencies (I2C && HAS_IOMEM && ACPI && (I2C_DESIGNWARE_PLATFORM=m && 
IOSF_MBI || I2C_DESIGNWARE_PLATFORM=y && IOSF_MBI=y))

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86

2017-03-29 Thread kbuild test robot
Hi Hans,

[auto build test ERROR on ljones-mfd/for-mfd-next]
[also build test ERROR on v4.11-rc4 next-20170329]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next
config: i386-randconfig-r0-201713 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `reset_semaphore':
>> i2c-designware-baytrail.c:(.text+0x136c3b): undefined reference to 
>> `iosf_mbi_read'
>> i2c-designware-baytrail.c:(.text+0x136c65): undefined reference to 
>> `iosf_mbi_write'
   drivers/built-in.o: In function `baytrail_i2c_acquire':
   i2c-designware-baytrail.c:(.text+0x136ce6): undefined reference to 
`iosf_mbi_write'
   i2c-designware-baytrail.c:(.text+0x136d26): undefined reference to 
`iosf_mbi_read'
   i2c-designware-baytrail.c:(.text+0x136da1): undefined reference to 
`iosf_mbi_read'
   drivers/built-in.o: In function `i2c_dw_eval_lock_support':
>> (.text+0x136f01): undefined reference to `iosf_mbi_available'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86

2017-03-29 Thread kbuild test robot
Hi Hans,

[auto build test ERROR on ljones-mfd/for-mfd-next]
[also build test ERROR on v4.11-rc4 next-20170329]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517
base:   https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next
config: i386-randconfig-r0-201713 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `reset_semaphore':
>> i2c-designware-baytrail.c:(.text+0x136c3b): undefined reference to 
>> `iosf_mbi_read'
>> i2c-designware-baytrail.c:(.text+0x136c65): undefined reference to 
>> `iosf_mbi_write'
   drivers/built-in.o: In function `baytrail_i2c_acquire':
   i2c-designware-baytrail.c:(.text+0x136ce6): undefined reference to 
`iosf_mbi_write'
   i2c-designware-baytrail.c:(.text+0x136d26): undefined reference to 
`iosf_mbi_read'
   i2c-designware-baytrail.c:(.text+0x136da1): undefined reference to 
`iosf_mbi_read'
   drivers/built-in.o: In function `i2c_dw_eval_lock_support':
>> (.text+0x136f01): undefined reference to `iosf_mbi_available'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH] block: do not put mq context in blk_mq_alloc_request_hctx

2017-03-29 Thread Minchan Kim
In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't
get sw context so we don't need to put the context with
blk_mq_put_ctx. Unless, we will see preempt counter underflow.

Cc: Sagi Grimberg 
Cc: Omar Sandoval 
Cc: Jens Axboe 
Signed-off-by: Minchan Kim 
---

Maybe, it would be fixed by someone but I have noticed preempt counter
undeflow problem a few weeks ago and still see the problem with
linux-next.

 block/blk-mq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a4546f060e80..a6f3998dc4ee 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -321,7 +321,6 @@ struct request *blk_mq_alloc_request_hctx(struct 
request_queue *q, int rw,
 
rq = blk_mq_sched_get_request(q, NULL, rw, _data);
 
-   blk_mq_put_ctx(alloc_data.ctx);
blk_queue_exit(q);
 
if (!rq)
-- 
2.7.4



Re: [PATCH v2] virtio_net: fix support for small rings

2017-03-29 Thread Jason Wang



On 2017年03月30日 01:42, Michael S. Tsirkin wrote:

When ring size is small (<32 entries) making buffers smaller means a
full ring might not be able to hold enough buffers to fit a single large
packet.

Make sure a ring full of buffers is large enough to allow at least one
packet of max size.

Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag 
allocators")
Signed-off-by: Michael S. Tsirkin 
---

changes from v1:
typo fix

  drivers/net/virtio_net.c | 30 ++
  1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9dc31dc..e5f6e34 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 


This seems unnecessary.

  
  static int napi_weight = NAPI_POLL_WEIGHT;

  module_param(napi_weight, int, 0444);
@@ -98,6 +99,9 @@ struct receive_queue {
/* RX: fragments + linear part + virtio header */
struct scatterlist sg[MAX_SKB_FRAGS + 2];
  
+	/* Min single buffer size for mergeable buffers case. */

+   unsigned int min_buf_len;
+
/* Name of this receive queue: input.$index */
char name[40];
  };
@@ -894,13 +898,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, 
struct receive_queue *rq,
return err;
  }
  
-static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)

+static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
+ struct ewma_pkt_len *avg_pkt_len)
  {
const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
unsigned int len;
  
  	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),

-   GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
+   rq->min_buf_len - hdr_len, PAGE_SIZE - hdr_len);
return ALIGN(len, L1_CACHE_BYTES);
  }
  
@@ -914,7 +919,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,

int err;
unsigned int len, hole;
  
-	len = get_mergeable_buf_len(>mrg_avg_pkt_len);

+   len = get_mergeable_buf_len(rq, >mrg_avg_pkt_len);
if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
return -ENOMEM;
  
@@ -2086,6 +2091,21 @@ static void virtnet_del_vqs(struct virtnet_info *vi)

virtnet_free_queues(vi);
  }
  
+/* How large should a single buffer be so a queue full of these can fit at

+ * least one full packet?
+ * Logic below assumes the mergeable buffer header is used.
+ */
+static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct 
virtqueue *vq)
+{
+   const unsigned int hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   unsigned int rq_size = virtqueue_get_vring_size(vq);
+   unsigned int packet_len = vi->big_packets ? IP_MAX_MTU : 
vi->dev->max_mtu;
+   unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len;
+   unsigned int min_buf_len = DIV_ROUND_UP(buf_len, rq_size);
+
+   return max(min_buf_len, hdr_len);
+}


Consider rq_size may be large, I think this should be max(min_buf_len, 
GOOD_PACKET_LEN)?



+
  static int virtnet_find_vqs(struct virtnet_info *vi)
  {
vq_callback_t **callbacks;
@@ -2151,6 +2171,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
  
  	for (i = 0; i < vi->max_queue_pairs; i++) {

vi->rq[i].vq = vqs[rxq2vq(i)];
+   vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
vi->sq[i].vq = vqs[txq2vq(i)];
}
  
@@ -2237,7 +2258,8 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
  
  	BUG_ON(queue_index >= vi->max_queue_pairs);

avg = >rq[queue_index].mrg_avg_pkt_len;
-   return sprintf(buf, "%u\n", get_mergeable_buf_len(avg));
+   return sprintf(buf, "%u\n",
+  get_mergeable_buf_len(>rq[queue_index], avg));
  }
  
  static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =




[PATCH] block: do not put mq context in blk_mq_alloc_request_hctx

2017-03-29 Thread Minchan Kim
In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't
get sw context so we don't need to put the context with
blk_mq_put_ctx. Unless, we will see preempt counter underflow.

Cc: Sagi Grimberg 
Cc: Omar Sandoval 
Cc: Jens Axboe 
Signed-off-by: Minchan Kim 
---

Maybe, it would be fixed by someone but I have noticed preempt counter
undeflow problem a few weeks ago and still see the problem with
linux-next.

 block/blk-mq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a4546f060e80..a6f3998dc4ee 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -321,7 +321,6 @@ struct request *blk_mq_alloc_request_hctx(struct 
request_queue *q, int rw,
 
rq = blk_mq_sched_get_request(q, NULL, rw, _data);
 
-   blk_mq_put_ctx(alloc_data.ctx);
blk_queue_exit(q);
 
if (!rq)
-- 
2.7.4



Re: [PATCH v2] virtio_net: fix support for small rings

2017-03-29 Thread Jason Wang



On 2017年03月30日 01:42, Michael S. Tsirkin wrote:

When ring size is small (<32 entries) making buffers smaller means a
full ring might not be able to hold enough buffers to fit a single large
packet.

Make sure a ring full of buffers is large enough to allow at least one
packet of max size.

Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag 
allocators")
Signed-off-by: Michael S. Tsirkin 
---

changes from v1:
typo fix

  drivers/net/virtio_net.c | 30 ++
  1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9dc31dc..e5f6e34 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 


This seems unnecessary.

  
  static int napi_weight = NAPI_POLL_WEIGHT;

  module_param(napi_weight, int, 0444);
@@ -98,6 +99,9 @@ struct receive_queue {
/* RX: fragments + linear part + virtio header */
struct scatterlist sg[MAX_SKB_FRAGS + 2];
  
+	/* Min single buffer size for mergeable buffers case. */

+   unsigned int min_buf_len;
+
/* Name of this receive queue: input.$index */
char name[40];
  };
@@ -894,13 +898,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, 
struct receive_queue *rq,
return err;
  }
  
-static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)

+static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
+ struct ewma_pkt_len *avg_pkt_len)
  {
const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
unsigned int len;
  
  	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),

-   GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
+   rq->min_buf_len - hdr_len, PAGE_SIZE - hdr_len);
return ALIGN(len, L1_CACHE_BYTES);
  }
  
@@ -914,7 +919,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,

int err;
unsigned int len, hole;
  
-	len = get_mergeable_buf_len(>mrg_avg_pkt_len);

+   len = get_mergeable_buf_len(rq, >mrg_avg_pkt_len);
if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
return -ENOMEM;
  
@@ -2086,6 +2091,21 @@ static void virtnet_del_vqs(struct virtnet_info *vi)

virtnet_free_queues(vi);
  }
  
+/* How large should a single buffer be so a queue full of these can fit at

+ * least one full packet?
+ * Logic below assumes the mergeable buffer header is used.
+ */
+static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct 
virtqueue *vq)
+{
+   const unsigned int hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   unsigned int rq_size = virtqueue_get_vring_size(vq);
+   unsigned int packet_len = vi->big_packets ? IP_MAX_MTU : 
vi->dev->max_mtu;
+   unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len;
+   unsigned int min_buf_len = DIV_ROUND_UP(buf_len, rq_size);
+
+   return max(min_buf_len, hdr_len);
+}


Consider rq_size may be large, I think this should be max(min_buf_len, 
GOOD_PACKET_LEN)?



+
  static int virtnet_find_vqs(struct virtnet_info *vi)
  {
vq_callback_t **callbacks;
@@ -2151,6 +2171,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
  
  	for (i = 0; i < vi->max_queue_pairs; i++) {

vi->rq[i].vq = vqs[rxq2vq(i)];
+   vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
vi->sq[i].vq = vqs[txq2vq(i)];
}
  
@@ -2237,7 +2258,8 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
  
  	BUG_ON(queue_index >= vi->max_queue_pairs);

avg = >rq[queue_index].mrg_avg_pkt_len;
-   return sprintf(buf, "%u\n", get_mergeable_buf_len(avg));
+   return sprintf(buf, "%u\n",
+  get_mergeable_buf_len(>rq[queue_index], avg));
  }
  
  static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =




Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention

2017-03-29 Thread Ricardo Neri
On Wed, 2017-03-29 at 23:55 +0300, Stas Sergeev wrote:
> 29.03.2017 07:38, Ricardo Neri пишет:
> >> Probably you could also remove
> >> the sldt and str emulation for protected mode, because,
> >> as I understand from this thread, wine does not
> >> need those.
> > I see. I would lean on keeping the emulation because I already
> > implemented it :), for completeness, and because it is performed in a
> > single switch. The bulk of the emulation code deals with operands.
> But this is not for free.
> As Andy said, you will then need a syscall and
> a feature mask to be able to disable this emulation.
> And AFAIK you haven't implemented that yet, so
> there is something to consider.

Right, I see your point.

>  You know the wine's
>  requirements now - they are very small. And
>  dosemu doesn't need anything at all but smsw.
>  And even smsw is very rare.
> >>> But emulation is still needed for SMSW, right?
> >> Likely so.
> >> If you want, I can enable the logging of this command
> >> and see if it is used by some of the DOS programs I have.
> > It would be great if you could do that, if you don't mind.
> OK, scheduled to the week-end.
> I'll let you know.

Thanks!

> 
> >> But at least dosemu implements it, so probably it is needed.
> > Right.
> >
> >> Of course if it is used by one of 100 DOS progs, then there
> >> is an option to just add its support to dosemu2 and pretend
> >> the compatibility problems did not exist. :)
> > Do you mean relaying the GP fault to dosemu instead of trapping it and
> > emulating it in the kernel?
> Yes, that would be optimal if this does not severely break
> the current setups. If we can find out that smsw is not in
> the real use, we can probably do exactly that. 
> But other
> instructions are not in real use in v86 for sure, so I
> wouldn't be adding the explicit test-cases to the kernel
> that will make you depend on some particular behaviour
> that no one may need.

> My objection was that we shouldn't
> write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.




Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention

2017-03-29 Thread Ricardo Neri
On Wed, 2017-03-29 at 23:55 +0300, Stas Sergeev wrote:
> 29.03.2017 07:38, Ricardo Neri пишет:
> >> Probably you could also remove
> >> the sldt and str emulation for protected mode, because,
> >> as I understand from this thread, wine does not
> >> need those.
> > I see. I would lean on keeping the emulation because I already
> > implemented it :), for completeness, and because it is performed in a
> > single switch. The bulk of the emulation code deals with operands.
> But this is not for free.
> As Andy said, you will then need a syscall and
> a feature mask to be able to disable this emulation.
> And AFAIK you haven't implemented that yet, so
> there is something to consider.

Right, I see your point.

>  You know the wine's
>  requirements now - they are very small. And
>  dosemu doesn't need anything at all but smsw.
>  And even smsw is very rare.
> >>> But emulation is still needed for SMSW, right?
> >> Likely so.
> >> If you want, I can enable the logging of this command
> >> and see if it is used by some of the DOS programs I have.
> > It would be great if you could do that, if you don't mind.
> OK, scheduled to the week-end.
> I'll let you know.

Thanks!

> 
> >> But at least dosemu implements it, so probably it is needed.
> > Right.
> >
> >> Of course if it is used by one of 100 DOS progs, then there
> >> is an option to just add its support to dosemu2 and pretend
> >> the compatibility problems did not exist. :)
> > Do you mean relaying the GP fault to dosemu instead of trapping it and
> > emulating it in the kernel?
> Yes, that would be optimal if this does not severely break
> the current setups. If we can find out that smsw is not in
> the real use, we can probably do exactly that. 
> But other
> instructions are not in real use in v86 for sure, so I
> wouldn't be adding the explicit test-cases to the kernel
> that will make you depend on some particular behaviour
> that no one may need.

> My objection was that we shouldn't
> write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.




Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling

2017-03-29 Thread Darren Hart
On Thu, Mar 30, 2017 at 02:26:26PM +1030, Jonathan Woithe wrote:
> On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote:
> > On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote:
> > > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie??  
> > > wrote:
> > > 
> > > > Darren, Andy, in light of the above I will be awaiting your review of
> > > > this series.  I will submit v2 afterwards, with all remarks from both
> > > > you and Jonathan taken into account.
> > > 
> > > Darren marked this series under his name to review, so, I let him to
> > > speak for us.
> > 
> > The series looks good to me. Nice work Micha??. They are logically divided 
> > and
> > address issues in a procedural way (so I stopped commenting until I read the
> > full series through as a couple of times you addressed a concern from a 
> > move in
> > a cleanup to follow).
> > 
> > I've applied the noted change to 7/8 and will run this through my tests, but
> > don't anticipate any problems. Jonathan, if you don't have any additional
> > concerns, let me know if I can add your Reviewed-by.
> 
> With the noted change to 7/8 applied I'm happy with the resulting series. 
> As you noted, there is still some scope for making things more consistent,
> especially with regard to error handling.  However, that is really a
> separate task which can be addressed in a later series.  This present series
> doesn't impact on this issue in any significant way so it makes sense that
> be applied.
> 
> Reviewed-by: Jonathan Woithe 

Merged, thanks!

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling

2017-03-29 Thread Darren Hart
On Thu, Mar 30, 2017 at 02:26:26PM +1030, Jonathan Woithe wrote:
> On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote:
> > On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote:
> > > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie??  
> > > wrote:
> > > 
> > > > Darren, Andy, in light of the above I will be awaiting your review of
> > > > this series.  I will submit v2 afterwards, with all remarks from both
> > > > you and Jonathan taken into account.
> > > 
> > > Darren marked this series under his name to review, so, I let him to
> > > speak for us.
> > 
> > The series looks good to me. Nice work Micha??. They are logically divided 
> > and
> > address issues in a procedural way (so I stopped commenting until I read the
> > full series through as a couple of times you addressed a concern from a 
> > move in
> > a cleanup to follow).
> > 
> > I've applied the noted change to 7/8 and will run this through my tests, but
> > don't anticipate any problems. Jonathan, if you don't have any additional
> > concerns, let me know if I can add your Reviewed-by.
> 
> With the noted change to 7/8 applied I'm happy with the resulting series. 
> As you noted, there is still some scope for making things more consistent,
> especially with regard to error handling.  However, that is really a
> separate task which can be addressed in a later series.  This present series
> doesn't impact on this issue in any significant way so it makes sense that
> be applied.
> 
> Reviewed-by: Jonathan Woithe 

Merged, thanks!

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH v6 0/4] Broadcom SBA RAID support

2017-03-29 Thread Vinod Koul
On Wed, Mar 29, 2017 at 11:35:43AM +0530, Anup Patel wrote:
> On Tue, Mar 21, 2017 at 2:48 PM, Vinod Koul  wrote:
> > On Tue, Mar 21, 2017 at 02:17:21PM +0530, Anup Patel wrote:
> >> On Tue, Mar 21, 2017 at 2:00 PM, Vinod Koul  wrote:
> >> > On Mon, Mar 06, 2017 at 03:13:24PM +0530, Anup Patel wrote:
> >> >> The Broadcom SBA RAID is a stream-based device which provides
> >> >> RAID5/6 offload.
> >> >>
> >> >> It requires a SoC specific ring manager (such as Broadcom FlexRM
> >> >> ring manager) to provide ring-based programming interface. Due to
> >> >> this, the Broadcom SBA RAID driver (mailbox client) implements
> >> >> DMA device having one DMA channel using a set of mailbox channels
> >> >> provided by Broadcom SoC specific ring manager driver (mailbox
> >> >> controller).
> >> >>
> >> >> The Broadcom SBA RAID hardware requires PQ disk position instead
> >> >> of PQ disk coefficient. To address this, we have added raid_gflog
> >> >> table which will help driver to convert PQ disk coefficient to PQ
> >> >> disk position.
> >> >>
> >> >> This patchset is based on Linux-4.11-rc1 and depends on patchset
> >> >> "[PATCH v5 0/2] Broadcom FlexRM ring manager support"
> >> >
> >> > Okay I applied and was about to push when I noticed this :(
> >> >
> >> > So what is the status of this..?
> >>
> >> PATCH2 is Acked but PATCH1 is under-review. Currently, its
> >> v6 of that patchset.
> >>
> >> The only dependency on that patchset is the changes in
> >> brcm-message.h which are required by this BCM-SBA-RAID
> >> driver.
> >>
> >> @Jassi,
> >> Can you please have a look at PATCH v6?
> >
> > And I would need an immutable branch/tag once merged. I am going to keep
> > this series pending till then.
> 
> The Broadcom FlexRM patchset is pickedup by Jassi and
> can be found in mailbox-for-next branch of
> git://git.linaro.org/landing-teams/working/fujitsu/integration
> 
> Both patchset (Broadcom FlexRM patchset and this one) are
> also available in sba-raid-v7 branch of
> https://github.com/Broadcom/arm64-linux.git

Jassi,

Can you provide an immutable branch/tag please for this, latter is
preferred.

Btw didn't find your tree in MAINTAINERS..

> 
> Regards,
> Anup

-- 
~Vinod


Re: [PATCH v6 0/4] Broadcom SBA RAID support

2017-03-29 Thread Vinod Koul
On Wed, Mar 29, 2017 at 11:35:43AM +0530, Anup Patel wrote:
> On Tue, Mar 21, 2017 at 2:48 PM, Vinod Koul  wrote:
> > On Tue, Mar 21, 2017 at 02:17:21PM +0530, Anup Patel wrote:
> >> On Tue, Mar 21, 2017 at 2:00 PM, Vinod Koul  wrote:
> >> > On Mon, Mar 06, 2017 at 03:13:24PM +0530, Anup Patel wrote:
> >> >> The Broadcom SBA RAID is a stream-based device which provides
> >> >> RAID5/6 offload.
> >> >>
> >> >> It requires a SoC specific ring manager (such as Broadcom FlexRM
> >> >> ring manager) to provide ring-based programming interface. Due to
> >> >> this, the Broadcom SBA RAID driver (mailbox client) implements
> >> >> DMA device having one DMA channel using a set of mailbox channels
> >> >> provided by Broadcom SoC specific ring manager driver (mailbox
> >> >> controller).
> >> >>
> >> >> The Broadcom SBA RAID hardware requires PQ disk position instead
> >> >> of PQ disk coefficient. To address this, we have added raid_gflog
> >> >> table which will help driver to convert PQ disk coefficient to PQ
> >> >> disk position.
> >> >>
> >> >> This patchset is based on Linux-4.11-rc1 and depends on patchset
> >> >> "[PATCH v5 0/2] Broadcom FlexRM ring manager support"
> >> >
> >> > Okay I applied and was about to push when I noticed this :(
> >> >
> >> > So what is the status of this..?
> >>
> >> PATCH2 is Acked but PATCH1 is under-review. Currently, its
> >> v6 of that patchset.
> >>
> >> The only dependency on that patchset is the changes in
> >> brcm-message.h which are required by this BCM-SBA-RAID
> >> driver.
> >>
> >> @Jassi,
> >> Can you please have a look at PATCH v6?
> >
> > And I would need an immutable branch/tag once merged. I am going to keep
> > this series pending till then.
> 
> The Broadcom FlexRM patchset is pickedup by Jassi and
> can be found in mailbox-for-next branch of
> git://git.linaro.org/landing-teams/working/fujitsu/integration
> 
> Both patchset (Broadcom FlexRM patchset and this one) are
> also available in sba-raid-v7 branch of
> https://github.com/Broadcom/arm64-linux.git

Jassi,

Can you provide an immutable branch/tag please for this, latter is
preferred.

Btw didn't find your tree in MAINTAINERS..

> 
> Regards,
> Anup

-- 
~Vinod


Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree

2017-03-29 Thread Eduardo Valentin
On Thu, Mar 30, 2017 at 08:59:31AM +0530, Keerthy wrote:
> 
> 
> On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote:
> > Keerthy,
> > 
> > On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote:
> >> * Keerthy  [170323 20:29]:
> >>>
> >>>
> >>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote:
>  * Keerthy  [170321 20:45]:
> >
> >
> > On Thursday 09 March 2017 01:35 PM, Keerthy wrote:
> >> Currently the slope and offset values for calculating the
> >> hot spot temperature of a particular thermal zone is part
> >> of driver data. Pass them here instead and obtain the values
> >> while of node parsing.
> >>
> >> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. 
> > 
> > Have you tried on boards that need negative coefficients?
> > 
> > https://patchwork.kernel.org/patch/9619577/
> 
> Yes. I retrieved the negative values nicely in the driver passed via
> Device Tree.
> 
> > 
> >
> > Hi Eduardo,
> >
> > If the series looks okay could you please pull this?
> 
>  Also.. Are the dts changes safe for me to pick separately?
> >>>
> >>> Yes Tony they are safe to pulled separately.
> >>
> >> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2.
> > 
> > Keerthy,
> > 
> > The only thing I want you to confirm is if you are really getting the
> > negative coefficients, because currently of-thermal reads the array
> > using an OF helper that understands only unsigned. For this reason, I
> > will be queueing your patches only for next merge window, not as a fix,
> > not for rc's.
> 
> I am getting negative co-efficients.

OK. That is odd. Might be simply we get still converted after simply
assigning the unsigned to the int in the thermal device data struct.
Anyways, still not for rc, given the amount of changes in the driver.

> 
> > 
> > 
> >>
> >> Thanks,
> >>
> >> Tony


Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree

2017-03-29 Thread Eduardo Valentin
On Thu, Mar 30, 2017 at 08:59:31AM +0530, Keerthy wrote:
> 
> 
> On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote:
> > Keerthy,
> > 
> > On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote:
> >> * Keerthy  [170323 20:29]:
> >>>
> >>>
> >>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote:
>  * Keerthy  [170321 20:45]:
> >
> >
> > On Thursday 09 March 2017 01:35 PM, Keerthy wrote:
> >> Currently the slope and offset values for calculating the
> >> hot spot temperature of a particular thermal zone is part
> >> of driver data. Pass them here instead and obtain the values
> >> while of node parsing.
> >>
> >> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. 
> > 
> > Have you tried on boards that need negative coefficients?
> > 
> > https://patchwork.kernel.org/patch/9619577/
> 
> Yes. I retrieved the negative values nicely in the driver passed via
> Device Tree.
> 
> > 
> >
> > Hi Eduardo,
> >
> > If the series looks okay could you please pull this?
> 
>  Also.. Are the dts changes safe for me to pick separately?
> >>>
> >>> Yes Tony they are safe to pulled separately.
> >>
> >> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2.
> > 
> > Keerthy,
> > 
> > The only thing I want you to confirm is if you are really getting the
> > negative coefficients, because currently of-thermal reads the array
> > using an OF helper that understands only unsigned. For this reason, I
> > will be queueing your patches only for next merge window, not as a fix,
> > not for rc's.
> 
> I am getting negative co-efficients.

OK. That is odd. Might be simply we get still converted after simply
assigning the unsigned to the int in the thermal device data struct.
Anyways, still not for rc, given the amount of changes in the driver.

> 
> > 
> > 
> >>
> >> Thanks,
> >>
> >> Tony


Re: [PATCH -mm -v7 4/9] mm, THP, swap: Add get_huge_swap_page()

2017-03-29 Thread Huang, Ying
Johannes Weiner  writes:

> On Tue, Mar 28, 2017 at 01:32:04PM +0800, Huang, Ying wrote:
>> @@ -527,6 +527,23 @@ static inline swp_entry_t get_swap_page(void)
>>  
>>  #endif /* CONFIG_SWAP */
>>  
>> +#ifdef CONFIG_THP_SWAP_CLUSTER
>> +static inline swp_entry_t get_huge_swap_page(void)
>> +{
>> +swp_entry_t entry;
>> +
>> +if (get_swap_pages(1, , true))
>> +return entry;
>> +else
>> +return (swp_entry_t) {0};
>> +}
>> +#else
>> +static inline swp_entry_t get_huge_swap_page(void)
>> +{
>> +return (swp_entry_t) {0};
>> +}
>> +#endif
>
> Your introducing a function without a user, making it very hard to
> judge whether the API is well-designed for the callers or not.
>
> I pointed this out as a systemic problem with this patch series in v3,
> along with other stuff, but with the way this series is structured I'm
> having a hard time seeing whether you implemented my other feedback or
> whether your counter arguments to them are justified.
>
> I cannot review and ack these patches this way.

Sorry for inconvenience, I will send a new version to combine the
function definition and usage into one patch at least for you to
review.  But I think we can continue our discussion in the comments your
raised so far firstly, what do you think about that?

Best Regards,
Huang, Ying


Re: [PATCH -mm -v7 4/9] mm, THP, swap: Add get_huge_swap_page()

2017-03-29 Thread Huang, Ying
Johannes Weiner  writes:

> On Tue, Mar 28, 2017 at 01:32:04PM +0800, Huang, Ying wrote:
>> @@ -527,6 +527,23 @@ static inline swp_entry_t get_swap_page(void)
>>  
>>  #endif /* CONFIG_SWAP */
>>  
>> +#ifdef CONFIG_THP_SWAP_CLUSTER
>> +static inline swp_entry_t get_huge_swap_page(void)
>> +{
>> +swp_entry_t entry;
>> +
>> +if (get_swap_pages(1, , true))
>> +return entry;
>> +else
>> +return (swp_entry_t) {0};
>> +}
>> +#else
>> +static inline swp_entry_t get_huge_swap_page(void)
>> +{
>> +return (swp_entry_t) {0};
>> +}
>> +#endif
>
> Your introducing a function without a user, making it very hard to
> judge whether the API is well-designed for the callers or not.
>
> I pointed this out as a systemic problem with this patch series in v3,
> along with other stuff, but with the way this series is structured I'm
> having a hard time seeing whether you implemented my other feedback or
> whether your counter arguments to them are justified.
>
> I cannot review and ack these patches this way.

Sorry for inconvenience, I will send a new version to combine the
function definition and usage into one patch at least for you to
review.  But I think we can continue our discussion in the comments your
raised so far firstly, what do you think about that?

Best Regards,
Huang, Ying


Re: [BUG nohz]: wrong user and system time accounting

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 16:08 -0400, Rik van Riel wrote:

> In other words, the tick on cpu0 is aligned
> with the tick on the nohz_full cpus, and
> jiffies is advanced while the nohz_full cpus
> with an active tick happen to be in kernel
> mode?

You really want skew_tick=1, especially on big boxen.
 
> Frederic, can you think of any reason why
> the tick on nohz_full CPUs would end up aligned
> with the tick on cpu0, instead of running at some
> random offset?

(I or low rq->clock bits as crude NOHZ collision avoidance)

> A random offset, or better yet a somewhat randomized
> tick length to make sure that simultaneous ticks are
> fairly rare and the vtime sampling does not end up
> "in phase" with the jiffies incrementing, could make
> the accounting work right again.

That improves jitter, especially on big boxen.  I have an 8 socket box
that thinks it's an extra large PC, there, collision avoidance matters
hugely.  I couldn't reproduce bean counting woes, no idea if collision
avoidance will help that.

-Mike


Re: [BUG nohz]: wrong user and system time accounting

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 16:08 -0400, Rik van Riel wrote:

> In other words, the tick on cpu0 is aligned
> with the tick on the nohz_full cpus, and
> jiffies is advanced while the nohz_full cpus
> with an active tick happen to be in kernel
> mode?

You really want skew_tick=1, especially on big boxen.
 
> Frederic, can you think of any reason why
> the tick on nohz_full CPUs would end up aligned
> with the tick on cpu0, instead of running at some
> random offset?

(I or low rq->clock bits as crude NOHZ collision avoidance)

> A random offset, or better yet a somewhat randomized
> tick length to make sure that simultaneous ticks are
> fairly rare and the vtime sampling does not end up
> "in phase" with the jiffies incrementing, could make
> the accounting work right again.

That improves jitter, especially on big boxen.  I have an 8 socket box
that thinks it's an extra large PC, there, collision avoidance matters
hugely.  I couldn't reproduce bean counting woes, no idea if collision
avoidance will help that.

-Mike


linux-next: build failure after merge of the vhost tree

2017-03-29 Thread Stephen Rothwell
Hi Michael,

After merging the vhost tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

drivers/remoteproc/remoteproc_virtio.c: In function 'rproc_virtio_find_vqs':
drivers/remoteproc/remoteproc_virtio.c:148:9: error: 'ctx' undeclared (first 
use in this function)
 ctx ? ctx[i] : false);
 ^

Caused by commit

  a965e977a103 ("virtio: add context flag to find vqs")

I have used the vhost tree from next-20170329 for today.

-- 
Cheers,
Stephen Rothwell


linux-next: build failure after merge of the vhost tree

2017-03-29 Thread Stephen Rothwell
Hi Michael,

After merging the vhost tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

drivers/remoteproc/remoteproc_virtio.c: In function 'rproc_virtio_find_vqs':
drivers/remoteproc/remoteproc_virtio.c:148:9: error: 'ctx' undeclared (first 
use in this function)
 ctx ? ctx[i] : false);
 ^

Caused by commit

  a965e977a103 ("virtio: add context flag to find vqs")

I have used the vhost tree from next-20170329 for today.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out

2017-03-29 Thread Huang, Ying
Johannes Weiner  writes:

> On Tue, Mar 28, 2017 at 01:32:09PM +0800, Huang, Ying wrote:
>> @@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page)
>>  ADD_CACHE_INFO(del_total, nr);
>>  }
>>  
>> +#ifdef CONFIG_THP_SWAP_CLUSTER
>> +int add_to_swap_trans_huge(struct page *page, struct list_head *list)
>> +{
>> +swp_entry_t entry;
>> +int ret = 0;
>> +
>> +/* cannot split, which may be needed during swap in, skip it */
>> +if (!can_split_huge_page(page, NULL))
>> +return -EBUSY;
>> +/* fallback to split huge page firstly if no PMD map */
>> +if (!compound_mapcount(page))
>> +return 0;
>> +entry = get_huge_swap_page();
>> +if (!entry.val)
>> +return 0;
>> +if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) {
>> +__swapcache_free(entry, true);
>> +return -EOVERFLOW;
>> +}
>> +ret = add_to_swap_cache(page, entry,
>> +__GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN);
>> +/* -ENOMEM radix-tree allocation failure */
>> +if (ret) {
>> +__swapcache_free(entry, true);
>> +return 0;
>> +}
>> +ret = split_huge_page_to_list(page, list);
>> +if (ret) {
>> +delete_from_swap_cache(page);
>> +return -EBUSY;
>> +}
>> +return 1;
>> +}
>> +#else
>> +static inline int add_to_swap_trans_huge(struct page *page,
>> + struct list_head *list)
>> +{
>> +return 0;
>> +}
>> +#endif
>> +
>>  /**
>>   * add_to_swap - allocate swap space for a page
>>   * @page: page we want to move to swap
>>   *
>>   * Allocate swap space for the page and add the page to the
>> - * swap cache.  Caller needs to hold the page lock. 
>> + * swap cache.  Caller needs to hold the page lock.
>>   */
>>  int add_to_swap(struct page *page, struct list_head *list)
>>  {
>> @@ -198,6 +240,18 @@ int add_to_swap(struct page *page, struct list_head 
>> *list)
>>  VM_BUG_ON_PAGE(!PageLocked(page), page);
>>  VM_BUG_ON_PAGE(!PageUptodate(page), page);
>>  
>> +if (unlikely(PageTransHuge(page))) {
>> +err = add_to_swap_trans_huge(page, list);
>> +switch (err) {
>> +case 1:
>> +return 1;
>> +case 0:
>> +/* fallback to split firstly if return 0 */
>> +break;
>> +default:
>> +return 0;
>> +}
>> +}
>>  entry = get_swap_page();
>>  if (!entry.val)
>>  return 0;
>
> add_to_swap_trans_huge() is too close a copy of add_to_swap(), which
> makes the code error prone for future modifications to the swap slot
> allocation protocol.
>
> This should read:
>
> retry:
>   entry = get_swap_page(page);
>   if (!entry.val) {
>   if (PageTransHuge(page)) {
>   split_huge_page_to_list(page, list);
>   goto retry;
>   }
>   return 0;
>   }

If the swap space is used up, that is, get_swap_page() cannot allocate
even 1 swap entry for a normal page.  We will split THP unnecessarily
with the change, but in the original code, we just skip the THP.  There
may be a performance regression here.  Similar problem exists for
mem_cgroup_try_charge_swap() too.  If the mem cgroup exceeds the swap
limit, the THP will be split unnecessary with the change too.

> And get_swap_page(), mem_cgroup_try_charge_swap() etc. should all
> check PageTransHuge() instead of having extra parameters or separate
> code paths for the huge page case.
>
> In general, don't try to tack this feature onto the side of the
> VM. Because right now, this looks a bit like the hugetlb code, with
> one big branch in the beginning that opens up an alternate
> reality. Instead, these functions should handle THP all the way down
> the stack, and without passing down redundant information.

Yes.  We should share the code as much as possible.  I just have some
questions as above.  Could you help me on that?

Best Regards,
Huang, Ying


Re: [PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out

2017-03-29 Thread Huang, Ying
Johannes Weiner  writes:

> On Tue, Mar 28, 2017 at 01:32:09PM +0800, Huang, Ying wrote:
>> @@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page)
>>  ADD_CACHE_INFO(del_total, nr);
>>  }
>>  
>> +#ifdef CONFIG_THP_SWAP_CLUSTER
>> +int add_to_swap_trans_huge(struct page *page, struct list_head *list)
>> +{
>> +swp_entry_t entry;
>> +int ret = 0;
>> +
>> +/* cannot split, which may be needed during swap in, skip it */
>> +if (!can_split_huge_page(page, NULL))
>> +return -EBUSY;
>> +/* fallback to split huge page firstly if no PMD map */
>> +if (!compound_mapcount(page))
>> +return 0;
>> +entry = get_huge_swap_page();
>> +if (!entry.val)
>> +return 0;
>> +if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) {
>> +__swapcache_free(entry, true);
>> +return -EOVERFLOW;
>> +}
>> +ret = add_to_swap_cache(page, entry,
>> +__GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN);
>> +/* -ENOMEM radix-tree allocation failure */
>> +if (ret) {
>> +__swapcache_free(entry, true);
>> +return 0;
>> +}
>> +ret = split_huge_page_to_list(page, list);
>> +if (ret) {
>> +delete_from_swap_cache(page);
>> +return -EBUSY;
>> +}
>> +return 1;
>> +}
>> +#else
>> +static inline int add_to_swap_trans_huge(struct page *page,
>> + struct list_head *list)
>> +{
>> +return 0;
>> +}
>> +#endif
>> +
>>  /**
>>   * add_to_swap - allocate swap space for a page
>>   * @page: page we want to move to swap
>>   *
>>   * Allocate swap space for the page and add the page to the
>> - * swap cache.  Caller needs to hold the page lock. 
>> + * swap cache.  Caller needs to hold the page lock.
>>   */
>>  int add_to_swap(struct page *page, struct list_head *list)
>>  {
>> @@ -198,6 +240,18 @@ int add_to_swap(struct page *page, struct list_head 
>> *list)
>>  VM_BUG_ON_PAGE(!PageLocked(page), page);
>>  VM_BUG_ON_PAGE(!PageUptodate(page), page);
>>  
>> +if (unlikely(PageTransHuge(page))) {
>> +err = add_to_swap_trans_huge(page, list);
>> +switch (err) {
>> +case 1:
>> +return 1;
>> +case 0:
>> +/* fallback to split firstly if return 0 */
>> +break;
>> +default:
>> +return 0;
>> +}
>> +}
>>  entry = get_swap_page();
>>  if (!entry.val)
>>  return 0;
>
> add_to_swap_trans_huge() is too close a copy of add_to_swap(), which
> makes the code error prone for future modifications to the swap slot
> allocation protocol.
>
> This should read:
>
> retry:
>   entry = get_swap_page(page);
>   if (!entry.val) {
>   if (PageTransHuge(page)) {
>   split_huge_page_to_list(page, list);
>   goto retry;
>   }
>   return 0;
>   }

If the swap space is used up, that is, get_swap_page() cannot allocate
even 1 swap entry for a normal page.  We will split THP unnecessarily
with the change, but in the original code, we just skip the THP.  There
may be a performance regression here.  Similar problem exists for
mem_cgroup_try_charge_swap() too.  If the mem cgroup exceeds the swap
limit, the THP will be split unnecessary with the change too.

> And get_swap_page(), mem_cgroup_try_charge_swap() etc. should all
> check PageTransHuge() instead of having extra parameters or separate
> code paths for the huge page case.
>
> In general, don't try to tack this feature onto the side of the
> VM. Because right now, this looks a bit like the hugetlb code, with
> one big branch in the beginning that opens up an alternate
> reality. Instead, these functions should handle THP all the way down
> the stack, and without passing down redundant information.

Yes.  We should share the code as much as possible.  I just have some
questions as above.  Could you help me on that?

Best Regards,
Huang, Ying


Re: [PATCH v2] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable

2017-03-29 Thread Fam Zheng
On Wed, 03/29 22:37, Martin K. Petersen wrote:
> Fam Zheng  writes:
> 
> Fam,
> 
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index fcfeddc..a5c7e67 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
> > rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
> > } else
> > rw_max = BLK_DEF_MAX_SECTORS;
> > +   rw_max = min_not_zero(rw_max, logical_to_sectors(sdp, dev_max));
> >  
> > /* Combine with controller limits */
> > q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
> 
> Instead of updating rw_max twice, how about:
> 
> } else
>   rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
>   BLK_DEF_MAX_SECTORS);

Yes, it is better. Is it okay to make the change when you apply?

Fam


Re: [PATCH v2] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable

2017-03-29 Thread Fam Zheng
On Wed, 03/29 22:37, Martin K. Petersen wrote:
> Fam Zheng  writes:
> 
> Fam,
> 
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index fcfeddc..a5c7e67 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
> > rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
> > } else
> > rw_max = BLK_DEF_MAX_SECTORS;
> > +   rw_max = min_not_zero(rw_max, logical_to_sectors(sdp, dev_max));
> >  
> > /* Combine with controller limits */
> > q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
> 
> Instead of updating rw_max twice, how about:
> 
> } else
>   rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
>   BLK_DEF_MAX_SECTORS);

Yes, it is better. Is it okay to make the change when you apply?

Fam


Re: [PATCH] zram: factor out partial IO routine

2017-03-29 Thread Sergey Senozhatsky
On (03/29/17 16:48), Minchan Kim wrote:
> For architecture(PAGE_SIZE > 4K), zram have supported partial IO.
> However, the mixed code for handling normal/partial IO is too mess,
> error-prone to modify IO handler functions with upcoming feature
> so this patch aims for cleaning up via factoring out partial IO
> routines to zram_bvec_partial_[read|write] which will be disabled
> for most 4K page architecures.
> 
> x86(4K architecure)
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-664 (-664)
> function old new   delta
> zram_bvec_rw23012039-262
> zram_decompress_page.isra402   --402
> 
> So, we will save 662 bytes.
> 
> However, as side effect, it will increase binary size in
> non-4K architecure but it's not major for zram so I believe
> benefit(maintainance, binary size for most architecture) is bigger.

a bigger side effect is that now we double the amount of lines we need
to change in certain patches and, thus, the amount of work - when we
add new functionality/fix something in zram_bvec_{write, read} we also
would need to touch zram_bvec_partial_{write, read}.

still probably worth it.

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH] zram: factor out partial IO routine

2017-03-29 Thread Sergey Senozhatsky
On (03/29/17 16:48), Minchan Kim wrote:
> For architecture(PAGE_SIZE > 4K), zram have supported partial IO.
> However, the mixed code for handling normal/partial IO is too mess,
> error-prone to modify IO handler functions with upcoming feature
> so this patch aims for cleaning up via factoring out partial IO
> routines to zram_bvec_partial_[read|write] which will be disabled
> for most 4K page architecures.
> 
> x86(4K architecure)
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-664 (-664)
> function old new   delta
> zram_bvec_rw23012039-262
> zram_decompress_page.isra402   --402
> 
> So, we will save 662 bytes.
> 
> However, as side effect, it will increase binary size in
> non-4K architecure but it's not major for zram so I believe
> benefit(maintainance, binary size for most architecture) is bigger.

a bigger side effect is that now we double the amount of lines we need
to change in certain patches and, thus, the amount of work - when we
add new functionality/fix something in zram_bvec_{write, read} we also
would need to touch zram_bvec_partial_{write, read}.

still probably worth it.

Reviewed-by: Sergey Senozhatsky 

-ss


Re: spin_lock behavior with ARM64 big.Little/HMP

2017-03-29 Thread Vikram Mulukutla


Hi Sudeep,



Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata
819472 enabled ?


Sorry for bringing this up after the loo-ong delay, but I've been 
assured that the A53 involved is > r0p1. I've also confirmed this 
problem on multiple internal platforms, and I'm pretty sure that it 
occurs on any b.L out there today. Also, we found the same problematic 
lock design used in the workqueue code in the kernel, causing the same 
livelock. It's very very rare and requires a perfect set of 
circumstances.


If it would help I can provide a unit test if you folks would be 
generous enough to test it on the latest Juno or something b.L that's 
also upstream.


Thanks,
Vikram


Re: spin_lock behavior with ARM64 big.Little/HMP

2017-03-29 Thread Vikram Mulukutla


Hi Sudeep,



Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata
819472 enabled ?


Sorry for bringing this up after the loo-ong delay, but I've been 
assured that the A53 involved is > r0p1. I've also confirmed this 
problem on multiple internal platforms, and I'm pretty sure that it 
occurs on any b.L out there today. Also, we found the same problematic 
lock design used in the workqueue code in the kernel, causing the same 
livelock. It's very very rare and requires a perfect set of 
circumstances.


If it would help I can provide a unit test if you folks would be 
generous enough to test it on the latest Juno or something b.L that's 
also upstream.


Thanks,
Vikram


Re: [RFC PATCH 0/6] Unify the Interrupt Mode and setup it as soon as possible

2017-03-29 Thread Baoquan He
On 03/30/17 at 11:09am, Dou Liyang wrote:
> 
> 
> At 03/30/2017 11:03 AM, Dou Liyang wrote:
> > Hi Baoquan,
> > 
> > At 03/30/2017 10:08 AM, Baoquan He wrote:
> > > Hi Liyang,
> > > 
> > > This is awesome. I planned to do this after kaslr back porting, glad to
> > > see your posting. I like below diagram and the idea of patch 2/6
> > > framework. Will review and see what I can do to help since rhel bug from
> > > FJ is assigned to me.
> > > 
> > 
> > Thanks very much for your join! We have investigated the bug almost
> > half a year. :)
> > 
> > In my opinion,
> > If we plan to refactor the process of APIC initialization for the bug.
> > There must be lots of work need to be done. This patchset is just the
> > first step. When I test it, I am thinking about:
> > 
> > 1. The check and logic in each enable and setup LAPIC/IOAPIC functions.
> > 2. The process of IRQ remapping.
> > 3. The check and init of APIC timer.
> > 4. The relationship between the various switches, such as If
> > the smp_found_config is 1, the acpi_lapic must be 1.
> > 
> > And following work to me are:
> > 
> > 1. Use more test cases to test.
> > 2. learn the IOMMU.
> > 3. trace the APIC timer code.
> > 4. make the check logic more clear.
> > 
> > Hope to be helpful to you.

Thanks for telling, I will also check.

> > > 
> > > And add Joerg to this thread since he knows IOMMU very well.
> > 
> 
> ahh,
> 
> --cc j...@8bytes.org, not j...@8types.org

Yes, indeed. Thanks.

> 
> > oops, Yes, I forgot it, Thanks!
> > 
> > Thanks
> > Liyang
> > 
> > > 
> > > Thanks
> > > Baoquan
> > > 
> > > On 03/29/17 at 10:55pm, Dou Liyang wrote:
> > > > According to Ingo's and Eric's advice[1,2], Try my best to optimize the
> > > > init of Interrupt Mode for x86.
> > > > 
> > > > The MP specification defines three different interrupt modes as follows:
> > > > 
> > > >  1. PIC Mode
> > > >  2. Virtual Wire Mode
> > > >  3. Symmetic I/O Mode
> > > > 
> > > > Currently, In kernel,
> > > > 
> > > > 1. Setup the Virtual Wire Mode during the IRQ initialization(
> > > > step 1 in the following figure).
> > > > 2. Enable and Setup the Symmetic I/O Mode either during the
> > > > SMP-capabe system prepares CPUs(step 2) or during the UP system
> > > > initializes itself(step 3).
> > > > 
> > > >   start_kernel
> > > > +---+
> > > > |
> > > > +--> ...
> > > > |
> > > > |setup_arch
> > > > +--> +---+
> > > > |
> > > > |init_IRQ
> > > > +-> +--+-+
> > > > |  |init_ISA_irqs
> > > > |  +--> +-++
> > > > | | ++
> > > > +---> +-->  | 1.init_bsp_APIC|
> > > > | ...   ++
> > > > +--->
> > > > | rest_init
> > > > +--->---+-+
> > > > |   |   kernel_init
> > > > |   +> +-+
> > > > |  |   kernel_init_freeable
> > > > |  +->  +-+
> > > > |   | smp_prepare_cpus
> > > > |   +---> ++-+
> > > > |   |  |   +---+
> > > > |   |  +-> |2.  apic_bsp_setup |
> > > > |   |  +---+
> > > > |   |
> > > > v   | smp_init
> > > > +---> +---++
> > > >   |+---+
> > > >   +--> |3.  apic_bsp_setup |
> > > >+---+
> > > > 
> > > > The purpose of this patchset is Unifing these setup steps and
> > > > executing as
> > > > soon as possible as follows:
> > > > 
> > > >start_kernel
> > > > ---+
> > > > |
> > > > |
> > > > |
> > > > | init_IRQ
> > > > +>---++
> > > > ||
> > > > ||  ++
> > > > |+> | 4. init_bsp_APIC   |
> > > > |   ++
> > > > v
> > > > 
> > > > By the way, Also fix a bug about kexec[3].
> > > > 
> > > > 
> > > > Some doubts, need help:
> > > > 
> > > > 1. Patchset has influence on IOMMU in enable_IR_x2apic(). Not sure
> > > > it can be in advance?
> > > > 
> > > > 2. Due to
> > > > 
> > > > Commit 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on
> > > > kexec")
> > > > 
> > > >  ..., patchset also needs TSC and uses the "cpu_khz" in
> > > > setup_local_APIC().
> > > > And a warning[4] will be triggered when crashed/on kexec. Not sure
> > > > how to
> > > > modify?
> > > > 
> > > > [1]. https://lkml.org/lkml/2016/8/2/929
> > > > [2]. https://lkml.org/lkml/2016/8/1/506
> > > > [3]. https://lkml.org/lkml/2016/7/25/1118
> > > > [4]. WARN_ON(max_loops <= 0) in setup_local_APIC()
> > > > 
> > > > Dou Liyang (6):
> > > >   x86/apic: Replace init_bsp_APIC() with apic_virture_wire_mode_setup()
> > > >   x86/apic: Construct a framework for setuping APIC mode as soon as
> > > > 

Re: [RFC PATCH 0/6] Unify the Interrupt Mode and setup it as soon as possible

2017-03-29 Thread Baoquan He
On 03/30/17 at 11:09am, Dou Liyang wrote:
> 
> 
> At 03/30/2017 11:03 AM, Dou Liyang wrote:
> > Hi Baoquan,
> > 
> > At 03/30/2017 10:08 AM, Baoquan He wrote:
> > > Hi Liyang,
> > > 
> > > This is awesome. I planned to do this after kaslr back porting, glad to
> > > see your posting. I like below diagram and the idea of patch 2/6
> > > framework. Will review and see what I can do to help since rhel bug from
> > > FJ is assigned to me.
> > > 
> > 
> > Thanks very much for your join! We have investigated the bug almost
> > half a year. :)
> > 
> > In my opinion,
> > If we plan to refactor the process of APIC initialization for the bug.
> > There must be lots of work need to be done. This patchset is just the
> > first step. When I test it, I am thinking about:
> > 
> > 1. The check and logic in each enable and setup LAPIC/IOAPIC functions.
> > 2. The process of IRQ remapping.
> > 3. The check and init of APIC timer.
> > 4. The relationship between the various switches, such as If
> > the smp_found_config is 1, the acpi_lapic must be 1.
> > 
> > And following work to me are:
> > 
> > 1. Use more test cases to test.
> > 2. learn the IOMMU.
> > 3. trace the APIC timer code.
> > 4. make the check logic more clear.
> > 
> > Hope to be helpful to you.

Thanks for telling, I will also check.

> > > 
> > > And add Joerg to this thread since he knows IOMMU very well.
> > 
> 
> ahh,
> 
> --cc j...@8bytes.org, not j...@8types.org

Yes, indeed. Thanks.

> 
> > oops, Yes, I forgot it, Thanks!
> > 
> > Thanks
> > Liyang
> > 
> > > 
> > > Thanks
> > > Baoquan
> > > 
> > > On 03/29/17 at 10:55pm, Dou Liyang wrote:
> > > > According to Ingo's and Eric's advice[1,2], Try my best to optimize the
> > > > init of Interrupt Mode for x86.
> > > > 
> > > > The MP specification defines three different interrupt modes as follows:
> > > > 
> > > >  1. PIC Mode
> > > >  2. Virtual Wire Mode
> > > >  3. Symmetic I/O Mode
> > > > 
> > > > Currently, In kernel,
> > > > 
> > > > 1. Setup the Virtual Wire Mode during the IRQ initialization(
> > > > step 1 in the following figure).
> > > > 2. Enable and Setup the Symmetic I/O Mode either during the
> > > > SMP-capabe system prepares CPUs(step 2) or during the UP system
> > > > initializes itself(step 3).
> > > > 
> > > >   start_kernel
> > > > +---+
> > > > |
> > > > +--> ...
> > > > |
> > > > |setup_arch
> > > > +--> +---+
> > > > |
> > > > |init_IRQ
> > > > +-> +--+-+
> > > > |  |init_ISA_irqs
> > > > |  +--> +-++
> > > > | | ++
> > > > +---> +-->  | 1.init_bsp_APIC|
> > > > | ...   ++
> > > > +--->
> > > > | rest_init
> > > > +--->---+-+
> > > > |   |   kernel_init
> > > > |   +> +-+
> > > > |  |   kernel_init_freeable
> > > > |  +->  +-+
> > > > |   | smp_prepare_cpus
> > > > |   +---> ++-+
> > > > |   |  |   +---+
> > > > |   |  +-> |2.  apic_bsp_setup |
> > > > |   |  +---+
> > > > |   |
> > > > v   | smp_init
> > > > +---> +---++
> > > >   |+---+
> > > >   +--> |3.  apic_bsp_setup |
> > > >+---+
> > > > 
> > > > The purpose of this patchset is Unifing these setup steps and
> > > > executing as
> > > > soon as possible as follows:
> > > > 
> > > >start_kernel
> > > > ---+
> > > > |
> > > > |
> > > > |
> > > > | init_IRQ
> > > > +>---++
> > > > ||
> > > > ||  ++
> > > > |+> | 4. init_bsp_APIC   |
> > > > |   ++
> > > > v
> > > > 
> > > > By the way, Also fix a bug about kexec[3].
> > > > 
> > > > 
> > > > Some doubts, need help:
> > > > 
> > > > 1. Patchset has influence on IOMMU in enable_IR_x2apic(). Not sure
> > > > it can be in advance?
> > > > 
> > > > 2. Due to
> > > > 
> > > > Commit 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on
> > > > kexec")
> > > > 
> > > >  ..., patchset also needs TSC and uses the "cpu_khz" in
> > > > setup_local_APIC().
> > > > And a warning[4] will be triggered when crashed/on kexec. Not sure
> > > > how to
> > > > modify?
> > > > 
> > > > [1]. https://lkml.org/lkml/2016/8/2/929
> > > > [2]. https://lkml.org/lkml/2016/8/1/506
> > > > [3]. https://lkml.org/lkml/2016/7/25/1118
> > > > [4]. WARN_ON(max_loops <= 0) in setup_local_APIC()
> > > > 
> > > > Dou Liyang (6):
> > > >   x86/apic: Replace init_bsp_APIC() with apic_virture_wire_mode_setup()
> > > >   x86/apic: Construct a framework for setuping APIC mode as soon as
> > > > 

Re: [PATCH v9 10/15] ACPI: platform-msi: retrieve dev id from IORT

2017-03-29 Thread majun (Euler7)
Hi all:

在 2017/3/30 11:07, Hanjun Guo 写道:
> On 03/30/2017 01:32 AM, Lorenzo Pieralisi wrote:
>> On Wed, Mar 29, 2017 at 05:13:54PM +0100, Lorenzo Pieralisi wrote:
>>> On Wed, Mar 29, 2017 at 03:52:47PM +0100, Marc Zyngier wrote:
 On 29/03/17 14:00, Hanjun Guo wrote:
> On 03/29/2017 08:38 PM, Lorenzo Pieralisi wrote:
>> On Wed, Mar 29, 2017 at 07:52:48PM +0800, Hanjun Guo wrote:
>>> Hi Lorenzo,
>>>
>>> On 03/29/2017 06:14 PM, Lorenzo Pieralisi wrote:
 Hi Hanjun, Marc,

 On Tue, Mar 07, 2017 at 08:40:05PM +0800, Hanjun Guo wrote:
> [...]
>drivers/acpi/arm64/iort.c | 24 
> 
>drivers/irqchip/irq-gic-v3-its-platform-msi.c |  3 ++-
[...]

 Thoughts?
>>>
>>> Perfect for me. Hanjun, I can cherry pick Marc's patch above, rework
>>> this patch and post the resulting branch for everyone to have a final
>>> test.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux.git 
>> acpi/arm64-acpi-4.12
>>
>> Please have a look and let me know if that's ok, I planned to send
>> a PR to Catalin by the end of the week (first 7 patches up to
>> 7fc3061df075 ("ACPI: platform: setup MSI domain for ACPI based platform
>> device")).
> 
> Perfect for me too, Lorenzo, Marc, Thank you very much.
> 
> I'm currently in paternity leave and can't reach the machine,
> I had a detail review with the patches, they looks good to me,
> Ma Jun and Wei Xu will test on Hisilicon machines and give the
> feedback.

The sas/xge/uart are working fine on my hisilicon board with
Lorenzo's branch (arm64-acpi-4.12)

Thanks
Majun

> 
> Thanks
> Hanjun
> 
> .
> 



Re: [PATCH v9 10/15] ACPI: platform-msi: retrieve dev id from IORT

2017-03-29 Thread majun (Euler7)
Hi all:

在 2017/3/30 11:07, Hanjun Guo 写道:
> On 03/30/2017 01:32 AM, Lorenzo Pieralisi wrote:
>> On Wed, Mar 29, 2017 at 05:13:54PM +0100, Lorenzo Pieralisi wrote:
>>> On Wed, Mar 29, 2017 at 03:52:47PM +0100, Marc Zyngier wrote:
 On 29/03/17 14:00, Hanjun Guo wrote:
> On 03/29/2017 08:38 PM, Lorenzo Pieralisi wrote:
>> On Wed, Mar 29, 2017 at 07:52:48PM +0800, Hanjun Guo wrote:
>>> Hi Lorenzo,
>>>
>>> On 03/29/2017 06:14 PM, Lorenzo Pieralisi wrote:
 Hi Hanjun, Marc,

 On Tue, Mar 07, 2017 at 08:40:05PM +0800, Hanjun Guo wrote:
> [...]
>drivers/acpi/arm64/iort.c | 24 
> 
>drivers/irqchip/irq-gic-v3-its-platform-msi.c |  3 ++-
[...]

 Thoughts?
>>>
>>> Perfect for me. Hanjun, I can cherry pick Marc's patch above, rework
>>> this patch and post the resulting branch for everyone to have a final
>>> test.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux.git 
>> acpi/arm64-acpi-4.12
>>
>> Please have a look and let me know if that's ok, I planned to send
>> a PR to Catalin by the end of the week (first 7 patches up to
>> 7fc3061df075 ("ACPI: platform: setup MSI domain for ACPI based platform
>> device")).
> 
> Perfect for me too, Lorenzo, Marc, Thank you very much.
> 
> I'm currently in paternity leave and can't reach the machine,
> I had a detail review with the patches, they looks good to me,
> Ma Jun and Wei Xu will test on Hisilicon machines and give the
> feedback.

The sas/xge/uart are working fine on my hisilicon board with
Lorenzo's branch (arm64-acpi-4.12)

Thanks
Majun

> 
> Thanks
> Hanjun
> 
> .
> 



Re: [PATCH 4.8 50/96] firmware: fix usermode helper fallback loading

2017-03-29 Thread Luis R. Rodriguez
On Fri, Mar 24, 2017 at 04:01:58PM -0400, Ben Gamari wrote:
> Greg Kroah-Hartman  writes:
> 
> > On Fri, Jan 06, 2017 at 10:54:38PM +0100, Yves-Alexis Perez wrote:
> >> On Fri, 2017-01-06 at 22:43 +0100, Greg Kroah-Hartman wrote:
> >> > 4.8-stable review patch.  If anyone has any objections, please let me 
> >> > know.
> >> 
> >> Hi Greg,
> >> 
> >> Ben Gamari think there was a regression in that patch so I'm adding him to
> >> recipients so he can voice concerns if needed.
> >
> > Given the lack of response, I'm going to assume all is fine :)
> >
> Oh dear, sorry for the late response; this was stuck in the pergatory of
> my inbox.
> 
> It's been a while since I've looked at this, but I believe the alleged
> regression in this pastch is the reason I have the attached patch in my
> tree. I seem to recall that it was the ath10k driver which triggered the
> issue.
> 
> Unfortunately I can't recall which driver was affected by this. I'll
> have to see what happens when I revert the attached patch.
> 
> Cheers,
> 
> - Ben
> 
> 
> 
> Author: Ben Gamari 
> Date:   Mon Jan 2 00:38:05 2017 -0500
> 
> firmware_class: Ensure buf is non-NULL in __fw_load_abort
> 
> I have observed that this can be NULL.
> 
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index ac350c518e0c..fd0be24911fc 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -546,7 +546,8 @@ static void __fw_load_abort(struct firmware_buf *buf)
>  * There is a small window in which user can write to 'loading'
>  * between loading done and disappearance of 'loading'
>  */
> -   if (fw_state_is_done(>fw_st))
> +
> +   if (!buf || fw_state_is_done(>fw_st))
> return;
>  
> list_del_init(>pending_list);



We discussed this a while back and went with a more elegant fix, see:
commit 191e885a2e130e639bb0c8ee350d7047294f2ce6

  Luis


Re: [PATCH 4.8 50/96] firmware: fix usermode helper fallback loading

2017-03-29 Thread Luis R. Rodriguez
On Fri, Mar 24, 2017 at 04:01:58PM -0400, Ben Gamari wrote:
> Greg Kroah-Hartman  writes:
> 
> > On Fri, Jan 06, 2017 at 10:54:38PM +0100, Yves-Alexis Perez wrote:
> >> On Fri, 2017-01-06 at 22:43 +0100, Greg Kroah-Hartman wrote:
> >> > 4.8-stable review patch.  If anyone has any objections, please let me 
> >> > know.
> >> 
> >> Hi Greg,
> >> 
> >> Ben Gamari think there was a regression in that patch so I'm adding him to
> >> recipients so he can voice concerns if needed.
> >
> > Given the lack of response, I'm going to assume all is fine :)
> >
> Oh dear, sorry for the late response; this was stuck in the pergatory of
> my inbox.
> 
> It's been a while since I've looked at this, but I believe the alleged
> regression in this pastch is the reason I have the attached patch in my
> tree. I seem to recall that it was the ath10k driver which triggered the
> issue.
> 
> Unfortunately I can't recall which driver was affected by this. I'll
> have to see what happens when I revert the attached patch.
> 
> Cheers,
> 
> - Ben
> 
> 
> 
> Author: Ben Gamari 
> Date:   Mon Jan 2 00:38:05 2017 -0500
> 
> firmware_class: Ensure buf is non-NULL in __fw_load_abort
> 
> I have observed that this can be NULL.
> 
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index ac350c518e0c..fd0be24911fc 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -546,7 +546,8 @@ static void __fw_load_abort(struct firmware_buf *buf)
>  * There is a small window in which user can write to 'loading'
>  * between loading done and disappearance of 'loading'
>  */
> -   if (fw_state_is_done(>fw_st))
> +
> +   if (!buf || fw_state_is_done(>fw_st))
> return;
>  
> list_del_init(>pending_list);



We discussed this a while back and went with a more elegant fix, see:
commit 191e885a2e130e639bb0c8ee350d7047294f2ce6

  Luis


Re: [PATCH 1/8] platform/x86: fujitsu-laptop: move backlight input device setup to a separate function

2017-03-29 Thread Jonathan Woithe
On Wed, Mar 29, 2017 at 12:54:15PM -0700, Darren Hart wrote:
> On Mon, Mar 20, 2017 at 10:32:17AM +0100, Micha?? K??pie?? wrote:
> > +
> > +   return error;
> 
> This return path could be cleaned up a bit:
> 
>   error = input_register_device(input);
>   if (error)
>   input_free_device(input);
> 
>   return error;
> 
> But, this driver uses this "error/return 0" pattern pretty consistently, 
> whereas
> most of the kernel uses ret instead of error, and will return ret on success 
> and
> failure, relying on it being 0 in the successful case. Over the whole driver,
> we'd save several lines with the conversion and be more consistent with the 
> rest
> of the kernel. But, local consistency is important too. Jonathan, do you have 
> a
> preference for this driver?

I have no strong preferences, except to say that clarity is important.  As I
eluded to a few minutes ago, I agree that there's scope to address error
handling and there is a case to be made for bringing it into line with the
rest of the kernel.  I think this can be addressed in a separate patch
series though.  The present series under consideration doesn't make the
situation any worse (it actually improves it in some places) and introduces
worthwhile changes.  As such I don't see that we gain anything by delaying
it in order to address what is, at the end of the day, a separate concern.

Regards
  jonathan


Re: [PATCH 1/8] platform/x86: fujitsu-laptop: move backlight input device setup to a separate function

2017-03-29 Thread Jonathan Woithe
On Wed, Mar 29, 2017 at 12:54:15PM -0700, Darren Hart wrote:
> On Mon, Mar 20, 2017 at 10:32:17AM +0100, Micha?? K??pie?? wrote:
> > +
> > +   return error;
> 
> This return path could be cleaned up a bit:
> 
>   error = input_register_device(input);
>   if (error)
>   input_free_device(input);
> 
>   return error;
> 
> But, this driver uses this "error/return 0" pattern pretty consistently, 
> whereas
> most of the kernel uses ret instead of error, and will return ret on success 
> and
> failure, relying on it being 0 in the successful case. Over the whole driver,
> we'd save several lines with the conversion and be more consistent with the 
> rest
> of the kernel. But, local consistency is important too. Jonathan, do you have 
> a
> preference for this driver?

I have no strong preferences, except to say that clarity is important.  As I
eluded to a few minutes ago, I agree that there's scope to address error
handling and there is a case to be made for bringing it into line with the
rest of the kernel.  I think this can be addressed in a separate patch
series though.  The present series under consideration doesn't make the
situation any worse (it actually improves it in some places) and introduces
worthwhile changes.  As such I don't see that we gain anything by delaying
it in order to address what is, at the end of the day, a separate concern.

Regards
  jonathan


[PATCH] powerpc/prom: Increase RMA size to 512MB

2017-03-29 Thread Sukadev Bhattiprolu
>From 3ae8d1ed31b01b92b172fe20e4560cfbfab135ec Mon Sep 17 00:00:00 2001
From: root 
Date: Mon, 27 Mar 2017 19:43:14 -0400
Subject: [PATCH] powerpc/prom: Increase RMA size to 512MB

When booting very large systems with a large initrd, we run out of
space for either the RTAS or the flattened device tree (FDT). Boot
fails with messages like:

Could not allocate memory for RTAS
or
No memory for flatten_device_tree (no room)

Increasing the minimum RMA size to 512MB fixes the problem. This
should not have an impact on smaller LPARs (with 256MB memory),
as the firmware will cap the RMA to the memory assigned to the LPAR.

Fix is based on input/discussions with Michael Ellerman. Thanks to
Praveen K. Pandey for testing on a large system.

Reported-by: Praveen K. Pandey 
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/kernel/prom_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 1c1b44e..dd8a04f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -815,7 +815,7 @@ struct ibm_arch_vec __cacheline_aligned 
ibm_architecture_vec = {
.virt_base = cpu_to_be32(0x),
.virt_size = cpu_to_be32(0x),
.load_base = cpu_to_be32(0x),
-   .min_rma = cpu_to_be32(256),/* 256MB min RMA */
+   .min_rma = cpu_to_be32(512),/* 512MB min RMA */
.min_load = cpu_to_be32(0x),/* full client load */
.min_rma_percent = 0,   /* min RMA percentage of total RAM */
.max_pft_size = 48, /* max log_2(hash table size) */
-- 
1.8.3.1



[PATCH] powerpc/prom: Increase RMA size to 512MB

2017-03-29 Thread Sukadev Bhattiprolu
>From 3ae8d1ed31b01b92b172fe20e4560cfbfab135ec Mon Sep 17 00:00:00 2001
From: root 
Date: Mon, 27 Mar 2017 19:43:14 -0400
Subject: [PATCH] powerpc/prom: Increase RMA size to 512MB

When booting very large systems with a large initrd, we run out of
space for either the RTAS or the flattened device tree (FDT). Boot
fails with messages like:

Could not allocate memory for RTAS
or
No memory for flatten_device_tree (no room)

Increasing the minimum RMA size to 512MB fixes the problem. This
should not have an impact on smaller LPARs (with 256MB memory),
as the firmware will cap the RMA to the memory assigned to the LPAR.

Fix is based on input/discussions with Michael Ellerman. Thanks to
Praveen K. Pandey for testing on a large system.

Reported-by: Praveen K. Pandey 
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/kernel/prom_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 1c1b44e..dd8a04f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -815,7 +815,7 @@ struct ibm_arch_vec __cacheline_aligned 
ibm_architecture_vec = {
.virt_base = cpu_to_be32(0x),
.virt_size = cpu_to_be32(0x),
.load_base = cpu_to_be32(0x),
-   .min_rma = cpu_to_be32(256),/* 256MB min RMA */
+   .min_rma = cpu_to_be32(512),/* 512MB min RMA */
.min_load = cpu_to_be32(0x),/* full client load */
.min_rma_percent = 0,   /* min RMA percentage of total RAM */
.max_pft_size = 48, /* max log_2(hash table size) */
-- 
1.8.3.1



Re: [PATCH] virtio_net: enable big packets for large MTU values

2017-03-29 Thread Jason Wang



On 2017年03月29日 20:38, Michael S. Tsirkin wrote:

If one enables e.g. jumbo frames without mergeable
buffers, packets won't fit in 1500 byte buffers
we use. Switch to big packet mode instead.
TODO: make sizing more exact, possibly extend small
packet mode to use larger pages.

Signed-off-by: Michael S. Tsirkin 
---
  drivers/net/virtio_net.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e0fb3707..9dc31dc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2428,6 +2428,10 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->mtu = mtu;
dev->max_mtu = mtu;
}
+
+   /* TODO: size buffers correctly in this case. */
+   if (dev->mtu > ETH_DATA_LEN)
+   vi->big_packets = true;
}
  
  	if (vi->any_header_sg)


Ok, I think maybe we should fail the probe of small buffer in this case 
and decrease the max_mtu to ETH_DATA_LEN if small buffer is used (since 
user are allowed to increase the mtu).


Thanks


Re: [PATCH] virtio_net: enable big packets for large MTU values

2017-03-29 Thread Jason Wang



On 2017年03月29日 20:38, Michael S. Tsirkin wrote:

If one enables e.g. jumbo frames without mergeable
buffers, packets won't fit in 1500 byte buffers
we use. Switch to big packet mode instead.
TODO: make sizing more exact, possibly extend small
packet mode to use larger pages.

Signed-off-by: Michael S. Tsirkin 
---
  drivers/net/virtio_net.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e0fb3707..9dc31dc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2428,6 +2428,10 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->mtu = mtu;
dev->max_mtu = mtu;
}
+
+   /* TODO: size buffers correctly in this case. */
+   if (dev->mtu > ETH_DATA_LEN)
+   vi->big_packets = true;
}
  
  	if (vi->any_header_sg)


Ok, I think maybe we should fail the probe of small buffer in this case 
and decrease the max_mtu to ETH_DATA_LEN if small buffer is used (since 
user are allowed to increase the mtu).


Thanks


Re: lockdep warning: console vs. mem hotplug

2017-03-29 Thread Sergey Senozhatsky
On (03/28/17 16:22), Michal Hocko wrote:
[..]
> > Sebastian, does this change make lockdep happy?
> > 
> > it removes console drivers from the __offline_isolated_pages(). not the
> > best solution I can think of, but the simplest one.
> > 
> > ---
> > 
> >  mm/page_alloc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f749b7ff7c50..eb61e6ab5f4f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, 
> > unsigned long end_pfn)
> > BUG_ON(!PageBuddy(page));
> > order = page_order(page);
> >  #ifdef CONFIG_DEBUG_VM
> > -   pr_info("remove from free list %lx %d %lx\n",
> > +   printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> > pfn, 1 << order, end_pfn);
> >  #endif
> > list_del(>lru);
> 
> I believe this is not a proper fix.

oh, absolutely. I hate it. didn't really propose it as a fix. mostly
did it just to verify that there are no other lock inversions behind
the one that has been reported (lockdep turns off itself when it
detects the first lock dependency inversion).

> Although this code is ugly and maybe it doesn't really need zone->lock
> because that should be the page allocator internal thing the problem is
> that printk shouldn't impose such a subtle dependency on locks. Why does
> the timer needs to allocate at all?

I believe Petr has answered your questions. sorry for the delay.

-ss


Re: lockdep warning: console vs. mem hotplug

2017-03-29 Thread Sergey Senozhatsky
On (03/28/17 16:22), Michal Hocko wrote:
[..]
> > Sebastian, does this change make lockdep happy?
> > 
> > it removes console drivers from the __offline_isolated_pages(). not the
> > best solution I can think of, but the simplest one.
> > 
> > ---
> > 
> >  mm/page_alloc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f749b7ff7c50..eb61e6ab5f4f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, 
> > unsigned long end_pfn)
> > BUG_ON(!PageBuddy(page));
> > order = page_order(page);
> >  #ifdef CONFIG_DEBUG_VM
> > -   pr_info("remove from free list %lx %d %lx\n",
> > +   printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> > pfn, 1 << order, end_pfn);
> >  #endif
> > list_del(>lru);
> 
> I believe this is not a proper fix.

oh, absolutely. I hate it. didn't really propose it as a fix. mostly
did it just to verify that there are no other lock inversions behind
the one that has been reported (lockdep turns off itself when it
detects the first lock dependency inversion).

> Although this code is ugly and maybe it doesn't really need zone->lock
> because that should be the page allocator internal thing the problem is
> that printk shouldn't impose such a subtle dependency on locks. Why does
> the timer needs to allocate at all?

I believe Petr has answered your questions. sorry for the delay.

-ss


Re: [PATCH] virtio_net: fix mergeable bufs error handling

2017-03-29 Thread Jason Wang



On 2017年03月29日 20:37, Michael S. Tsirkin wrote:

On xdp error we try to free head_skb without having
initialized it, that's clearly bogus.

Fixes: f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend 
Signed-off-by: Michael S. Tsirkin 
---
  drivers/net/virtio_net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11773d6..e0fb3707 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -570,7 +570,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
u16 num_buf;
struct page *page;
int offset;
-   struct sk_buff *head_skb, *curr_skb;
+   struct sk_buff *head_skb = NULL, *curr_skb;
struct bpf_prog *xdp_prog;
unsigned int truesize;
  


My tree (net.git HEAD is 8f1f7eeb22c16a197159cf7b35d1350695193ead) has:

head_skb = NULL;

just after the above codes.

Thanks


Re: [PATCH] virtio_net: fix mergeable bufs error handling

2017-03-29 Thread Jason Wang



On 2017年03月29日 20:37, Michael S. Tsirkin wrote:

On xdp error we try to free head_skb without having
initialized it, that's clearly bogus.

Fixes: f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend 
Signed-off-by: Michael S. Tsirkin 
---
  drivers/net/virtio_net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11773d6..e0fb3707 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -570,7 +570,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
u16 num_buf;
struct page *page;
int offset;
-   struct sk_buff *head_skb, *curr_skb;
+   struct sk_buff *head_skb = NULL, *curr_skb;
struct bpf_prog *xdp_prog;
unsigned int truesize;
  


My tree (net.git HEAD is 8f1f7eeb22c16a197159cf7b35d1350695193ead) has:

head_skb = NULL;

just after the above codes.

Thanks


Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling

2017-03-29 Thread Jonathan Woithe
On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote:
> On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote:
> > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie??  
> > wrote:
> > 
> > > Darren, Andy, in light of the above I will be awaiting your review of
> > > this series.  I will submit v2 afterwards, with all remarks from both
> > > you and Jonathan taken into account.
> > 
> > Darren marked this series under his name to review, so, I let him to
> > speak for us.
> 
> The series looks good to me. Nice work Micha??. They are logically divided and
> address issues in a procedural way (so I stopped commenting until I read the
> full series through as a couple of times you addressed a concern from a move 
> in
> a cleanup to follow).
> 
> I've applied the noted change to 7/8 and will run this through my tests, but
> don't anticipate any problems. Jonathan, if you don't have any additional
> concerns, let me know if I can add your Reviewed-by.

With the noted change to 7/8 applied I'm happy with the resulting series. 
As you noted, there is still some scope for making things more consistent,
especially with regard to error handling.  However, that is really a
separate task which can be addressed in a later series.  This present series
doesn't impact on this issue in any significant way so it makes sense that
be applied.

Reviewed-by: Jonathan Woithe 

Regards
  jonathan


Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling

2017-03-29 Thread Jonathan Woithe
On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote:
> On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote:
> > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie??  
> > wrote:
> > 
> > > Darren, Andy, in light of the above I will be awaiting your review of
> > > this series.  I will submit v2 afterwards, with all remarks from both
> > > you and Jonathan taken into account.
> > 
> > Darren marked this series under his name to review, so, I let him to
> > speak for us.
> 
> The series looks good to me. Nice work Micha??. They are logically divided and
> address issues in a procedural way (so I stopped commenting until I read the
> full series through as a couple of times you addressed a concern from a move 
> in
> a cleanup to follow).
> 
> I've applied the noted change to 7/8 and will run this through my tests, but
> don't anticipate any problems. Jonathan, if you don't have any additional
> concerns, let me know if I can add your Reviewed-by.

With the noted change to 7/8 applied I'm happy with the resulting series. 
As you noted, there is still some scope for making things more consistent,
especially with regard to error handling.  However, that is really a
separate task which can be addressed in a later series.  This present series
doesn't impact on this issue in any significant way so it makes sense that
be applied.

Reviewed-by: Jonathan Woithe 

Regards
  jonathan


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote:
>  > >  > > > > > > > >  >max_nr_ports) == 0) {
> > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
> >  
> >  static unsigned int features[] = {
> >  > >> > VIRTIO_CONSOLE_F_SIZE,
> > +#ifndef CONFIG_IRQ_FORCED_THREADING
> >  > >> > VIRTIO_CONSOLE_F_MULTIPORT,
> > +#endif
> >  };
> 
> These look kind of questionable.
> Is this part needed?

I would have sworn it was, but double checking, nope, it's not.

Hm, so I could make a prettier bandaid with a runtime check.. but it'd
remain a bandaid, so I'll go do some beans 'n' biscuits work instead.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote:
>  > >  > > > > > > > >  >max_nr_ports) == 0) {
> > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
> >  
> >  static unsigned int features[] = {
> >  > >> > VIRTIO_CONSOLE_F_SIZE,
> > +#ifndef CONFIG_IRQ_FORCED_THREADING
> >  > >> > VIRTIO_CONSOLE_F_MULTIPORT,
> > +#endif
> >  };
> 
> These look kind of questionable.
> Is this part needed?

I would have sworn it was, but double checking, nope, it's not.

Hm, so I could make a prettier bandaid with a runtime check.. but it'd
remain a bandaid, so I'll go do some beans 'n' biscuits work instead.

-Mike


Re: [PATCH net-next 8/8] vhost_net: use lockless peeking for skb array during busy polling

2017-03-29 Thread Jason Wang



On 2017年03月30日 10:33, Michael S. Tsirkin wrote:

On Thu, Mar 30, 2017 at 10:16:15AM +0800, Jason Wang wrote:


On 2017年03月29日 20:07, Michael S. Tsirkin wrote:

On Tue, Mar 21, 2017 at 12:04:47PM +0800, Jason Wang wrote:

For the socket that exports its skb array, we can use lockless polling
to avoid touching spinlock during busy polling.

Signed-off-by: Jason Wang 
---
   drivers/vhost/net.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 53f09f2..41153a3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -551,10 +551,13 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, 
struct sock *sk)
return len;
   }
-static int sk_has_rx_data(struct sock *sk)
+static int sk_has_rx_data(struct vhost_net_virtqueue *rvq, struct sock *sk)
   {
struct socket *sock = sk->sk_socket;
+   if (rvq->rx_array)
+   return !__skb_array_empty(rvq->rx_array);
+
if (sock->ops->peek_len)
return sock->ops->peek_len(sock);

I don't see which patch adds __skb_array_empty.

This is not something new, it was introduced by ad69f35d1dc0a ("skb_array:
array based FIFO for skbs").

Thanks

Same comment about a compiler barrier applies then.


Ok, rethink about this, since skb_array could be resized, using lockless 
version seems wrong.


For the comment of using compiler barrier, caller 
(vhost_net_rx_peek_head_len) uses cpu_relax(). But I haven't figured out 
why a compiler barrier is needed here. Could you please explain?


Thanks




@@ -579,7 +582,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net,
endtime = busy_clock() + vq->busyloop_timeout;
while (vhost_can_busy_poll(>dev, endtime) &&
-  !sk_has_rx_data(sk) &&
+  !sk_has_rx_data(rvq, sk) &&
   vhost_vq_avail_empty(>dev, vq))
cpu_relax();
--
2.7.4




Re: [PATCH net-next 8/8] vhost_net: use lockless peeking for skb array during busy polling

2017-03-29 Thread Jason Wang



On 2017年03月30日 10:33, Michael S. Tsirkin wrote:

On Thu, Mar 30, 2017 at 10:16:15AM +0800, Jason Wang wrote:


On 2017年03月29日 20:07, Michael S. Tsirkin wrote:

On Tue, Mar 21, 2017 at 12:04:47PM +0800, Jason Wang wrote:

For the socket that exports its skb array, we can use lockless polling
to avoid touching spinlock during busy polling.

Signed-off-by: Jason Wang 
---
   drivers/vhost/net.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 53f09f2..41153a3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -551,10 +551,13 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, 
struct sock *sk)
return len;
   }
-static int sk_has_rx_data(struct sock *sk)
+static int sk_has_rx_data(struct vhost_net_virtqueue *rvq, struct sock *sk)
   {
struct socket *sock = sk->sk_socket;
+   if (rvq->rx_array)
+   return !__skb_array_empty(rvq->rx_array);
+
if (sock->ops->peek_len)
return sock->ops->peek_len(sock);

I don't see which patch adds __skb_array_empty.

This is not something new, it was introduced by ad69f35d1dc0a ("skb_array:
array based FIFO for skbs").

Thanks

Same comment about a compiler barrier applies then.


Ok, rethink about this, since skb_array could be resized, using lockless 
version seems wrong.


For the comment of using compiler barrier, caller 
(vhost_net_rx_peek_head_len) uses cpu_relax(). But I haven't figured out 
why a compiler barrier is needed here. Could you please explain?


Thanks




@@ -579,7 +582,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net,
endtime = busy_clock() + vq->busyloop_timeout;
while (vhost_can_busy_poll(>dev, endtime) &&
-  !sk_has_rx_data(sk) &&
+  !sk_has_rx_data(rvq, sk) &&
   vhost_vq_avail_empty(>dev, vq))
cpu_relax();
--
2.7.4




linux-next: build failure after merge of the phy-next tree

2017-03-29 Thread Stephen Rothwell
Hi Kishon,

After merging the phy-next tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field 'phy_tuning' 
specified in initializer
   .phy_tuning = rk3328_usb2phy_tuning,
   ^
drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' 
undeclared here (not in a function)
   .phy_tuning = rk3328_usb2phy_tuning,
 ^

Caused by commit

  ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328")

I have used the phy-next tree from next-20170329 for today.

-- 
Cheers,
Stephen Rothwell


linux-next: build failure after merge of the phy-next tree

2017-03-29 Thread Stephen Rothwell
Hi Kishon,

After merging the phy-next tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field 'phy_tuning' 
specified in initializer
   .phy_tuning = rk3328_usb2phy_tuning,
   ^
drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' 
undeclared here (not in a function)
   .phy_tuning = rk3328_usb2phy_tuning,
 ^

Caused by commit

  ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328")

I have used the phy-next tree from next-20170329 for today.

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the tty tree with the tty.current tree

2017-03-29 Thread Michael Neuling
On Mon, 2017-03-20 at 10:26 +0100, Dmitry Vyukov wrote:
> On Mon, Mar 20, 2017 at 10:21 AM, Dmitry Vyukov  wrote:
> > On Mon, Mar 20, 2017 at 3:28 AM, Stephen Rothwell 
> > wrote:
> > > Hi Greg,
> > > 
> > > Today's linux-next merge of the tty tree got a conflict in:
> > > 
> > >   drivers/tty/tty_ldisc.c
> > > 
> > > between commit:
> > > 
> > >   5362544bebe8 ("tty: don't panic on OOM in tty_set_ldisc()")
> > > 
> > > from the tty.current tree and commit:
> > > 
> > >   71472fa9c52b ("tty: Fix ldisc crash on reopened tty")
> > > 
> > > from the tty tree.
> > > 
> > > I fixed it up (see below) and can carry the fix as necessary. This
> > > is now fixed as far as linux-next is concerned, but any non trivial
> > > conflicts should be mentioned to your upstream maintainer when your tree
> > > is submitted for merging.  You may also want to consider cooperating
> > > with the maintainer of the conflicting tree to minimise any particularly
> > > complex conflicts.
> > > 
> > > --
> > > Cheers,
> > > Stephen Rothwell
> > > 
> > > diff --cc drivers/tty/tty_ldisc.c
> > > index b0500a0a87b8,4ee7742dced3..
> > > --- a/drivers/tty/tty_ldisc.c
> > > +++ b/drivers/tty/tty_ldisc.c
> > > @@@ -621,14 -669,17 +621,15 @@@ int tty_ldisc_reinit(struct tty_struct
> > > tty_ldisc_put(tty->ldisc);
> > > }
> > > 
> > > -   /* switch the line discipline */
> > > -   tty->ldisc = ld;
> > > tty_set_termios_ldisc(tty, disc);
> > > -   retval = tty_ldisc_open(tty, tty->ldisc);
> > > +   retval = tty_ldisc_open(tty, ld);
> > > if (retval) {
> > > -   tty_ldisc_put(tty->ldisc);
> > > -   tty->ldisc = NULL;
> > >  -  if (!WARN_ON(disc == N_TTY)) {
> > >  -  tty_ldisc_put(ld);
> > >  -  ld = NULL;
> > >  -  }
> > > ++  tty_ldisc_put(ld);
> > > ++  ld = NULL;
> > > }
> > > +
> > > +   /* switch the line discipline */
> > > +   smp_store_release(>ldisc, ld);
> > > return retval;
> > >   }
> > > 
> > 
> > 
> > Peter,
> > 
> > Looking at your patch "tty: Fix ldisc crash on reopened tty", I think
> > there is a missed barrier in tty_ldisc_ref. A single barrier does not
> > have any effect, they always need to be in pairs. So I think we also
> > need at least:
> > 
> > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty)
> > struct tty_ldisc *ld = NULL;
> > 
> > if (ldsem_down_read_trylock(>ldisc_sem)) {
> > -   ld = tty->ldisc;
> > +   ld = READ_ONCE(tty->ldisc);
> > +   read_barrier_depends();
> > if (!ld)
> > ldsem_up_read(>ldisc_sem);
> > }
> > 
> > 
> > Or simply:
> > 
> > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty)
> > struct tty_ldisc *ld = NULL;
> > 
> > if (ldsem_down_read_trylock(>ldisc_sem)) {
> > -   ld = tty->ldisc;
> > +   /* pairs with smp_store_release in tty_ldisc_reinit */
> > +   ld = smp_load_acquire(>ldisc);
> > if (!ld)
> > ldsem_up_read(>ldisc_sem);
> > }
> 
> 
> 
> 
> I am also surprised that callers of tty_ldisc_reinit don't hold
> ldisc_sem. I thought that ldisc_sem is what's supposed to protect
> changes to ldisc. That would also auto fix the crash without any
> tricky barriers as flush_to_ldisc uses tty_ldisc_ref.

Dmitry,

Thanks for the help.  Peter doesn't seem to be responding to email any more.

I'm not familiar with the tty layer, but the issue that patch was suppose to fix
had a similar signature to the below oops we are seeing on powerpc on boot.
(sorry I don't have a repro on mainline or linux-next). Hence why I pushed on
it.

In the below crash the call to tty_ldisc_reinit is coming from a workqueue, so
requiring the callers to hold the ldisc_sem is more tricky.

Could we just hold the ldisc_sem inside tty_ldisc_reinit()?

Regards,
Mikey

[ 9.021567] Unable to handle kernel paging request for data at address 
0x2260
[ 9.022501] Faulting instruction address: 0xc06c7770
[ 9.023105] Oops: Kernel access of bad area, sig: 11 [#1]
[ 9.023674] SMP NR_CPUS=2048
[ 9.023676] NUMA
[ 9.023970] PowerNV
[ 9.024372] Modules linked in: ofpart cmdlinepart ipmi_powernv powernv_flash 
ipmi_devintf mtd ipmi_msghandler ibmpowernv opal_prd uio_pdrv_genirq uio 
vmx_crypto ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper 
syscopyarea sysfillrect sysimgblt crc32c_vpmsum fb_sys_fops drm ahci libahci tg3
[ 9.027146] CPU: 15 PID: 354 Comm: kworker/u64:2 Not tainted 4.10.0-8-generic 
#10-Ubuntu
[ 9.027978] Workqueue: events_unbound flush_to_ldisc
[ 9.028468] task: c016a7758c00 task.stack: c000fd084000
[ 9.029055] NIP: c06c7770 LR: c06c7758 CTR: c06c84b0
[ 9.029767] REGS: 

Re: linux-next: manual merge of the tty tree with the tty.current tree

2017-03-29 Thread Michael Neuling
On Mon, 2017-03-20 at 10:26 +0100, Dmitry Vyukov wrote:
> On Mon, Mar 20, 2017 at 10:21 AM, Dmitry Vyukov  wrote:
> > On Mon, Mar 20, 2017 at 3:28 AM, Stephen Rothwell 
> > wrote:
> > > Hi Greg,
> > > 
> > > Today's linux-next merge of the tty tree got a conflict in:
> > > 
> > >   drivers/tty/tty_ldisc.c
> > > 
> > > between commit:
> > > 
> > >   5362544bebe8 ("tty: don't panic on OOM in tty_set_ldisc()")
> > > 
> > > from the tty.current tree and commit:
> > > 
> > >   71472fa9c52b ("tty: Fix ldisc crash on reopened tty")
> > > 
> > > from the tty tree.
> > > 
> > > I fixed it up (see below) and can carry the fix as necessary. This
> > > is now fixed as far as linux-next is concerned, but any non trivial
> > > conflicts should be mentioned to your upstream maintainer when your tree
> > > is submitted for merging.  You may also want to consider cooperating
> > > with the maintainer of the conflicting tree to minimise any particularly
> > > complex conflicts.
> > > 
> > > --
> > > Cheers,
> > > Stephen Rothwell
> > > 
> > > diff --cc drivers/tty/tty_ldisc.c
> > > index b0500a0a87b8,4ee7742dced3..
> > > --- a/drivers/tty/tty_ldisc.c
> > > +++ b/drivers/tty/tty_ldisc.c
> > > @@@ -621,14 -669,17 +621,15 @@@ int tty_ldisc_reinit(struct tty_struct
> > > tty_ldisc_put(tty->ldisc);
> > > }
> > > 
> > > -   /* switch the line discipline */
> > > -   tty->ldisc = ld;
> > > tty_set_termios_ldisc(tty, disc);
> > > -   retval = tty_ldisc_open(tty, tty->ldisc);
> > > +   retval = tty_ldisc_open(tty, ld);
> > > if (retval) {
> > > -   tty_ldisc_put(tty->ldisc);
> > > -   tty->ldisc = NULL;
> > >  -  if (!WARN_ON(disc == N_TTY)) {
> > >  -  tty_ldisc_put(ld);
> > >  -  ld = NULL;
> > >  -  }
> > > ++  tty_ldisc_put(ld);
> > > ++  ld = NULL;
> > > }
> > > +
> > > +   /* switch the line discipline */
> > > +   smp_store_release(>ldisc, ld);
> > > return retval;
> > >   }
> > > 
> > 
> > 
> > Peter,
> > 
> > Looking at your patch "tty: Fix ldisc crash on reopened tty", I think
> > there is a missed barrier in tty_ldisc_ref. A single barrier does not
> > have any effect, they always need to be in pairs. So I think we also
> > need at least:
> > 
> > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty)
> > struct tty_ldisc *ld = NULL;
> > 
> > if (ldsem_down_read_trylock(>ldisc_sem)) {
> > -   ld = tty->ldisc;
> > +   ld = READ_ONCE(tty->ldisc);
> > +   read_barrier_depends();
> > if (!ld)
> > ldsem_up_read(>ldisc_sem);
> > }
> > 
> > 
> > Or simply:
> > 
> > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty)
> > struct tty_ldisc *ld = NULL;
> > 
> > if (ldsem_down_read_trylock(>ldisc_sem)) {
> > -   ld = tty->ldisc;
> > +   /* pairs with smp_store_release in tty_ldisc_reinit */
> > +   ld = smp_load_acquire(>ldisc);
> > if (!ld)
> > ldsem_up_read(>ldisc_sem);
> > }
> 
> 
> 
> 
> I am also surprised that callers of tty_ldisc_reinit don't hold
> ldisc_sem. I thought that ldisc_sem is what's supposed to protect
> changes to ldisc. That would also auto fix the crash without any
> tricky barriers as flush_to_ldisc uses tty_ldisc_ref.

Dmitry,

Thanks for the help.  Peter doesn't seem to be responding to email any more.

I'm not familiar with the tty layer, but the issue that patch was suppose to fix
had a similar signature to the below oops we are seeing on powerpc on boot.
(sorry I don't have a repro on mainline or linux-next). Hence why I pushed on
it.

In the below crash the call to tty_ldisc_reinit is coming from a workqueue, so
requiring the callers to hold the ldisc_sem is more tricky.

Could we just hold the ldisc_sem inside tty_ldisc_reinit()?

Regards,
Mikey

[ 9.021567] Unable to handle kernel paging request for data at address 
0x2260
[ 9.022501] Faulting instruction address: 0xc06c7770
[ 9.023105] Oops: Kernel access of bad area, sig: 11 [#1]
[ 9.023674] SMP NR_CPUS=2048
[ 9.023676] NUMA
[ 9.023970] PowerNV
[ 9.024372] Modules linked in: ofpart cmdlinepart ipmi_powernv powernv_flash 
ipmi_devintf mtd ipmi_msghandler ibmpowernv opal_prd uio_pdrv_genirq uio 
vmx_crypto ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper 
syscopyarea sysfillrect sysimgblt crc32c_vpmsum fb_sys_fops drm ahci libahci tg3
[ 9.027146] CPU: 15 PID: 354 Comm: kworker/u64:2 Not tainted 4.10.0-8-generic 
#10-Ubuntu
[ 9.027978] Workqueue: events_unbound flush_to_ldisc
[ 9.028468] task: c016a7758c00 task.stack: c000fd084000
[ 9.029055] NIP: c06c7770 LR: c06c7758 CTR: c06c84b0
[ 9.029767] REGS: c000fd0878c0 TRAP: 0300 Not tainted 

Re: [PATCH] kernel.h: add IS_PTR_ALIGNED() macro

2017-03-29 Thread H. Peter Anvin
On 03/29/17 18:57, Masahiro Yamada wrote:
> 
> Could you care to send a patch?
> 
> Perhaps, ALIGN and PTR_ALIGN can be merged as well?
> 

Can't promise when I'd get around to it...

-hpa




Re: [PATCH] kernel.h: add IS_PTR_ALIGNED() macro

2017-03-29 Thread H. Peter Anvin
On 03/29/17 18:57, Masahiro Yamada wrote:
> 
> Could you care to send a patch?
> 
> Perhaps, ALIGN and PTR_ALIGN can be merged as well?
> 

Can't promise when I'd get around to it...

-hpa




Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling

2017-03-29 Thread Darren Hart
On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote:
> On Wed, Mar 29, 2017 at 10:19 AM, Michał Kępień  wrote:
> 
> > Darren, Andy, in light of the above I will be awaiting your review of
> > this series.  I will submit v2 afterwards, with all remarks from both
> > you and Jonathan taken into account.
> 
> Darren marked this series under his name to review, so, I let him to
> speak for us.

The series looks good to me. Nice work Michał. They are logically divided and
address issues in a procedural way (so I stopped commenting until I read the
full series through as a couple of times you addressed a concern from a move in
a cleanup to follow).

I've applied the noted change to 7/8 and will run this through my tests, but
don't anticipate any problems. Jonathan, if you don't have any additional
concerns, let me know if I can add your Reviewed-by.

Thanks,

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling

2017-03-29 Thread Darren Hart
On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote:
> On Wed, Mar 29, 2017 at 10:19 AM, Michał Kępień  wrote:
> 
> > Darren, Andy, in light of the above I will be awaiting your review of
> > this series.  I will submit v2 afterwards, with all remarks from both
> > you and Jonathan taken into account.
> 
> Darren marked this series under his name to review, so, I let him to
> speak for us.

The series looks good to me. Nice work Michał. They are logically divided and
address issues in a procedural way (so I stopped commenting until I read the
full series through as a couple of times you addressed a concern from a move in
a cleanup to follow).

I've applied the noted change to 7/8 and will run this through my tests, but
don't anticipate any problems. Jonathan, if you don't have any additional
concerns, let me know if I can add your Reviewed-by.

Thanks,

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH v4 1/2] module: verify address is read-only

2017-03-29 Thread Jessica Yu

+++ Eddie Kovsky [26/03/17 15:08 -0600]:

Implement a mechanism to check if a module's address is in
the rodata or ro_after_init sections. It mimics the existing functions
that test if an address is inside a module's text section.

Functions that take a module as an argument will be able to verify that the
module address is in a read-only section. The idea is to prevent structures
(including modules) that are not read-only from being passed to functions.

This implements the first half of a suggestion made by Kees Cook for
the Kernel Self Protection Project:

   - provide mechanism to check for ro_after_init memory areas, and
 reject structures not marked ro_after_init in vmbus_register()

Suggested-by: Kees Cook 
Signed-off-by: Eddie Kovsky 


Thanks for the respin, this looks much better.

Acked-by: Jessica Yu 


---
Changes in v4:
- Rename function __module_ro_address() to __module_rodata_address()
- Move module_rodata_address() stub below is_module_address()
- Minor comment fixes
- Verify that mod is not NULL before setting up layout variables
Changes in v3:
- Change function name is_module_ro_address() to
is_module_rodata_address().
- Improve comments on is_module_rodata_address().
- Add a !CONFIG_MODULES stub for is_module_rodata_address.
- Correct and simplify the check for the read-only memory regions in
the module address.
---
include/linux/module.h | 12 
kernel/module.c| 53 ++
2 files changed, 65 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 9ad68561d8c2..a3d17b081de3 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -492,7 +492,9 @@ static inline int module_is_live(struct module *mod)

struct module *__module_text_address(unsigned long addr);
struct module *__module_address(unsigned long addr);
+struct module *__module_rodata_address(unsigned long addr);
bool is_module_address(unsigned long addr);
+bool is_module_rodata_address(unsigned long addr);
bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
bool is_module_percpu_address(unsigned long addr);
bool is_module_text_address(unsigned long addr);
@@ -646,6 +648,11 @@ static inline struct module *__module_address(unsigned 
long addr)
return NULL;
}

+static inline struct module *__module_rodata_address(unsigned long addr)
+{
+   return NULL;
+}
+
static inline struct module *__module_text_address(unsigned long addr)
{
return NULL;
@@ -656,6 +663,11 @@ static inline bool is_module_address(unsigned long addr)
return false;
}

+static inline bool is_module_rodata_address(unsigned long addr)
+{
+   return false;
+}
+
static inline bool is_module_percpu_address(unsigned long addr)
{
return false;
diff --git a/kernel/module.c b/kernel/module.c
index 8ffcd29a4245..26bd62c61d87 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4292,6 +4292,59 @@ struct module *__module_text_address(unsigned long addr)
}
EXPORT_SYMBOL_GPL(__module_text_address);

+/**
+ * is_module_rodata_address - is this address inside read-only module data?
+ * @addr: the address to check.
+ *
+ */
+bool is_module_rodata_address(unsigned long addr)
+{
+   bool ret;
+
+   preempt_disable();
+   ret = __module_rodata_address(addr) != NULL;
+   preempt_enable();
+
+   return ret;
+}
+
+/*
+ * __module_rodata_address - get the module whose rodata/ro_after_init sections
+ * contain the given address.
+ * @addr: the address.
+ *
+ * Must be called with preempt disabled or module mutex held so that
+ * module doesn't get freed during this.
+ */
+struct module *__module_rodata_address(unsigned long addr)
+{
+   struct module *mod = __module_address(addr);
+
+   /*
+* Make sure module is within the read-only section.
+* range(base + text_size, base + ro_after_init_size)
+* encompasses both the rodata and ro_after_init regions.
+* See comment above frob_text() for the layout diagram.
+*/
+   if (mod) {
+   void *init_base = mod->init_layout.base;
+   unsigned int init_text_size = mod->init_layout.text_size;
+   unsigned int init_ro_after_init_size = 
mod->init_layout.ro_after_init_size;
+
+   void *core_base = mod->core_layout.base;
+   unsigned int core_text_size = mod->core_layout.text_size;
+   unsigned int core_ro_after_init_size = 
mod->core_layout.ro_after_init_size;
+
+   if (!within(addr, init_base + init_text_size,
+   init_ro_after_init_size - init_text_size)
+   && !within(addr, core_base + core_text_size,
+  core_ro_after_init_size - core_text_size))
+   mod = NULL;
+   }
+   return mod;
+}
+EXPORT_SYMBOL_GPL(__module_rodata_address);
+
/* Don't grab lock, we're oopsing. */
void 

Re: [PATCH v4 1/2] module: verify address is read-only

2017-03-29 Thread Jessica Yu

+++ Eddie Kovsky [26/03/17 15:08 -0600]:

Implement a mechanism to check if a module's address is in
the rodata or ro_after_init sections. It mimics the existing functions
that test if an address is inside a module's text section.

Functions that take a module as an argument will be able to verify that the
module address is in a read-only section. The idea is to prevent structures
(including modules) that are not read-only from being passed to functions.

This implements the first half of a suggestion made by Kees Cook for
the Kernel Self Protection Project:

   - provide mechanism to check for ro_after_init memory areas, and
 reject structures not marked ro_after_init in vmbus_register()

Suggested-by: Kees Cook 
Signed-off-by: Eddie Kovsky 


Thanks for the respin, this looks much better.

Acked-by: Jessica Yu 


---
Changes in v4:
- Rename function __module_ro_address() to __module_rodata_address()
- Move module_rodata_address() stub below is_module_address()
- Minor comment fixes
- Verify that mod is not NULL before setting up layout variables
Changes in v3:
- Change function name is_module_ro_address() to
is_module_rodata_address().
- Improve comments on is_module_rodata_address().
- Add a !CONFIG_MODULES stub for is_module_rodata_address.
- Correct and simplify the check for the read-only memory regions in
the module address.
---
include/linux/module.h | 12 
kernel/module.c| 53 ++
2 files changed, 65 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 9ad68561d8c2..a3d17b081de3 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -492,7 +492,9 @@ static inline int module_is_live(struct module *mod)

struct module *__module_text_address(unsigned long addr);
struct module *__module_address(unsigned long addr);
+struct module *__module_rodata_address(unsigned long addr);
bool is_module_address(unsigned long addr);
+bool is_module_rodata_address(unsigned long addr);
bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
bool is_module_percpu_address(unsigned long addr);
bool is_module_text_address(unsigned long addr);
@@ -646,6 +648,11 @@ static inline struct module *__module_address(unsigned 
long addr)
return NULL;
}

+static inline struct module *__module_rodata_address(unsigned long addr)
+{
+   return NULL;
+}
+
static inline struct module *__module_text_address(unsigned long addr)
{
return NULL;
@@ -656,6 +663,11 @@ static inline bool is_module_address(unsigned long addr)
return false;
}

+static inline bool is_module_rodata_address(unsigned long addr)
+{
+   return false;
+}
+
static inline bool is_module_percpu_address(unsigned long addr)
{
return false;
diff --git a/kernel/module.c b/kernel/module.c
index 8ffcd29a4245..26bd62c61d87 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4292,6 +4292,59 @@ struct module *__module_text_address(unsigned long addr)
}
EXPORT_SYMBOL_GPL(__module_text_address);

+/**
+ * is_module_rodata_address - is this address inside read-only module data?
+ * @addr: the address to check.
+ *
+ */
+bool is_module_rodata_address(unsigned long addr)
+{
+   bool ret;
+
+   preempt_disable();
+   ret = __module_rodata_address(addr) != NULL;
+   preempt_enable();
+
+   return ret;
+}
+
+/*
+ * __module_rodata_address - get the module whose rodata/ro_after_init sections
+ * contain the given address.
+ * @addr: the address.
+ *
+ * Must be called with preempt disabled or module mutex held so that
+ * module doesn't get freed during this.
+ */
+struct module *__module_rodata_address(unsigned long addr)
+{
+   struct module *mod = __module_address(addr);
+
+   /*
+* Make sure module is within the read-only section.
+* range(base + text_size, base + ro_after_init_size)
+* encompasses both the rodata and ro_after_init regions.
+* See comment above frob_text() for the layout diagram.
+*/
+   if (mod) {
+   void *init_base = mod->init_layout.base;
+   unsigned int init_text_size = mod->init_layout.text_size;
+   unsigned int init_ro_after_init_size = 
mod->init_layout.ro_after_init_size;
+
+   void *core_base = mod->core_layout.base;
+   unsigned int core_text_size = mod->core_layout.text_size;
+   unsigned int core_ro_after_init_size = 
mod->core_layout.ro_after_init_size;
+
+   if (!within(addr, init_base + init_text_size,
+   init_ro_after_init_size - init_text_size)
+   && !within(addr, core_base + core_text_size,
+  core_ro_after_init_size - core_text_size))
+   mod = NULL;
+   }
+   return mod;
+}
+EXPORT_SYMBOL_GPL(__module_rodata_address);
+
/* Don't grab lock, we're oopsing. */
void print_modules(void)
{
--
2.12.1



Re: [PATCH] virtio_console: fix uninitialized variable use

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:27 +0300, Michael S. Tsirkin wrote:

> Hi Mike
>   if you like, pls send me your Signed-off-by and I'll
>   change the patch to make you an author.

Nah, it's perfect as it is.  While I was pretty darn sure it was
generic, I intentionally posted it as diagnostic info.

>   More importantly, pls let me know whether this helps
>   resolve the issues about hybernation for you!

Well, it prevents kaboom with VIRTIO_CONSOLE_F_MULTIPORT turned off,
but no, it does nada wrt the warnings.

-Mike


Re: [PATCH] virtio_console: fix uninitialized variable use

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:27 +0300, Michael S. Tsirkin wrote:

> Hi Mike
>   if you like, pls send me your Signed-off-by and I'll
>   change the patch to make you an author.

Nah, it's perfect as it is.  While I was pretty darn sure it was
generic, I intentionally posted it as diagnostic info.

>   More importantly, pls let me know whether this helps
>   resolve the issues about hybernation for you!

Well, it prevents kaboom with VIRTIO_CONSOLE_F_MULTIPORT turned off,
but no, it does nada wrt the warnings.

-Mike


Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree

2017-03-29 Thread Keerthy


On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote:
> Keerthy,
> 
> On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote:
>> * Keerthy  [170323 20:29]:
>>>
>>>
>>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote:
 * Keerthy  [170321 20:45]:
>
>
> On Thursday 09 March 2017 01:35 PM, Keerthy wrote:
>> Currently the slope and offset values for calculating the
>> hot spot temperature of a particular thermal zone is part
>> of driver data. Pass them here instead and obtain the values
>> while of node parsing.
>>
>> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. 
> 
> Have you tried on boards that need negative coefficients?
> 
> https://patchwork.kernel.org/patch/9619577/

Yes. I retrieved the negative values nicely in the driver passed via
Device Tree.

> 
>
> Hi Eduardo,
>
> If the series looks okay could you please pull this?

 Also.. Are the dts changes safe for me to pick separately?
>>>
>>> Yes Tony they are safe to pulled separately.
>>
>> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2.
> 
> Keerthy,
> 
> The only thing I want you to confirm is if you are really getting the
> negative coefficients, because currently of-thermal reads the array
> using an OF helper that understands only unsigned. For this reason, I
> will be queueing your patches only for next merge window, not as a fix,
> not for rc's.

I am getting negative co-efficients.

> 
> 
>>
>> Thanks,
>>
>> Tony


Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree

2017-03-29 Thread Keerthy


On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote:
> Keerthy,
> 
> On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote:
>> * Keerthy  [170323 20:29]:
>>>
>>>
>>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote:
 * Keerthy  [170321 20:45]:
>
>
> On Thursday 09 March 2017 01:35 PM, Keerthy wrote:
>> Currently the slope and offset values for calculating the
>> hot spot temperature of a particular thermal zone is part
>> of driver data. Pass them here instead and obtain the values
>> while of node parsing.
>>
>> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. 
> 
> Have you tried on boards that need negative coefficients?
> 
> https://patchwork.kernel.org/patch/9619577/

Yes. I retrieved the negative values nicely in the driver passed via
Device Tree.

> 
>
> Hi Eduardo,
>
> If the series looks okay could you please pull this?

 Also.. Are the dts changes safe for me to pick separately?
>>>
>>> Yes Tony they are safe to pulled separately.
>>
>> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2.
> 
> Keerthy,
> 
> The only thing I want you to confirm is if you are really getting the
> negative coefficients, because currently of-thermal reads the array
> using an OF helper that understands only unsigned. For this reason, I
> will be queueing your patches only for next merge window, not as a fix,
> not for rc's.

I am getting negative co-efficients.

> 
> 
>>
>> Thanks,
>>
>> Tony


[PATCH 0/5] firmware: move UMH locks onto fallback code

2017-03-29 Thread Luis R. Rodriguez
Greg,

One of the eyesores on the old API was the use of the UMH lock even when
we don't use any of the usermode helpers. It took quite a bit of git
archeology to draw up a solution which makes me feel comfortable in
moving this code out of the way given it may have added new protections
we never knew about. A resolution is to make similar protections for
direct FS lookups for now, and later after a release or so we can
consider removing them. There is further possible work to do to clean
up the UMH pollution on external code, but this can be done separately
and is more a generic kernel/kmod.c thing than firmware thing.

This changes makes subsequent patches able to fold the new driver data
API onto the older firmware API. These all go hammered with 0-day and
the firmware test scripts shipped upstream.

Luis R. Rodriguez (5):
  firmware: share fw fallback killing on reboot/suspend
  firmware: always enable the reboot notifier
  firmware: add sanity check on shutdown/suspend
  firmware: move assign_firmware_buf() further up
  firmware: move umh try locks into the umh code

 .../driver-api/firmware/request_firmware.rst   |  11 +
 drivers/base/firmware_class.c  | 293 ++---
 2 files changed, 207 insertions(+), 97 deletions(-)

-- 
2.11.0



[PATCH 0/5] firmware: move UMH locks onto fallback code

2017-03-29 Thread Luis R. Rodriguez
Greg,

One of the eyesores on the old API was the use of the UMH lock even when
we don't use any of the usermode helpers. It took quite a bit of git
archeology to draw up a solution which makes me feel comfortable in
moving this code out of the way given it may have added new protections
we never knew about. A resolution is to make similar protections for
direct FS lookups for now, and later after a release or so we can
consider removing them. There is further possible work to do to clean
up the UMH pollution on external code, but this can be done separately
and is more a generic kernel/kmod.c thing than firmware thing.

This changes makes subsequent patches able to fold the new driver data
API onto the older firmware API. These all go hammered with 0-day and
the firmware test scripts shipped upstream.

Luis R. Rodriguez (5):
  firmware: share fw fallback killing on reboot/suspend
  firmware: always enable the reboot notifier
  firmware: add sanity check on shutdown/suspend
  firmware: move assign_firmware_buf() further up
  firmware: move umh try locks into the umh code

 .../driver-api/firmware/request_firmware.rst   |  11 +
 drivers/base/firmware_class.c  | 293 ++---
 2 files changed, 207 insertions(+), 97 deletions(-)

-- 
2.11.0



Re: [PATCH v2 3/8] ARM: OMAP5: Thermal: Add slope and offset values

2017-03-29 Thread Keerthy


On Wednesday 29 March 2017 10:03 AM, Eduardo Valentin wrote:
> On Thu, Mar 09, 2017 at 01:35:57PM +0530, Keerthy wrote:
>> Currently the slope and offset values for calculating the
>> hot spot temperature of a particular thermal zone is part
>> of driver data. Pass them here instead and obtain the values
>> while of node parsing.
> 
> 
> The patch is fine.. but
> 
>>
>> Signed-off-by: Keerthy 
>> ---
>>  arch/arm/boot/dts/omap5.dtsi | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
>> index 222155c..eaff2a5 100644
>> --- a/arch/arm/boot/dts/omap5.dtsi
>> +++ b/arch/arm/boot/dts/omap5.dtsi
>> @@ -1127,6 +1127,15 @@
>>  
>>  _thermal {
>>  polling-delay = <500>; /* milliseconds */
>> +coefficients = <65 (-1791)>;
> 
> I suppose you tried this change with this patch:
> https://patchwork.kernel.org/patch/9619577/
> ?
> 
> Otherwise, I do not see how your coeff would work, right?

IIRC i did not have that patch.

But No issues there. I did try deliberately with negative values and i
retrieved the negative values in driver nicely.

Fox ex: -20 unsigned when retrieved in driver as an int coeff will give
me -20. I checked with multiple negative co-efficients and i retrieved
the passed negative co-efficients in driver from DT.

Regards,
Keerthy


> 
>>  };
>>  
>>  /include/ "omap54xx-clocks.dtsi"
>> +
>> +_thermal {
>> +coefficients = <117 (-2992)>;
>> +};
>> +
>> +_thermal {
>> +coefficients = <0 2000>;
>> +};
>> -- 
>> 1.9.1
>>


Re: [PATCH v2 3/8] ARM: OMAP5: Thermal: Add slope and offset values

2017-03-29 Thread Keerthy


On Wednesday 29 March 2017 10:03 AM, Eduardo Valentin wrote:
> On Thu, Mar 09, 2017 at 01:35:57PM +0530, Keerthy wrote:
>> Currently the slope and offset values for calculating the
>> hot spot temperature of a particular thermal zone is part
>> of driver data. Pass them here instead and obtain the values
>> while of node parsing.
> 
> 
> The patch is fine.. but
> 
>>
>> Signed-off-by: Keerthy 
>> ---
>>  arch/arm/boot/dts/omap5.dtsi | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
>> index 222155c..eaff2a5 100644
>> --- a/arch/arm/boot/dts/omap5.dtsi
>> +++ b/arch/arm/boot/dts/omap5.dtsi
>> @@ -1127,6 +1127,15 @@
>>  
>>  _thermal {
>>  polling-delay = <500>; /* milliseconds */
>> +coefficients = <65 (-1791)>;
> 
> I suppose you tried this change with this patch:
> https://patchwork.kernel.org/patch/9619577/
> ?
> 
> Otherwise, I do not see how your coeff would work, right?

IIRC i did not have that patch.

But No issues there. I did try deliberately with negative values and i
retrieved the negative values in driver nicely.

Fox ex: -20 unsigned when retrieved in driver as an int coeff will give
me -20. I checked with multiple negative co-efficients and i retrieved
the passed negative co-efficients in driver from DT.

Regards,
Keerthy


> 
>>  };
>>  
>>  /include/ "omap54xx-clocks.dtsi"
>> +
>> +_thermal {
>> +coefficients = <117 (-2992)>;
>> +};
>> +
>> +_thermal {
>> +coefficients = <0 2000>;
>> +};
>> -- 
>> 1.9.1
>>


[PATCH 3/5] firmware: add sanity check on shutdown/suspend

2017-03-29 Thread Luis R. Rodriguez
The firmware API should not be used after we go to suspend
and after we reboot/halt. The suspend/resume case is a bit
complex, so this documents that so things are clearer.

We want to know about users of the API in incorrect places so
that their callers are corrected, so this also adds a warn
for those cases.

Signed-off-by: Luis R. Rodriguez 
---
 .../driver-api/firmware/request_firmware.rst   | 11 +++
 drivers/base/firmware_class.c  | 96 ++
 2 files changed, 107 insertions(+)

diff --git a/Documentation/driver-api/firmware/request_firmware.rst 
b/Documentation/driver-api/firmware/request_firmware.rst
index cc0aea880824..1c2c4967cd43 100644
--- a/Documentation/driver-api/firmware/request_firmware.rst
+++ b/Documentation/driver-api/firmware/request_firmware.rst
@@ -44,6 +44,17 @@ request_firmware_nowait
 .. kernel-doc:: drivers/base/firmware_class.c
:functions: request_firmware_nowait
 
+Considerations for suspend and resume
+=
+
+During suspend and resume only the built-in firmware and the firmware cache
+elements of the firmware API can be used. This is managed by fw_pm_notify().
+
+fw_pm_notify
+
+.. kernel-doc:: drivers/base/firmware_class.c
+   :functions: fw_pm_notify
+
 request firmware API expected driver use
 
 
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index e8edf22536c1..f7ac619635b4 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -260,6 +260,38 @@ static int fw_cache_piggyback_on_request(const char *name);
  * guarding for corner cases a global lock should be OK */
 static DEFINE_MUTEX(fw_lock);
 
+static bool __enable_firmware = false;
+
+static void enable_firmware(void)
+{
+   mutex_lock(_lock);
+   __enable_firmware = true;
+   mutex_unlock(_lock);
+}
+
+static void disable_firmware(void)
+{
+   mutex_lock(_lock);
+   __enable_firmware = false;
+   mutex_unlock(_lock);
+}
+
+/*
+ * When disabled only the built-in firmware and the firmware cache will be
+ * used to look for firmware.
+ */
+static bool firmware_enabled(void)
+{
+   bool enabled = false;
+
+   mutex_lock(_lock);
+   if (__enable_firmware)
+   enabled = true;
+   mutex_unlock(_lock);
+
+   return enabled;
+}
+
 static struct firmware_cache fw_cache;
 
 static struct firmware_buf *__allocate_fw_buf(const char *fw_name,
@@ -1165,6 +1197,12 @@ _request_firmware(const struct firmware **firmware_p, 
const char *name,
if (ret <= 0) /* error or already assigned */
goto out;
 
+   if (!firmware_enabled()) {
+   WARN(1, "firmware request while host is not available\n");
+   ret = -EHOSTDOWN;
+   goto out;
+   }
+
ret = 0;
timeout = firmware_loading_timeout();
if (opt_flags & FW_OPT_NOWAIT) {
@@ -1697,6 +1735,59 @@ static void device_uncache_fw_images_delay(unsigned long 
delay)
   msecs_to_jiffies(delay));
 }
 
+/**
+ * fw_pm_notify - notifier for suspend/resume
+ *
+ * Used to modify the firmware_class state as we move in between states.
+ * The firmware_class implements a firmware cache to enable device driver
+ * to fetch firmware upon resume before the root filesystem is ready. We
+ * disable API calls which do not use the built-in firmware or the firmware
+ * cache when we know these calls will not work.
+ *
+ * The inner logic behind all this is a bit complex so it is worth summarizing
+ * the kernel's own suspend/resume process with context and focus on how this
+ * can impact the firmware API.
+ *
+ * First a review on how we go to suspend:
+ *
+ * pm_suspend() --> enter_state() -->
+ * sys_sync()
+ * suspend_prepare() -->
+ * __pm_notifier_call_chain(PM_SUSPEND_PREPARE, ...);
+ * suspend_freeze_processes() -->
+ * freeze_processes() -->
+ * 
__usermodehelper_set_disable_depth(UMH_DISABLED);
+ * freeze all tasks ...
+ * freeze_kernel_threads()
+ * suspend_devices_and_enter() -->
+ * dpm_suspend_start() -->
+ * dpm_prepare()
+ * dpm_suspend()
+ * suspend_enter()  -->
+ * platform_suspend_prepare()
+ * dpm_suspend_late()
+ * freeze_enter()
+ * syscore_suspend()
+ *
+ * When we resume we bail out of a loop from suspend_devices_and_enter() and
+ * unwind back out to the caller enter_state() where we were before as follows:
+ *
+ * enter_state() -->
+ * suspend_devices_and_enter() --> (bail from loop)
+ * dpm_resume_end() -->
+ * dpm_resume()
+ * dpm_complete()
+ * suspend_finish() -->
+ *

[PATCH 3/5] firmware: add sanity check on shutdown/suspend

2017-03-29 Thread Luis R. Rodriguez
The firmware API should not be used after we go to suspend
and after we reboot/halt. The suspend/resume case is a bit
complex, so this documents that so things are clearer.

We want to know about users of the API in incorrect places so
that their callers are corrected, so this also adds a warn
for those cases.

Signed-off-by: Luis R. Rodriguez 
---
 .../driver-api/firmware/request_firmware.rst   | 11 +++
 drivers/base/firmware_class.c  | 96 ++
 2 files changed, 107 insertions(+)

diff --git a/Documentation/driver-api/firmware/request_firmware.rst 
b/Documentation/driver-api/firmware/request_firmware.rst
index cc0aea880824..1c2c4967cd43 100644
--- a/Documentation/driver-api/firmware/request_firmware.rst
+++ b/Documentation/driver-api/firmware/request_firmware.rst
@@ -44,6 +44,17 @@ request_firmware_nowait
 .. kernel-doc:: drivers/base/firmware_class.c
:functions: request_firmware_nowait
 
+Considerations for suspend and resume
+=
+
+During suspend and resume only the built-in firmware and the firmware cache
+elements of the firmware API can be used. This is managed by fw_pm_notify().
+
+fw_pm_notify
+
+.. kernel-doc:: drivers/base/firmware_class.c
+   :functions: fw_pm_notify
+
 request firmware API expected driver use
 
 
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index e8edf22536c1..f7ac619635b4 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -260,6 +260,38 @@ static int fw_cache_piggyback_on_request(const char *name);
  * guarding for corner cases a global lock should be OK */
 static DEFINE_MUTEX(fw_lock);
 
+static bool __enable_firmware = false;
+
+static void enable_firmware(void)
+{
+   mutex_lock(_lock);
+   __enable_firmware = true;
+   mutex_unlock(_lock);
+}
+
+static void disable_firmware(void)
+{
+   mutex_lock(_lock);
+   __enable_firmware = false;
+   mutex_unlock(_lock);
+}
+
+/*
+ * When disabled only the built-in firmware and the firmware cache will be
+ * used to look for firmware.
+ */
+static bool firmware_enabled(void)
+{
+   bool enabled = false;
+
+   mutex_lock(_lock);
+   if (__enable_firmware)
+   enabled = true;
+   mutex_unlock(_lock);
+
+   return enabled;
+}
+
 static struct firmware_cache fw_cache;
 
 static struct firmware_buf *__allocate_fw_buf(const char *fw_name,
@@ -1165,6 +1197,12 @@ _request_firmware(const struct firmware **firmware_p, 
const char *name,
if (ret <= 0) /* error or already assigned */
goto out;
 
+   if (!firmware_enabled()) {
+   WARN(1, "firmware request while host is not available\n");
+   ret = -EHOSTDOWN;
+   goto out;
+   }
+
ret = 0;
timeout = firmware_loading_timeout();
if (opt_flags & FW_OPT_NOWAIT) {
@@ -1697,6 +1735,59 @@ static void device_uncache_fw_images_delay(unsigned long 
delay)
   msecs_to_jiffies(delay));
 }
 
+/**
+ * fw_pm_notify - notifier for suspend/resume
+ *
+ * Used to modify the firmware_class state as we move in between states.
+ * The firmware_class implements a firmware cache to enable device driver
+ * to fetch firmware upon resume before the root filesystem is ready. We
+ * disable API calls which do not use the built-in firmware or the firmware
+ * cache when we know these calls will not work.
+ *
+ * The inner logic behind all this is a bit complex so it is worth summarizing
+ * the kernel's own suspend/resume process with context and focus on how this
+ * can impact the firmware API.
+ *
+ * First a review on how we go to suspend:
+ *
+ * pm_suspend() --> enter_state() -->
+ * sys_sync()
+ * suspend_prepare() -->
+ * __pm_notifier_call_chain(PM_SUSPEND_PREPARE, ...);
+ * suspend_freeze_processes() -->
+ * freeze_processes() -->
+ * 
__usermodehelper_set_disable_depth(UMH_DISABLED);
+ * freeze all tasks ...
+ * freeze_kernel_threads()
+ * suspend_devices_and_enter() -->
+ * dpm_suspend_start() -->
+ * dpm_prepare()
+ * dpm_suspend()
+ * suspend_enter()  -->
+ * platform_suspend_prepare()
+ * dpm_suspend_late()
+ * freeze_enter()
+ * syscore_suspend()
+ *
+ * When we resume we bail out of a loop from suspend_devices_and_enter() and
+ * unwind back out to the caller enter_state() where we were before as follows:
+ *
+ * enter_state() -->
+ * suspend_devices_and_enter() --> (bail from loop)
+ * dpm_resume_end() -->
+ * dpm_resume()
+ * dpm_complete()
+ * suspend_finish() -->
+ * 

Re: [PATCH v4 2/2] extable: verify address is read-only

2017-03-29 Thread Jessica Yu

+++ Eddie Kovsky [28/03/17 21:28 -0600]:

On 03/27/17, Kees Cook wrote:

On Mon, Mar 27, 2017 at 1:43 AM, kbuild test robot  wrote:
> Hi Eddie,
>
> [auto build test ERROR on next-20170323]
> [cannot apply to linus/master linux/master jeyu/modules-next v4.9-rc8 
v4.9-rc7 v4.9-rc6 v4.11-rc4]
> [if your patch is applied to the wrong git tree, please drop us a note to 
help improve the system]
>
> url:
https://github.com/0day-ci/linux/commits/Eddie-Kovsky/module-verify-address-is-read-only/20170327-142922
> config: blackfin-BF561-EZKIT-SMP_defconfig (attached as .config)
> compiler: bfin-uclinux-gcc (GCC) 6.2.0
> reproduce:
> wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=blackfin
>
> All errors (new ones prefixed by >>):
>
>kernel/built-in.o: In function `core_kernel_rodata':
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'

Hm, I'm confused about this. blackfin includes
include/asm-generic-vmlinux.lds.h and uses the RO_DATA macro (which
resolves to RO_DATA_SECTION to RO_AFTER_INIT_DATA which defines
__[start|end]_data_ro_after_init.

Also, it seems that commit d7c19b066dcf4bd19c4385e8065558d4e74f9e73
("mm: kmemleak: scan .data.ro_after_init") added a potentially
redundant section name (s390 already calls this
__[start|end]_ro_after_init). I'd like to get this cleaned up, since
having multiple names for the same thing is confusing:

diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 000e6e91f6a0..3667d20e997f 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -62,9 +62,11 @@ SECTIONS

. = ALIGN(PAGE_SIZE);
__start_ro_after_init = .;
+   __start_data_ro_after_init = .;
.data..ro_after_init : {
 *(.data..ro_after_init)
}
+   __end_data_ro_after_init = .;
EXCEPTION_TABLE(16)
. = ALIGN(PAGE_SIZE);
__end_ro_after_init = .;

And it seems that this hunk is wrong (__end_ro_after_init includes
s390's exception table, etc). I think we should remove the
..._data_... name and use s390's name.

I'll send an adjustment patch, but we'll still need to deal with blackfin.

-Kees



Kees

I applied your patch (mm: fix section name for .data..ro_after_init) and
changed the new function in extable.c to use __[start|end]_ro_after_init
instead. The new version still builds without errors on x86, which isn't
surprising.

I've cross compiled this for blackfin and I'm able to reproduce the
build error. I'm still not sure why. As you pointed out, blackfin does
appear to use 'include/asm-generic/vmlinux.lds.h'.


This appears to be because blackfin is one of the 2 arches that
prepends an underscore '_' to all symbols defined in C. I noticed that
__{start,end}_data_ro_after_init in vmlinux.lds.h are not wrapped with
VMLINUX_SYMBOL() which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX, hence the undefined reference.

The below patch fixed the build error for me, if it works for you then
I can send a formal patch.

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 4e09b28..7b262f7 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -260,9 +260,9 @@
 */
#ifndef RO_AFTER_INIT_DATA
#define RO_AFTER_INIT_DATA  \
-   __start_data_ro_after_init = .; \
+   VMLINUX_SYMBOL(__start_data_ro_after_init) = .; \
*(.data..ro_after_init) \
-   __end_data_ro_after_init = .;
+   VMLINUX_SYMBOL(__end_data_ro_after_init) = .;
#endif

/*

Jessica


Re: [PATCH v4 2/2] extable: verify address is read-only

2017-03-29 Thread Jessica Yu

+++ Eddie Kovsky [28/03/17 21:28 -0600]:

On 03/27/17, Kees Cook wrote:

On Mon, Mar 27, 2017 at 1:43 AM, kbuild test robot  wrote:
> Hi Eddie,
>
> [auto build test ERROR on next-20170323]
> [cannot apply to linus/master linux/master jeyu/modules-next v4.9-rc8 
v4.9-rc7 v4.9-rc6 v4.11-rc4]
> [if your patch is applied to the wrong git tree, please drop us a note to 
help improve the system]
>
> url:
https://github.com/0day-ci/linux/commits/Eddie-Kovsky/module-verify-address-is-read-only/20170327-142922
> config: blackfin-BF561-EZKIT-SMP_defconfig (attached as .config)
> compiler: bfin-uclinux-gcc (GCC) 6.2.0
> reproduce:
> wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=blackfin
>
> All errors (new ones prefixed by >>):
>
>kernel/built-in.o: In function `core_kernel_rodata':
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'
>>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init'

Hm, I'm confused about this. blackfin includes
include/asm-generic-vmlinux.lds.h and uses the RO_DATA macro (which
resolves to RO_DATA_SECTION to RO_AFTER_INIT_DATA which defines
__[start|end]_data_ro_after_init.

Also, it seems that commit d7c19b066dcf4bd19c4385e8065558d4e74f9e73
("mm: kmemleak: scan .data.ro_after_init") added a potentially
redundant section name (s390 already calls this
__[start|end]_ro_after_init). I'd like to get this cleaned up, since
having multiple names for the same thing is confusing:

diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 000e6e91f6a0..3667d20e997f 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -62,9 +62,11 @@ SECTIONS

. = ALIGN(PAGE_SIZE);
__start_ro_after_init = .;
+   __start_data_ro_after_init = .;
.data..ro_after_init : {
 *(.data..ro_after_init)
}
+   __end_data_ro_after_init = .;
EXCEPTION_TABLE(16)
. = ALIGN(PAGE_SIZE);
__end_ro_after_init = .;

And it seems that this hunk is wrong (__end_ro_after_init includes
s390's exception table, etc). I think we should remove the
..._data_... name and use s390's name.

I'll send an adjustment patch, but we'll still need to deal with blackfin.

-Kees



Kees

I applied your patch (mm: fix section name for .data..ro_after_init) and
changed the new function in extable.c to use __[start|end]_ro_after_init
instead. The new version still builds without errors on x86, which isn't
surprising.

I've cross compiled this for blackfin and I'm able to reproduce the
build error. I'm still not sure why. As you pointed out, blackfin does
appear to use 'include/asm-generic/vmlinux.lds.h'.


This appears to be because blackfin is one of the 2 arches that
prepends an underscore '_' to all symbols defined in C. I noticed that
__{start,end}_data_ro_after_init in vmlinux.lds.h are not wrapped with
VMLINUX_SYMBOL() which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX, hence the undefined reference.

The below patch fixed the build error for me, if it works for you then
I can send a formal patch.

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 4e09b28..7b262f7 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -260,9 +260,9 @@
 */
#ifndef RO_AFTER_INIT_DATA
#define RO_AFTER_INIT_DATA  \
-   __start_data_ro_after_init = .; \
+   VMLINUX_SYMBOL(__start_data_ro_after_init) = .; \
*(.data..ro_after_init) \
-   __end_data_ro_after_init = .;
+   VMLINUX_SYMBOL(__end_data_ro_after_init) = .;
#endif

/*

Jessica


[PATCH] usb: host: plat: Enable xHCI plat runtime PM

2017-03-29 Thread Baolin Wang
Enable the xHCI plat runtime PM for parent device to suspend/resume
xHCI. Also call pm_runtime_forbid() in probe() function to force users
to explicitly enable runtime pm using power/control in sysfs, in case
some parent devices didn't implement runtime PM callbacks.

Signed-off-by: Baolin Wang 
---
 drivers/usb/host/xhci-plat.c |   54 --
 1 file changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index bd02a6c..2036c24 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -179,9 +179,15 @@ static int xhci_plat_probe(struct platform_device *pdev)
return ret;
}
 
+   pm_runtime_set_active(>dev);
+   pm_runtime_enable(>dev);
+   pm_runtime_get_noresume(>dev);
+
hcd = usb_create_hcd(driver, >dev, dev_name(>dev));
-   if (!hcd)
-   return -ENOMEM;
+   if (!hcd) {
+   ret = -ENOMEM;
+   goto disable_runtime;
+   }
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
hcd->regs = devm_ioremap_resource(>dev, res);
@@ -258,6 +264,14 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (ret)
goto dealloc_usb2_hcd;
 
+   pm_runtime_put_noidle(>dev);
+
+   /*
+* Prevent runtime pm from being on as default, users should enable
+* runtime pm using power/control in sysfs.
+*/
+   pm_runtime_forbid(>dev);
+
return 0;
 
 
@@ -277,6 +291,10 @@ static int xhci_plat_probe(struct platform_device *pdev)
 put_hcd:
usb_put_hcd(hcd);
 
+disable_runtime:
+   pm_runtime_put_noidle(>dev);
+   pm_runtime_disable(>dev);
+
return ret;
 }
 
@@ -298,6 +316,9 @@ static int xhci_plat_remove(struct platform_device *dev)
clk_disable_unprepare(clk);
usb_put_hcd(hcd);
 
+   pm_runtime_set_suspended(>dev);
+   pm_runtime_disable(>dev);
+
return 0;
 }
 
@@ -325,14 +346,33 @@ static int xhci_plat_resume(struct device *dev)
 
return xhci_resume(xhci, 0);
 }
+#endif /* CONFIG_PM_SLEEP */
+
+#ifdef CONFIG_PM
+static int xhci_plat_runtime_suspend(struct device *dev)
+{
+   struct usb_hcd  *hcd = dev_get_drvdata(dev);
+   struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+
+   return xhci_suspend(xhci, device_may_wakeup(dev));
+}
+
+static int xhci_plat_runtime_resume(struct device *dev)
+{
+   struct usb_hcd  *hcd = dev_get_drvdata(dev);
+   struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+
+   return xhci_resume(xhci, 0);
+}
+#endif /* CONFIG_PM */
 
 static const struct dev_pm_ops xhci_plat_pm_ops = {
SET_SYSTEM_SLEEP_PM_OPS(xhci_plat_suspend, xhci_plat_resume)
+
+   SET_RUNTIME_PM_OPS(xhci_plat_runtime_suspend,
+  xhci_plat_runtime_resume,
+  NULL)
 };
-#define DEV_PM_OPS (_plat_pm_ops)
-#else
-#define DEV_PM_OPS NULL
-#endif /* CONFIG_PM */
 
 static const struct acpi_device_id usb_xhci_acpi_match[] = {
/* XHCI-compliant USB Controller */
@@ -346,7 +386,7 @@ static int xhci_plat_resume(struct device *dev)
.remove = xhci_plat_remove,
.driver = {
.name = "xhci-hcd",
-   .pm = DEV_PM_OPS,
+   .pm = _plat_pm_ops,
.of_match_table = of_match_ptr(usb_xhci_of_match),
.acpi_match_table = ACPI_PTR(usb_xhci_acpi_match),
},
-- 
1.7.9.5



Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask

2017-03-29 Thread Oza Oza
On Wed, Mar 29, 2017 at 10:13 AM, Oza Oza  wrote:
> On Tue, Mar 28, 2017 at 7:59 PM, Robin Murphy  wrote:
>> On 28/03/17 06:27, Oza Oza wrote:
>>> On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring  wrote:
 On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep  
 wrote:
> it is possible that PCI device supports 64-bit DMA addressing,
> and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64),
> however PCI host bridge may have limitations on the inbound
> transaction addressing. As an example, consider NVME SSD device
> connected to iproc-PCIe controller.
>
> Currently, the IOMMU DMA ops only considers PCI device dma_mask
> when allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
> PA for in-bound transactions only after PCI Host has forwarded
> these transactions on SOC IO bus. This means on such ARM/ARM64
> SOCs the IOVA of in-bound transactions has to honor the addressing
> restrictions of the PCI Host.
>
> current pcie frmework and of framework integration assumes dma-ranges
> in a way where memory-mapped devices define their dma-ranges.
> dma-ranges: (child-bus-address, parent-bus-address, length).
>
> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;

 If you implement a common function, then I expect to see other users
 converted to use it. There's also PCI hosts in arch/powerpc that parse
 dma-ranges.
>>>
>>> the common function should be similar to what
>>> of_pci_get_host_bridge_resources is doing right now.
>>> it parses ranges property right now.
>>>
>>> the new function would look look following.
>>>
>>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources)
>>> where resources would return the dma-ranges.
>>>
>>> but right now if you see the patch, of_dma_configure calls the new
>>> function, which actually returns the largest possible size.
>>> so this new function has to be generic in a way where other PCI hosts
>>> can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can
>>> use it for sure.
>>>
>>> although having powerpc using it;  is a separate exercise, since I do
>>> not have any access to other PCI hosts such as powerpc. but we can
>>> workout with them on thsi forum if required.
>>>
>>> so overall, of_pci_get_dma_ranges has to serve following 2 purposes.
>>>
>>> 1) it has to return largest possible size to of_dma_configure to
>>> generate largest possible dma_mask.
>>>
>>> 2) it also has to return resources (dma-ranges) parsed, to the users.
>>>
>>> so to address above needs
>>>
>>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head
>>> *resources, u64 *size)
>>>
>>> dev -> device node.
>>> resources -> dma-ranges in allocated list.
>>> size -> highest possible size to generate possible dma_mask for
>>> of_dma_configure.
>>>
>>> let em know how this sounds.
>>
>> Note that the point of passing PCI host bridges into of_dma_configure()
>> in the first place was to avoid having some separate PCI-specific path
>> for DMA configuration. I worry that introducing bus-specific dma-ranges
>> parsing largely defeats that, since we end up with the worst of both
>> worlds; effectively-duplicated code, and/or a load of extra complexity
>> to then attempt to reconverge the divergent paths (there really
>> shouldn't be any need to allocate a list of anything). Given that
>> of_translate_dma_address() is already bus-agnostic, it hardly seems
>> justifiable for its caller not to be so as well, especially when it
>> mostly just comes down to getting the right #address-cells value.
>>
>> The patch below is actually enough to make typical cases work, but is
>> vile, so I'm not seriously considering it (hence I've not bothered
>> making IOMMU configuration handle all circumstances). What it has served
>> to do, though, is give me a clear idea of how to properly sort out the
>> not-quite-right device/parent assumptions between of_dma_configure() and
>> of_dma_get_range() rather than bodging around them any further - stay tuned.
>>
>> Robin.
>>
>> ->8-
>> From: Robin Murphy 
>> Subject: [PATCH] of/pci: Use child node for DMA configuration
>>
>> of_dma_configure() expects to be passed an OF node representing the
>> device being configured - for PCI devices we currently pass the node of
>> the appropriate host controller, which sort of works for inherited
>> properties which may appear at any level, like "dma-coherent", but falls
>> apart for properties which actually care about specific device-parent
>> relationships, like "dma-ranges".
>>
>> Solve this by attempting to find a suitable child node if the PCI
>> hierarchy is actually represented in DT, and if not then faking one up
>> as a last resort, to 

  1   2   3   4   5   6   7   8   9   10   >