Re: [PATCH 3/7] thermal/drivers/core: Add init section table for self-encapsulation

2019-04-22 Thread Zhang Rui
Hi, Daniel,

thanks for clarifying.
It is true that we need to make thermal framework ready as early as
possible. And a static table works for me as long as vmlinux.lds.h is
the proper place.

Arnd,
are you okay with this patch? if yes, I suppose I can take it through
my tree, right?

thanks,
rui

On Mon, 2019-04-22 at 14:11 +0200, Daniel Lezcano wrote:
> Hi Zhang,
> 
> 
> On 22/04/2019 10:43, Zhang Rui wrote:
> > 
> > Hi, Daniel,
> > 
> > Thanks for the patches, they look good to me except this one and
> > patch 4/7.
> > 
> > First, I don't think this is a cyclic dependency issue as they are
> > in the same module.
> The governors have to export their [un]register functions so that the
> core can use them.
> 
> The core has to export its [un]register functions so that the
> governors can use them.
> 
> From my point of view it is a cyclic dependency. In any other
> subsystems, the plugins/governor/drivers/whatever don't have to
> export their functions to the core, they use the core's exported
> functions.
> 
> > 
> > Second, I have not read include/asm-generic/vmlinux.lds.h before, it
> > seems that it is used for architecture specific stuff. Fixing a
> > thermal issue this way seems overkill to me.
> It is not architecture specific, it belongs to asm-generic. All init
> calls are defined in it and more. It is a common way to define static
> tables from different files without adding dependencies, and to
> discard them after init.
> 
> All clk, timers, acpi tables, irq chip, cpuidle and cpu methods are
> defined this way.
> 
> When thermal_core.c uses fs_initcall at its end, it relies on the same
> mechanism.
> 
> 
> > 
> > IMO, to make the code clean, we can build the governors as separate
> > modules just like we do for cpu governors.
> > This brings us back to the old commit 80a26a5c22b9 ("Thermal: build
> > thermal governors into thermal_sys module"), which was introduced to
> > fix a problem when CONFIG_THERMAL is set to 'm'. So I think we can
> > switch back to the old way as the problem is gone now.
> > 
> > what do you think?
> IMO, having the governors built as module is not a good thing because
> the SoC needs the governor to be ready as soon as possible at boot
> time.
> I've been told some boards reboot at boot time because the governor
> comes too late with the userspace governor for example.
> 
> If you don't like the vmlinux.lds.h approach (but again it is a common
> way to initialize a table and I wrote it to extend to more thermal
> tables in the future) we can change the thermal core to replace
> fs_initcall() by core_initcall() and use postcore_initcall() in the
> governor.
> 
> 
> 
> > 
> > Patch 1,2,5,6,7 applied first.
> > 
> > thanks,
> > rui
> > 
> > On Tue, 2019-04-02 at 18:12 +0200, Daniel Lezcano wrote:
> > > 
> > > > Currently the governors are declared in their respective files but
> > > > they export their [un]register functions, which in turn call the
> > > > core's governor [un]register functions. That implies a cyclic
> > > > dependency, which is not desirable. There is a way to self-encapsulate
> > > > the governors by letting them declare themselves in an __init section
> > > > table.
> > > > 
> > > > Define the table in the asm-generic linker description like the other
> > > > tables and provide the specific macros to deal with it.
> > > 
> > > Signed-off-by: Daniel Lezcano 
> > > ---
> > >  drivers/thermal/thermal_core.h| 16 
> > >  include/asm-generic/vmlinux.lds.h | 11 +++
> > >  2 files changed, 27 insertions(+)
> > > 
> > > > diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> > > index 0df190ed82a7..28d18083e969 100644
> > > --- a/drivers/thermal/thermal_core.h
> > > +++ b/drivers/thermal/thermal_core.h
> > > @@ -15,6 +15,22 @@
> > >  /* Initial state of a cooling device during binding */
> > >  #define THERMAL_NO_TARGET -1UL
> > >  
> > > +/* Init section thermal table */
> > > +extern struct thermal_governor * __governor_thermal_table[];
> > > +extern struct thermal_governor * __governor_thermal_table_end[];
> > > +
> > > > +#define THERMAL_TABLE_ENTRY(table, name)			\
> > > > +static typeof(name) * __thermal_table_entry_##name		\
> > > > +	__used __section(__##table##_thermal_table)		\
> > > > +	= &name
> > > +
> > > > +#define THERMAL_GOVERNOR_DECLARE(name)	THERMAL_TABLE_ENTRY(governor, name)
> > > +
> > > +#define for_each_governor_table(__governor)  \
> > > + for (__governor = __governor_thermal_table; \
> > > +  __governor < __governor_thermal_table_end; \
> > > +  __governor++)
> > > +
> > >  /*
> > >   * This structure is used to describe the behavior of
> > >   * a certain cooling device on a certain trip point
> > > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > index f8f6f04c4453..9893a3ed242a 100644
> > > --- 

[GIT PULL] extcon next for v5.2

2019-04-22 Thread Chanwoo Choi
Dear Greg,

This is the extcon-next pull request for v5.2. A detailed description of
the pull request follows. Please pull extcon with the following updates.

[Detailed description for this pull request]
1. Add new extcon-intel-mrfld.c extcon provider driver
- On Intel Merrifield the Basin Cove PMIC provides a feature to detect
the USB connection type. This driver utilizes the feature in order
to support the USB dual role detection.

2. Update the extcon provider drivers
- For extcon-intel-cht-wc.c, make charger detection coexist
  with OTG host mode and enable the external charger.
- For the Intel extcon drivers, add a common header file (extcon-intel.h)
  in order to remove the duplicate definitions.
- For extcon-arizona.c, disable microphone detection on driver removal.

3.
- Edit the comment of extcon_unregister_notifier() to fix a build warning
- Add a CONFIG_ACPI dependency to Kconfig to fix a build error for extcon-axp288.c

Best Regards,
Chanwoo Choi


The following changes since commit 86baf800de84eb89615c138d368b14bff5ee7d8a:

  extcon: ptn5150: fix COMPILE_TEST dependencies (2019-04-05 10:08:37 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon.git tags/extcon-next-for-5.2

for you to fetch changes up to 00053de52231117ddc154042549f2256183ffb86:

  extcon: arizona: Disable mic detect if running when driver is removed (2019-04-12 09:38:40 +0900)


Andy Shevchenko (2):
  extcon: intel: Split out some definitions to a common header
  extcon: mrfld: Introduce extcon driver for Basin Cove PMIC

Charles Keepax (1):
  extcon: arizona: Disable mic detect if running when driver is removed

Valdis Kletnieks (1):
  extcon: Fix build warning for extcon_unregister_notifier comment

Yauhen Kharuzhy (2):
  extcon: intel-cht-wc: Make charger detection co-existed with OTG host mode
  extcon: intel-cht-wc: Enable external charger

YueHaibing (1):
  extcon: axp288: Add a depends on ACPI to the Kconfig entry

 drivers/extcon/Kconfig   |   9 +-
 drivers/extcon/Makefile  |   1 +
 drivers/extcon/devres.c  |   2 +-
 drivers/extcon/extcon-arizona.c  |  10 ++
 drivers/extcon/extcon-intel-cht-wc.c |  81 --
 drivers/extcon/extcon-intel-mrfld.c  | 284 +++
 drivers/extcon/extcon-intel.h|  20 +++
 7 files changed, 389 insertions(+), 18 deletions(-)
 create mode 100644 drivers/extcon/extcon-intel-mrfld.c
 create mode 100644 drivers/extcon/extcon-intel.h


Re: [PATCH] riscv: Support non-coherency memory model

2019-04-22 Thread Christoph Hellwig
On Tue, Apr 23, 2019 at 08:13:48AM +0800, Guo Ren wrote:
> > We should probably start a working group for this ASAP unless we can
> > get another working group to help taking care of it.
> Good news, I prefer to use instructions directly instead of SBI_CALL.
> 
> Our instruction is "dcache.c/iva %0" (one cache line) and the parameter is
> virtual address in S-state. When get into M-state by SBI_CALL, we could
> let dcache.c/iva use physical address directly and it needn't kmap page
> for RV32 with highmem (Of course highmem is not ready in RV32 now).

So you only have one instruction variant?  Normally we'd have two or
three to implement the non-coherent DMA (or pmem) semantics:

cache writeback, cache invalidate and potentially cache writeback +
invalidate to optimize that case.  Here is the table how Linux
uses them for DMA:

          |   map            ==  for_device     |   unmap        ==  for_cpu
          |---------------------------------------------------------------------
 TO_DEV   |   writeback          writeback      |   none             none
 FROM_DEV |   invalidate         invalidate     |   invalidate*      invalidate*
 BIDI     |   writeback+inv      writeback+inv  |   invalidate       invalidate

 [*] needed for CPU speculative prefetches
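
To make the table concrete, here is a small sketch that encodes it as a
lookup, assuming map behaves like for_device and unmap like for_cpu as the
column headers state. The enum and function names (dma_dir, sync_target,
cache_op, dma_cache_op) are made up for illustration and are not the kernel's
DMA API.

```c
#include <assert.h>

/* Hypothetical names; the rows and columns mirror the table above. */
enum dma_dir { DMA_TO_DEV, DMA_FROM_DEV, DMA_BIDI };
enum sync_target { SYNC_FOR_DEVICE, SYNC_FOR_CPU };	/* map / unmap */
enum cache_op {
	CACHE_NONE,
	CACHE_WRITEBACK,
	CACHE_INVALIDATE,
	CACHE_WRITEBACK_INV,
};

/* Return the cache maintenance operation for a direction and sync point. */
enum cache_op dma_cache_op(enum dma_dir dir, enum sync_target target)
{
	static const enum cache_op table[3][2] = {
		[DMA_TO_DEV]   = { CACHE_WRITEBACK,     CACHE_NONE },
		/* invalidate on the CPU side covers speculative prefetches */
		[DMA_FROM_DEV] = { CACHE_INVALIDATE,    CACHE_INVALIDATE },
		[DMA_BIDI]     = { CACHE_WRITEBACK_INV, CACHE_INVALIDATE },
	};

	assert((unsigned int)dir <= DMA_BIDI);
	assert((unsigned int)target <= SYNC_FOR_CPU);
	return table[dir][target];
}
```

With only a single cache instruction variant, every entry other than "none"
would have to collapse into the same operation, which is why two or three
variants are normally wanted.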


We already have a discussion on isa-dev on something like these
instructions:

https://groups.google.com/a/groups.riscv.org/forum/#!msg/isa-dev/qXbzqaQbDXU/4ThcEAeCCAAJ

It got a little sidetracked, both due to the usual noise on isa-dev
and due to the proposal including a lot more instructions that might be
a little more contentious, but it might be a good start to bring this
into a working group.

> > Also is this really a coherent flag, or an 'uncached' flag like in
> > many other architectures?
> There are a lot of features about coherency attributes, eg: cacheable,
> bufferable, strong order ..., and coherency is a more abstract name to
> contain all of these. In our hardware, coherence = uncached +
> unbufferable + (strong order).
> 
> But I don't care much about what the name is, uncached is also ok. My key
> point is that page attribute bits are very precious and this patch
> will use the last unused attribute bit in PTE.

I don't care about the name actually, more about having defined semantics.
Totally uncached should include unbuffered.  I don't think we need the
strong ordering for DMA memory either.

> Another point is we could get more attribute bits by modify the riscv
> spec:
>  - Remove Global bit, I think it's duplicate with the User bit in linux.

It is in Linux, but it is conceptually very different.

>  - Change _PAGE_PFN_SHIFT from 10 to 12, because the huge pfn in RV32 is
>very useless and current RV32 linux doesn't even implement highmem.

This would seem sensible to me, but I'm not sure everyone agrees.  Even
then we are very late in the game for changes like that.


Re: [PATCH AUTOSEL 4.14 35/43] tty: fix NULL pointer issue when tty_port ops is not set

2019-04-22 Thread Johan Hovold
Hi Sasha,

On Mon, Apr 22, 2019 at 03:47:19PM -0400, Sasha Levin wrote:
> From: Fabien Dessenne 
> 
> [ Upstream commit f4e68d58cf2b20a581759bbc7228052534652673 ]
> 
> Unlike 'client_ops' which is initialized to 'default_client_ops', the
> port operations 'ops' may be left to NULL.
> Check the 'ops' value before checking the 'ops->x' value.
> 
> Signed-off-by: Fabien Dessenne 
> Signed-off-by: Greg Kroah-Hartman 
> Signed-off-by: Sasha Levin (Microsoft) 

Despite the commit message, this one doesn't really fix anything and has
been reverted in Greg's tree (not sure why the revert hasn't shown up in
linux-next yet).

Johan


Re: [PATCH v6 1/3] arm64: dts: fsl: librem5: Add a device tree for the Librem5 devkit

2019-04-22 Thread Marco Felsch
Hi Angus,

Looks good to me, just a few last nitpicks. Feel free to apply or drop them.

Regards,
  Marco

On 19-04-22 08:30, Angus Ainslie (Purism) wrote:
> This is the development kit board for the Librem 5. The current level of
> support yields a working console and is able to boot userspace from the
> Network or eMMC.
> 
> Additional subsystems that are active :
> 
> - Both USB ports
> - SD card socket
> - WiFi usdhc
> - WWAN modem
> - GNSS
> - GPIO keys
> - LEDs
> - gyro
> - magnetometer
> - touchscreen
> - pwm
> - backlight
> - haptic motor
> 
> Signed-off-by: Angus Ainslie (Purism) 
> ---
>  arch/arm64/boot/dts/freescale/Makefile|   1 +
>  .../dts/freescale/imx8mq-librem5-devkit.dts   | 832 ++
>  2 files changed, 833 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/freescale/imx8mq-librem5-devkit.dts
> 
> diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile
> index 0bd122f60549..c043aca66572 100644
> --- a/arch/arm64/boot/dts/freescale/Makefile
> +++ b/arch/arm64/boot/dts/freescale/Makefile
> @@ -22,6 +22,7 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
>  
>  dtb-$(CONFIG_ARCH_MXC) += imx8mm-evk.dtb
>  dtb-$(CONFIG_ARCH_MXC) += imx8mq-evk.dtb
> +dtb-$(CONFIG_ARCH_MXC) += imx8mq-librem5-devkit.dtb
>  dtb-$(CONFIG_ARCH_MXC) += imx8mq-zii-ultra-rmb3.dtb
>  dtb-$(CONFIG_ARCH_MXC) += imx8mq-zii-ultra-zest.dtb
>  dtb-$(CONFIG_ARCH_MXC) += imx8qxp-mek.dtb
> diff --git a/arch/arm64/boot/dts/freescale/imx8mq-librem5-devkit.dts b/arch/arm64/boot/dts/freescale/imx8mq-librem5-devkit.dts
> new file mode 100644
> index ..7b8770fdc5fb
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/imx8mq-librem5-devkit.dts
> @@ -0,0 +1,832 @@
> +/* SPDX-License-Identifier: GPL-2.0+
> + *
> + * Copyright 2018-2019 Purism SPC
> + */
> +
> +/dts-v1/;
> +
> +#include "dt-bindings/input/input.h"
> +#include "dt-bindings/usb/pd.h"
> +#include "imx8mq.dtsi"
> +
> +/ {
> + model = "Purism Librem 5 devkit";
> + compatible = "purism,librem5-devkit", "fsl,imx8mq";
> +
> + backlight_dsi: backlight-dsi {
> + compatible = "pwm-backlight";
> + /* 200 Hz for the PAM2841 */
> + pwms = < 0 500>;
> + brightness-levels = <0 100>;
> + num-interpolated-steps = <100>;
> + /* Default brightness level (index into the array defined by */
> + /* the "brightness-levels" property) */
> + default-brightness-level = <0>;
> + power-supply = <_22v4_P>;
> + };
> +
> + chosen {
> + stdout-path = 
> + };
> +
> + gpio-keys {
> + compatible = "gpio-keys";
> + pinctrl-names = "default";
> + pinctrl-0 = <_gpio_keys>;
> +
> + btn1 {
> + label = "VOL_UP";
> + gpios = < 21 GPIO_ACTIVE_LOW>;
> + gpio-key,wakeup;
> + linux,code = ;
> + };
> +
> + btn2 {
> + label = "VOL_DOWN";
> + gpios = < 22 GPIO_ACTIVE_LOW>;
> + gpio-key,wakeup;
> + linux,code = ;
> + };
> +
> + hp_det {
> + label = "HP_DET";
> + gpios = < 20 GPIO_ACTIVE_LOW>;
> + gpio-key,wakeup;
> + linux,code = ;
> + };
> + };
> +
> + leds {
> + compatible = "gpio-leds";
> + pinctrl-names = "default";
> + pinctrl-0 = <_gpio_leds>;
> +
> + led1 {
> + label = "LED 1";
> + gpios = < 13 GPIO_ACTIVE_HIGH>;
> + default-state = "off";
> + };
> + };
> +
> + pmic_osc: pmic-osc {
> + compatible = "fixed-clock";
> + #clock-cells = <0>;
> + clock-frequency = <32768>;
> + clock-output-names = "pmic_osc";
> + };
> +
> + pwmleds {
> + compatible = "pwm-leds";
> +
> + haptic {
> + label = "librem5::haptic";
> + pwms = < 0 20>;
> + active-low;
> + max-brightness = <255>;
> + power-supply = <_3v3_p>;
> + };
> + };
> +
> + reg_1v8_p: regulator-1V8-P {
> + compatible = "regulator-fixed";
> + regulator-name = "1v8_p";
> + regulator-min-microvolt = <180>;
> + regulator-max-microvolt = <180>;
> + vin-supply = <_pwr_en>;
> + };
> +
> + reg_2v8_p: regulator-2V8-P {
> + compatible = "regulator-fixed";
> + regulator-name = "2v8_p";
> + regulator-min-microvolt = <280>;
> + regulator-max-microvolt = <280>;
> + vin-supply = <_pwr_en>;
> + };
> +
> + reg_3v3_p: regulator-3V3-P 

Re: [PATCH v2 1/2] arm64: dts: imx8mm: Add SAI nodes

2019-04-22 Thread Marco Felsch
Hi Daniel,

On 19-04-22 19:35, Daniel Baluta wrote:
> i.MX8MM has 5 SAI instances with the following base
> addresses according to RM.
> 
> SAI1 base address: 3001_h
> SAI2 base address: 3002_h
> SAI3 base address: 3003_h
> SAI5 base address: 3005_h
> SAI6 base address: 3006_h
> 
> Signed-off-by: Daniel Baluta 
> ---
>  arch/arm64/boot/dts/freescale/imx8mm.dtsi | 71 +++
>  1 file changed, 71 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/imx8mm.dtsi b/arch/arm64/boot/dts/freescale/imx8mm.dtsi
> index de3498c2dd44..4d080dc47216 100644
> --- a/arch/arm64/boot/dts/freescale/imx8mm.dtsi
> +++ b/arch/arm64/boot/dts/freescale/imx8mm.dtsi
> @@ -171,6 +171,77 @@
>   #size-cells = <1>;
>   ranges;
>  
> + sai1: sai@3001 {
> + compatible = "fsl,imx8mm-sai",
> + "fsl,imx8mq-sai";

Just a nitpick but I would not break this line here. This applies to the
other sai nodes too.

Regards,
  Marco

> + reg = <0x3001 0x1>;
> + interrupts = ;
> + clocks = < IMX8MM_CLK_SAI1_IPG>,
> +  < IMX8MM_CLK_SAI1_ROOT>,
> +  < IMX8MM_CLK_DUMMY>, < 
> IMX8MM_CLK_DUMMY>;
> + clock-names = "bus", "mclk1", "mclk2", "mclk3";
> + dmas = < 0 2 0>, < 1 2 0>;
> + dma-names = "rx", "tx";
> + status = "disabled";
> + };
> +
> + sai2: sai@3002 {
> + compatible = "fsl,imx8mm-sai",
> + "fsl,imx8mq-sai";
> + reg = <0x3002 0x1>;
> + interrupts = ;
> + clocks = < IMX8MM_CLK_SAI2_IPG>,
> + < IMX8MM_CLK_SAI2_ROOT>,
> + < IMX8MM_CLK_DUMMY>, < 
> IMX8MM_CLK_DUMMY>;
> + clock-names = "bus", "mclk1", "mclk2", "mclk3";
> + dmas = < 2 2 0>, < 3 2 0>;
> + dma-names = "rx", "tx";
> + status = "disabled";
> + };
> +
> + sai3: sai@3003 {
> + #sound-dai-cells = <0>;
> + compatible = "fsl,imx8mm-sai",
> + "fsl,imx8mq-sai";
> + reg = <0x3003 0x1>;
> + interrupts = ;
> + clocks = < IMX8MM_CLK_SAI3_IPG>,
> +  < IMX8MM_CLK_SAI3_ROOT>,
> +  < IMX8MM_CLK_DUMMY>, < 
> IMX8MM_CLK_DUMMY>;
> + clock-names = "bus", "mclk1", "mclk2", "mclk3";
> + dmas = < 4 2 0>, < 5 2 0>;
> + dma-names = "rx", "tx";
> + status = "disabled";
> + };
> +
> + sai5: sai@3005 {
> + compatible = "fsl,imx8mm-sai",
> + "fsl,imx8mq-sai";
> + reg = <0x3005 0x1>;
> + interrupts = ;
> + clocks = < IMX8MM_CLK_SAI5_IPG>,
> +  < IMX8MM_CLK_SAI5_ROOT>,
> +  < IMX8MM_CLK_DUMMY>, < 
> IMX8MM_CLK_DUMMY>;
> + clock-names = "bus", "mclk1", "mclk2", "mclk3";
> + dmas = < 8 2 0>, < 9 2 0>;
> + dma-names = "rx", "tx";
> + status = "disabled";
> + };
> +
> + sai6: sai@3006 {
> + compatible = "fsl,imx8mm-sai",
> + "fsl,imx8mq-sai";
> + reg = <0x3006 0x1>;
> + interrupts = ;
> + clocks = < IMX8MM_CLK_SAI6_IPG>,
> +  < IMX8MM_CLK_SAI6_ROOT>,
> +  < IMX8MM_CLK_DUMMY>, < 
> IMX8MM_CLK_DUMMY>;
> + clock-names = "bus", "mclk1", "mclk2", "mclk3";
> + dmas = < 10 2 0>, < 11 2 0>;
> + dma-names = "rx", "tx";
> + status = "disabled";
> + };
> +
>   gpio1: gpio@3020 {
>   compatible = "fsl,imx8mm-gpio", 
> "fsl,imx35-gpio";
>   

Re: [PATCH v2 2/2] arm64: dts: imx8mm-evk: Enable audio codec wm8524

2019-04-22 Thread Marco Felsch
Hi Daniel,

On 19-04-22 19:36, Daniel Baluta wrote:
> i.MX8MM has one wm8524 audio codec connected with
> SAI3 digital audio interface.
> 
> This patch uses simple-card machine driver in order
> to enable wm8524 codec.
> 
> We need to set:
>   * SAI3 pinctrl configuration
>   * clock hierarchy
>   * codec node
>   * simple-card configuration
> 
> Signed-off-by: Daniel Baluta 
> ---
>  arch/arm64/boot/dts/freescale/imx8mm-evk.dts | 48 
>  1 file changed, 48 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/imx8mm-evk.dts b/arch/arm64/boot/dts/freescale/imx8mm-evk.dts
> index 2d5d89475b76..207b13266a96 100644
> --- a/arch/arm64/boot/dts/freescale/imx8mm-evk.dts
> +++ b/arch/arm64/boot/dts/freescale/imx8mm-evk.dts
> @@ -37,6 +37,35 @@
>   gpio = < 19 GPIO_ACTIVE_HIGH>;
>   enable-active-high;
>   };
> +
> + wm8524: audio-codec {
> + #sound-dai-cells = <0>;
> + compatible = "wlf,wm8524";
> + wlf,mute-gpios = < 21 GPIO_ACTIVE_LOW>;

I would mux the gpio where I use them.

> + };
> +
> + sound-wm8524 {
> + compatible = "simple-audio-card";
> + simple-audio-card,name = "wm8524-audio";
> + simple-audio-card,format = "i2s";
> + simple-audio-card,frame-master = <>;
> + simple-audio-card,bitclock-master = <>;
> + simple-audio-card,widgets =
> + "Line", "Left Line Out Jack",
> + "Line", "Right Line Out Jack";
> + simple-audio-card,routing =
> + "Left Line Out Jack", "LINEVOUTL",
> + "Right Line Out Jack", "LINEVOUTR";
> +
> + cpudai: simple-audio-card,cpu {
> + sound-dai = <>;
> + };
> +
> + link_codec: simple-audio-card,codec {

Can you drop that phandle?

Regards,
  Marco

> + sound-dai = <>;
> + clocks = < IMX8MM_CLK_SAI3_ROOT>;
> + };
> + };
>  };
>  
>   {
> @@ -61,6 +90,15 @@
>   };
>  };
>  
> + {
> + pinctrl-names = "default";
> + pinctrl-0 = <_sai3>;
> + assigned-clocks = < IMX8MM_CLK_SAI3>;
> + assigned-clock-parents = < IMX8MM_AUDIO_PLL1_OUT>;
> + assigned-clock-rates = <24576000>;
> + status = "okay";
> +};
> +
>   { /* console */
>   pinctrl-names = "default";
>   pinctrl-0 = <_uart2>;
> @@ -130,6 +168,16 @@
>   >;
>   };
>  
> + pinctrl_sai3: sai3grp {
> + fsl,pins = <
> + MX8MM_IOMUXC_SAI3_TXFS_SAI3_TX_SYNC 0xd6
> + MX8MM_IOMUXC_SAI3_TXC_SAI3_TX_BCLK  0xd6
> + MX8MM_IOMUXC_SAI3_MCLK_SAI3_MCLK0xd6
> + MX8MM_IOMUXC_SAI3_TXD_SAI3_TX_DATA0 0xd6
> + MX8MM_IOMUXC_I2C4_SDA_GPIO5_IO210xd6
> + >;
> + };
> +
>   pinctrl_uart2: uart2grp {
>   fsl,pins = <
>   MX8MM_IOMUXC_UART2_RXD_UART2_DCE_RX 0x140
> -- 
> 2.17.1
> 

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


linux-next: build warning after merge of the staging tree

2019-04-22 Thread Stephen Rothwell
Hi Greg,

After merging the staging tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

drivers/staging/kpc2000/kpc_spi/spi_driver.c:97:5: note: offset of packed bit-field 'wl' has changed in GCC 4.4
 } bitfield;
 ^
drivers/staging/kpc2000/kpc_spi/spi_driver.c:97:5: note: offset of packed bit-field 'cs' has changed in GCC 4.4
drivers/staging/kpc2000/kpc_spi/spi_driver.c:97:5: note: offset of packed bit-field 'wcnt' has changed in GCC 4.4

Introduced by commit

  7dc7967fc39a ("staging: kpc2000: add initial set of Daktronics drivers")

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v3 1/3] dmaengine: at_xdmac: remove BUG_ON macro in tasklet

2019-04-22 Thread Vinod Koul
On 03-04-19, 12:23, Nicolas Ferre wrote:
> Even if this case shouldn't happen when controller is properly programmed,
> it's still better to avoid dumping a kernel Oops for this.
> As the sequence may happen only for debugging purposes, log the error and
> just finish the tasklet call.

Applied all, thanks
-- 
~Vinod


[PATCH 1/2] x86/time: check usability of IRQ0 PIT timer

2019-04-22 Thread Daniel Drake
Modern Intel SoCs now include a special ITSSPRC register that can be
used to "gate" the PIT such that IRQ0 interrupts do not fire.

With Intel Apollo Lake we are starting to see consumer products that
have a BIOS option to apply this (defaulting to gated). Some such
products also lack the HPET ACPI table, so there is no HPET either.

At this point, Linux needs to stop assuming that the IRQ0 timer is
available.

Move APIC code to check IRQ0 to time.c, then check and record the IRQ0
PIT timer usability after it is set up. If it does not produce any
interrupts, unregister the clock event source.
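
As a rough illustration of the check being moved, here is a user-space
sketch, assuming a free-running unsigned tick counter sampled before and
after a delay. tick_after() mirrors the kernel's time_after() comparison;
MIN_TICKS and timer_produced_ticks() are hypothetical names and are not the
functions in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wrap-safe "a is after b" for a free-running counter, modelled on the
 * kernel's time_after(). The unsigned difference is reinterpreted as
 * signed, which behaves as two's complement on the usual compilers.
 */
static inline bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Require more than a few ticks so that one cached tick does not count. */
#define MIN_TICKS 4UL

/* True if the counter advanced by more than MIN_TICKS during the delay. */
bool timer_produced_ticks(unsigned long before, unsigned long now)
{
	return tick_after(now, before + MIN_TICKS);
}
```

A caller would sample the counter, wait with interrupts enabled, sample it
again, and treat a false result as "this timer does not deliver interrupts".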

Signed-off-by: Daniel Drake 
Link: https://lkml.kernel.org/r/CAD8Lp45fedoPLnK=umuhhtkjy5u2h04sykrx3u+m04u6fpv...@mail.gmail.com
---
 arch/x86/include/asm/time.h|   2 +
 arch/x86/kernel/apic/io_apic.c | 101 ---
 arch/x86/kernel/i8253.c|   6 ++
 arch/x86/kernel/time.c | 106 -
 drivers/clocksource/i8253.c|   6 ++
 include/linux/clockchips.h |   3 +
 include/linux/i8253.h  |   2 +
 kernel/time/tick-internal.h|   2 -
 8 files changed, 134 insertions(+), 94 deletions(-)

diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h
index cef818b16045..e6e00d18b39f 100644
--- a/arch/x86/include/asm/time.h
+++ b/arch/x86/include/asm/time.h
@@ -10,4 +10,6 @@ extern void time_init(void);
 
 extern struct clock_event_device *global_clock_event;
 
+extern bool irq0_timer_works(void);
+
 #endif /* _ASM_X86_TIME_H */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 53aa234a6803..ae46da48c07b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1573,92 +1574,6 @@ void __init setup_ioapic_ids_from_mpc(void)
 }
 #endif
 
-int no_timer_check __initdata;
-
-static int __init notimercheck(char *s)
-{
-   no_timer_check = 1;
-   return 1;
-}
-__setup("no_timer_check", notimercheck);
-
-static void __init delay_with_tsc(void)
-{
-   unsigned long long start, now;
-   unsigned long end = jiffies + 4;
-
-   start = rdtsc();
-
-   /*
-* We don't know the TSC frequency yet, but waiting for
-* 40000000000/HZ TSC cycles is safe:
-* 4 GHz == 10 jiffies
-* 1 GHz == 40 jiffies
-*/
-   do {
-   rep_nop();
-   now = rdtsc();
-   } while ((now - start) < 40000000000ULL / HZ &&
-   time_before_eq(jiffies, end));
-}
-
-static void __init delay_without_tsc(void)
-{
-   unsigned long end = jiffies + 4;
-   int band = 1;
-
-   /*
-* We don't know any frequency yet, but waiting for
-* 40940000000/HZ cycles is safe:
-* 4 GHz == 10 jiffies
-* 1 GHz == 40 jiffies
-* 1 << 1 + 1 << 2 +...+ 1 << 11 = 4094
-*/
-   do {
-   __delay(((1U << band++) * 10000000UL) / HZ);
-   } while (band < 12 && time_before_eq(jiffies, end));
-}
-
-/*
- * There is a nasty bug in some older SMP boards, their mptable lies
- * about the timer IRQ. We do the following to work around the situation:
- *
- * - timer IRQ defaults to IO-APIC IRQ
- * - if this function detects that timer IRQs are defunct, then we fall
- *   back to ISA timer IRQs
- */
-static int __init timer_irq_works(void)
-{
-   unsigned long t1 = jiffies;
-   unsigned long flags;
-
-   if (no_timer_check)
-   return 1;
-
-   local_save_flags(flags);
-   local_irq_enable();
-
-   if (boot_cpu_has(X86_FEATURE_TSC))
-   delay_with_tsc();
-   else
-   delay_without_tsc();
-
-   local_irq_restore(flags);
-
-   /*
-* Expect a few ticks at least, to be sure some possible
-* glue logic does not lock up after one or two first
-* ticks in a non-ExtINT mode.  Also the local APIC
-* might have cached one ExtINT interrupt.  Finally, at
-* least one tick may be lost due to delays.
-*/
-
-   /* jiffies wrap? */
-   if (time_after(jiffies, t1 + 4))
-   return 1;
-   return 0;
-}
-
 /*
  * In the SMP+IOAPIC case it might happen that there are an unspecified
  * number of pending IRQ events unhandled. These cases are very rare,
@@ -2066,6 +1981,12 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
 }
 
 /*
+ * In the SMP+IOAPIC case it might happen that there are an unspecified
+ * number of pending IRQ events unhandled. These cases are very rare,
+ * so we 'resend' these IRQs via IPIs, to the same CPU. It's much
+ * better to do it this way as thus we do not have to be aware of
+ * 'pending' interrupts in the IRQ path, except at this point.
+ *
  * This code may look a bit paranoid, but it's supposed to cooperate with
  * a wide range of boards and BIOS bugs.  Fortunately only the timer IRQ
  * is so screwy.  Thanks to Brian 

[PATCH 2/2] x86/ioapic: avoid timer manipulation when IRQ0 timer is unavailable

2019-04-22 Thread Daniel Drake
New products based on Intel Apollo Lake are appearing where the HPET is
not present in ACPI, and the legacy 8254 PIT is "gated" by default in
the BIOS setup menu.

This leads to an early boot "IO-APIC + timer doesn't work!" kernel panic
on a black screen (before the framebuffer is initialized).

Avoid the IO-APIC IRQ0 timer manipulation & verification on platforms
where the legacy IRQ0 timer has been determined as unavailable.

This fixes boot on Connex L1430 and Scope SN116PYA with default BIOS
settings.

Signed-off-by: Daniel Drake 
Link: https://lkml.kernel.org/r/CAD8Lp45fedoPLnK=umuhhtkjy5u2h04sykrx3u+m04u6fpv...@mail.gmail.com
---
 arch/x86/kernel/apic/io_apic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ae46da48c07b..2d29c62abbcb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2243,7 +2243,7 @@ void __init setup_IO_APIC(void)
sync_Arb_IDs();
setup_IO_APIC_irqs();
init_IO_APIC_traps();
-   if (nr_legacy_irqs())
+   if (global_clock_event && nr_legacy_irqs())
check_timer();
 
ioapic_initialized = 1;
-- 
2.19.1



Re: [PATCH] pinctrl: intel: Clear interrupt status in unmask callback

2019-04-22 Thread Kai-Heng Feng

Hi,

at 02:22,   wrote:


Hi.
I've just applied this patch, and the touchpad works smoothly, but the
suspend issue is still present.
After suspend, i2c_hid module bursts i2c_hid i2c-ELAN1200:00:  
i2c_hid_get_input: incomplete report (16/65535) messages (more than 50  
reports/sec).
In dmesg I can see a frequency of reporting every 0.0007 - 0.001 dmesg  
time units.


What’s the default suspend mode on the platform?
This is a common issue for systems that default to Suspend-to-idle, but S3
is in use.
The root cause is that the power of the touchpad doesn’t get cut off during  
S3 by platform firmware.


Do you also see this issue if S2I is in use?

Kai-Heng



Though I can successfully restart the module, and after restarting it works
as well as it did before.


So suspend issue is still present.

Regards,
Vladislav.


Apr 22, 2019, 7:45 AM by kai.heng.f...@canonical.com:
Commit a939bb57cd47 ("pinctrl: intel: implement gpio_irq_enable") was
added because clearing interrupt status bit is required to avoid
unexpected behavior.

Turns out the unmask callback also needs the fix, which can solve weird
IRQ triggering issues on I2C touchpad ELAN1200.
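
The idea can be sketched with simulated registers, assuming a
write-1-to-clear interrupt status register next to an interrupt enable
register. struct fake_gpio_regs and the fake_* helpers below are hypothetical
models for illustration, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Simulated register pair for one pad group. */
struct fake_gpio_regs {
	uint32_t status;	/* latched events, write 1 to clear */
	uint32_t enable;	/* per-pin interrupt enable */
};

/* Model the write-1-to-clear behaviour of the status register. */
static void fake_write_status(struct fake_gpio_regs *r, uint32_t val)
{
	r->status &= ~val;
}

/* Mask or unmask one pin; unmasking first drops any stale latched event. */
void fake_irq_mask_unmask(struct fake_gpio_regs *r, unsigned int pin, bool mask)
{
	assert(pin < 32);

	if (!mask)
		fake_write_status(r, BIT(pin));

	if (mask)
		r->enable &= ~BIT(pin);
	else
		r->enable |= BIT(pin);
}

/* An interrupt fires when an enabled pin has a latched event. */
bool fake_irq_pending(const struct fake_gpio_regs *r)
{
	return (r->status & r->enable) != 0;
}
```

Without the clear in the unmask path, an event latched while the pin was
masked would fire as soon as the enable bit is set again.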

Signed-off-by: Kai-Heng Feng 
---
drivers/pinctrl/intel/pinctrl-intel.c | 35 ---
1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/drivers/pinctrl/intel/pinctrl-intel.c b/drivers/pinctrl/intel/pinctrl-intel.c
index 3b1818184207..53878604537e 100644
--- a/drivers/pinctrl/intel/pinctrl-intel.c
+++ b/drivers/pinctrl/intel/pinctrl-intel.c
@@ -913,35 +913,6 @@ static void intel_gpio_irq_ack(struct irq_data *d)
}
}

-static void intel_gpio_irq_enable(struct irq_data *d)
-{
-   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
-   struct intel_pinctrl *pctrl = gpiochip_get_data(gc);
-   const struct intel_community *community;
-   const struct intel_padgroup *padgrp;
-   int pin;
-
-   pin = intel_gpio_to_pin(pctrl, irqd_to_hwirq(d), , );
-   if (pin >= 0) {
-   unsigned int gpp, gpp_offset, is_offset;
-   unsigned long flags;
-   u32 value;
-
-   gpp = padgrp->reg_num;
-   gpp_offset = padgroup_offset(padgrp, pin);
-   is_offset = community->is_offset + gpp * 4;
-
-   raw_spin_lock_irqsave(>lock, flags);
-   /* Clear interrupt status first to avoid unexpected interrupt */
-   writel(BIT(gpp_offset), community->regs + is_offset);
-
-   value = readl(community->regs + community->ie_offset + gpp * 4);
-   value |= BIT(gpp_offset);
-   writel(value, community->regs + community->ie_offset + gpp * 4);
-   raw_spin_unlock_irqrestore(>lock, flags);
-   }
-}
-
static void intel_gpio_irq_mask_unmask(struct irq_data *d, bool mask)
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -963,6 +934,11 @@ static void intel_gpio_irq_mask_unmask(struct irq_data *d, bool mask)
reg = community->regs + community->ie_offset + gpp * 4;

raw_spin_lock_irqsave(>lock, flags);
+
+   /* Clear interrupt status first to avoid unexpected interrupt */
+   if (!mask)
+	writel(BIT(gpp_offset), community->regs + community->is_offset + gpp * 4);
+
value = readl(reg);
if (mask)
value &= ~BIT(gpp_offset);
@@ -1106,7 +1082,6 @@ static irqreturn_t intel_gpio_irq(int irq, void *data)

static struct irq_chip intel_gpio_irqchip = {
.name = "intel-gpio",
-   .irq_enable = intel_gpio_irq_enable,
.irq_ack = intel_gpio_irq_ack,
.irq_mask = intel_gpio_irq_mask,
.irq_unmask = intel_gpio_irq_unmask,
--
2.17.1





[PATCH V3 2/2] PCI: dwc: Export APIs to support .remove() implementation

2019-04-22 Thread Vidya Sagar
Export all configuration space access APIs and also other APIs to
support host controller drivers of DesignWare core based implementations
when adding support for the .remove() hook, so that their respective
drivers can be built as modules.

Signed-off-by: Vidya Sagar 
Acked-by: Gustavo Pimentel 
---
v3:
* Rebased on top of linux-next top of the tree branch

v2:
* s/Designware/DesignWare

 .../pci/controller/dwc/pcie-designware-host.c |  4 ++
 drivers/pci/controller/dwc/pcie-designware.c  | 38 +++
 drivers/pci/controller/dwc/pcie-designware.h  | 35 +++--
 3 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
b/drivers/pci/controller/dwc/pcie-designware-host.c
index f87c9542eb09..36fd3f5b48f6 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -311,6 +311,7 @@ void dw_pcie_msi_init(struct pcie_port *pp)
dw_pcie_wr_own_conf(pp, PCIE_MSI_ADDR_HI, 4,
upper_32_bits(msi_target));
 }
+EXPORT_SYMBOL_GPL(dw_pcie_msi_init);
 
 int dw_pcie_host_init(struct pcie_port *pp)
 {
@@ -495,6 +496,7 @@ int dw_pcie_host_init(struct pcie_port *pp)
dw_pcie_free_msi(pp);
return ret;
 }
+EXPORT_SYMBOL_GPL(dw_pcie_host_init);
 
 void dw_pcie_host_deinit(struct pcie_port *pp)
 {
@@ -502,6 +504,7 @@ void dw_pcie_host_deinit(struct pcie_port *pp)
pci_remove_root_bus(pp->root_bus);
dw_pcie_free_msi(pp);
 }
+EXPORT_SYMBOL_GPL(dw_pcie_host_deinit);
 
 static int dw_pcie_access_other_conf(struct pcie_port *pp, struct pci_bus *bus,
 u32 devfn, int where, int size, u32 *val,
@@ -694,3 +697,4 @@ void dw_pcie_setup_rc(struct pcie_port *pp)
val |= PORT_LOGIC_SPEED_CHANGE;
dw_pcie_wr_own_conf(pp, PCIE_LINK_WIDTH_SPEED_CONTROL, 4, val);
 }
+EXPORT_SYMBOL_GPL(dw_pcie_setup_rc);
diff --git a/drivers/pci/controller/dwc/pcie-designware.c 
b/drivers/pci/controller/dwc/pcie-designware.c
index d7cc1a0c1de6..8e0081ccf83b 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -40,6 +40,7 @@ int dw_pcie_read(void __iomem *addr, int size, u32 *val)
 
return PCIBIOS_SUCCESSFUL;
 }
+EXPORT_SYMBOL_GPL(dw_pcie_read);
 
 int dw_pcie_write(void __iomem *addr, int size, u32 val)
 {
@@ -57,6 +58,7 @@ int dw_pcie_write(void __iomem *addr, int size, u32 val)
 
return PCIBIOS_SUCCESSFUL;
 }
+EXPORT_SYMBOL_GPL(dw_pcie_write);
 
 u32 __dw_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
   size_t size)
@@ -120,6 +122,42 @@ void __dw_pcie_write_dbi2(struct dw_pcie *pci, void 
__iomem *base, u32 reg,
dev_err(pci->dev, "write DBI address failed\n");
 }
 
+void dw_pcie_writel_dbi(struct dw_pcie *pci, u32 reg, u32 val)
+{
+   __dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x4, val);
+}
+EXPORT_SYMBOL_GPL(dw_pcie_writel_dbi);
+
+u32 dw_pcie_readl_dbi(struct dw_pcie *pci, u32 reg)
+{
+   return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x4);
+}
+EXPORT_SYMBOL_GPL(dw_pcie_readl_dbi);
+
+void dw_pcie_writew_dbi(struct dw_pcie *pci, u32 reg, u16 val)
+{
+   __dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x2, val);
+}
+EXPORT_SYMBOL_GPL(dw_pcie_writew_dbi);
+
+u16 dw_pcie_readw_dbi(struct dw_pcie *pci, u32 reg)
+{
+   return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x2);
+}
+EXPORT_SYMBOL_GPL(dw_pcie_readw_dbi);
+
+void dw_pcie_writeb_dbi(struct dw_pcie *pci, u32 reg, u8 val)
+{
+   __dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x1, val);
+}
+EXPORT_SYMBOL_GPL(dw_pcie_writeb_dbi);
+
+u8 dw_pcie_readb_dbi(struct dw_pcie *pci, u32 reg)
+{
+   return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x1);
+}
+EXPORT_SYMBOL_GPL(dw_pcie_readb_dbi);
+
 static u32 dw_pcie_readl_ob_unroll(struct dw_pcie *pci, u32 index, u32 reg)
 {
u32 offset = PCIE_GET_ATU_OUTB_UNR_REG_OFFSET(index);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index 4f48ec78c7b9..9ee98ced1ef6 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -270,35 +270,12 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, int index,
 enum dw_pcie_region_type type);
 void dw_pcie_setup(struct dw_pcie *pci);
 
-static inline void dw_pcie_writel_dbi(struct dw_pcie *pci, u32 reg, u32 val)
-{
-   __dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x4, val);
-}
-
-static inline u32 dw_pcie_readl_dbi(struct dw_pcie *pci, u32 reg)
-{
-   return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x4);
-}
-
-static inline void dw_pcie_writew_dbi(struct dw_pcie *pci, u32 reg, u16 val)
-{
-   __dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x2, val);
-}
-
-static inline u16 dw_pcie_readw_dbi(struct dw_pcie *pci, u32 reg)
-{
-   return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x2);
-}
-

[PATCH V3 1/2] PCI: dwc: Add API support to de-initialize host

2019-04-22 Thread Vidya Sagar
Add an API to group all the tasks to be done to de-initialize host which
can then be called by any DesignWare core based driver implementations
while adding .remove() support in their respective drivers.

Signed-off-by: Vidya Sagar 
Acked-by: Gustavo Pimentel 
---
v3:
* Rebased on top of linux-next top of the tree branch

v2:
* s/Designware/DesignWare

 drivers/pci/controller/dwc/pcie-designware-host.c | 7 +++
 drivers/pci/controller/dwc/pcie-designware.h  | 5 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
b/drivers/pci/controller/dwc/pcie-designware-host.c
index 77db32529319..f87c9542eb09 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -496,6 +496,13 @@ int dw_pcie_host_init(struct pcie_port *pp)
return ret;
 }
 
+void dw_pcie_host_deinit(struct pcie_port *pp)
+{
+   pci_stop_root_bus(pp->root_bus);
+   pci_remove_root_bus(pp->root_bus);
+   dw_pcie_free_msi(pp);
+}
+
 static int dw_pcie_access_other_conf(struct pcie_port *pp, struct pci_bus *bus,
 u32 devfn, int where, int size, u32 *val,
 bool write)
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index deab426affd3..4f48ec78c7b9 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -348,6 +348,7 @@ void dw_pcie_msi_init(struct pcie_port *pp);
 void dw_pcie_free_msi(struct pcie_port *pp);
 void dw_pcie_setup_rc(struct pcie_port *pp);
 int dw_pcie_host_init(struct pcie_port *pp);
+void dw_pcie_host_deinit(struct pcie_port *pp);
 int dw_pcie_allocate_domains(struct pcie_port *pp);
 #else
 static inline irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
@@ -372,6 +373,10 @@ static inline int dw_pcie_host_init(struct pcie_port *pp)
return 0;
 }
 
+static inline void dw_pcie_host_deinit(struct pcie_port *pp)
+{
+}
+
 static inline int dw_pcie_allocate_domains(struct pcie_port *pp)
 {
return 0;
-- 
2.17.1



Re: [PATCH 3/3] ARM: omap2: move platform-specific asm-offset.h to arch/arm/mach-omap2

2019-04-22 Thread Masahiro Yamada
On Tue, Apr 9, 2019 at 11:20 PM Tony Lindgren  wrote:
>
> * Masahiro Yamada  [190409 07:06]:
> > On Tue, Apr 9, 2019 at 2:17 PM Keerthy  wrote:
> > >
> > >
> > >
> > > On 09/04/19 10:37 AM, Masahiro Yamada wrote:
> > > > On Tue, Apr 9, 2019 at 2:00 PM Keerthy  wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 08/04/19 9:48 PM, Tony Lindgren wrote:
> > > >>> Hi,
> > > >>>
> > > >>> * Masahiro Yamada  [190408 07:56]:
> > >  <generated/ti-pm-asm-offsets.h> is only generated and included
> > >  by arch/arm/mach-omap2/, so it does not need to reside in the
> > >  globally visible include/generated/.
> > > 
> > >  I moved and renamed it to arch/arm/mach-omap2/pm-asm-offsets.h
> > >  since the prefix 'omap2-' is just redundant in mach-omap2/.
> > > 
> > >  Signed-off-by: Masahiro Yamada 
> > >  ---
> > > 
> > >  Can this be applied to ARM-SOC tree in a series?
> > >  (with Ack from the platform sub-maintainer.)
> > > 
> > >  ti-pm-asm-offsets.h does not need to reside in include/generated/,
> > >  but you may ask "Why must it get out of include/generated/?"
> > > 
> > >  My main motivation is to avoid a race condition in the currently
> > >  proposed patch:
> > > 
> > >  https://lore.kernel.org/patchwork/patch/1052763/
> > > 
> > >  This patch tries to embed some build artifacts into the kernel.
> > > 
> > >  If arch/arm/mach-omap2/ and kernel/ are built at the same time,
> > >  it may embed a truncated file.
> > > >>>
> > > >>> Looks like a nice improvement to me, adding Keerthy and Dave to Cc.
> > > >>>
> > > >>> Keerthy and Dave, can you please test this series with am3 and am4
> > > >>> PM code?
> > > >>
> > > >> Tested for Deep Sleep0 on AM33xx Beaglebone-black.
> > > >> Tested for Deep Sleep0 on AM437x-gp-evm.
> > > >>
> > > >> Applied this on top of Tony's for-next with the gpio patch
> > > >> required for RTC+DDR mode on am437x-gp-evm.
> > > >
> > > > Was it applied to TI tree?
> > > >
> > > > If so ...
> > > >
> > > > Arnd, Olof,
> > > > Please just ignore this patch
> > > > since it looks like it was already applied to the TI tree.
> > >
> > > Masahiro Yamada,
> > >
> > > No, I manually applied this on top.
> > >
> > > Regards,
> > > Keerthy
> >
> > Keerthy,
> > Sorry, I misunderstood.
> >
> > You just applied it to your local tree for testing.
> >
> > Then, I still think it is better to
> > apply this series in a correct order.
> >
> > The reason I sent this in a series was
> > to make sure asm-offset headers are correctly
> > cleaned up.
>
> Yes looks good to me:
>
> Acked-by: Tony Lindgren 

Sorry, this turned out to break the out-of-tree build.

Please do not apply this for now.

I will come back to this later when ready.


-- 
Best Regards
Masahiro Yamada


Re: [PATCH 2/3] ARM: at91: move platform-specific asm-offset.h to arch/arm/mach-at91

2019-04-22 Thread Masahiro Yamada
On Sat, Apr 20, 2019 at 8:10 AM Masahiro Yamada
 wrote:
>
> On Sat, Apr 20, 2019 at 4:03 AM Ludovic Desroches
>  wrote:
> >
> > On Mon, Apr 15, 2019 at 05:14:50PM +0200, Alexandre Belloni wrote:
> > > External E-Mail
> > >
> > >
> > > On 08/04/2019 16:54:26+0900, Masahiro Yamada wrote:
> > > > <generated/at91_pm_data-offsets.h> is only generated and included
> > > > by arch/arm/mach-at91/, so it does not need to reside in the
> > > > globally visible include/generated/.
> > > >
> > > > I moved and renamed it to arch/arm/mach-at91/pm_data-offsets.h
> > > > since the prefix 'at91_' is just redundant in mach-at91/.
> > > >
> > > > Signed-off-by: Masahiro Yamada 
> > > Acked-by: Alexandre Belloni 
> > >
> >
> > Applied in at91-soc. Let me know if it's an issue, I plan to do the PR
> > soon.

Sorry. A more fatal issue is that this breaks the O= build.

Could you drop it?

Thanks.



>
>
> There is one minor issue.
>
> If you apply 2/3 (this one) alone,
> arch/arm/mach-at91/pm_data-offsets.h is not cleaned.
>
>
> 1/3 fixes the "make clean" issue:
> https://lkml.org/lkml/2019/4/8/153
>
>
> That is why I sent this as a series
> in order to avoid the regression of cleaning.
>
>
>
> Thanks.
>
> Masahiro Yamada
>
>
> > Regards
> >
> > Ludovic
> >
> > > > ---
> > > >
> > > > Can this be applied to ARM-SOC tree in a series?
> > > > (with Ack from the platform sub-maintainer.)
> > > >
> > > > at91_pm_data-offsets.h header does not need to reside in
> > > > include/generated/, but you may ask
> > > > "Why must it get out of include/generated/?"
> > > >
> > > > My main motivation is to avoid a race condition in the currently
> > > > proposed patch:
> > > >
> > > > https://lore.kernel.org/patchwork/patch/1052763/
> > > >
> > > > This patch tries to embed some build artifacts into the kernel.
> > > >
> > > > If arch/arm/mach-at91/ and kernel/ are built at the same time,
> > > > it may embed a truncated file.
> > > >
> > > >
> > > >  arch/arm/mach-at91/.gitignore   | 1 +
> > > >  arch/arm/mach-at91/Makefile | 5 +++--
> > > >  arch/arm/mach-at91/pm_suspend.S | 2 +-
> > > >  3 files changed, 5 insertions(+), 3 deletions(-)
> > > >  create mode 100644 arch/arm/mach-at91/.gitignore
> > > >
> > > > diff --git a/arch/arm/mach-at91/.gitignore 
> > > > b/arch/arm/mach-at91/.gitignore
> > > > new file mode 100644
> > > > index ..2ecd6f51c8a9
> > > > --- /dev/null
> > > > +++ b/arch/arm/mach-at91/.gitignore
> > > > @@ -0,0 +1 @@
> > > > +pm_data-offsets.h
> > > > diff --git a/arch/arm/mach-at91/Makefile b/arch/arm/mach-at91/Makefile
> > > > index 31b61f0e1c07..de64301dcff2 100644
> > > > --- a/arch/arm/mach-at91/Makefile
> > > > +++ b/arch/arm/mach-at91/Makefile
> > > > @@ -19,9 +19,10 @@ ifeq ($(CONFIG_PM_DEBUG),y)
> > > >  CFLAGS_pm.o += -DDEBUG
> > > >  endif
> > > >
> > > > -include/generated/at91_pm_data-offsets.h: 
> > > > arch/arm/mach-at91/pm_data-offsets.s FORCE
> > > > +$(obj)/pm_data-offsets.h: $(obj)/pm_data-offsets.s FORCE
> > > > $(call filechk,offsets,__PM_DATA_OFFSETS_H__)
> > > >
> > > > -arch/arm/mach-at91/pm_suspend.o: 
> > > > include/generated/at91_pm_data-offsets.h
> > > > +$(obj)/pm_suspend.o: $(obj)/pm_data-offsets.h
> > > >
> > > >  targets += pm_data-offsets.s
> > > > +clean-files += pm_data-offsets.h
> > > > diff --git a/arch/arm/mach-at91/pm_suspend.S 
> > > > b/arch/arm/mach-at91/pm_suspend.S
> > > > index bfe1c4d06901..a31c1b20f3fa 100644
> > > > --- a/arch/arm/mach-at91/pm_suspend.S
> > > > +++ b/arch/arm/mach-at91/pm_suspend.S
> > > > @@ -14,7 +14,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include "pm.h"
> > > > -#include "generated/at91_pm_data-offsets.h"
> > > > +#include "pm_data-offsets.h"
> > > >
> > > > >  #define SRAMC_SELF_FRESH_ACTIVE 0x01
> > > > >  #define SRAMC_SELF_FRESH_EXIT   0x00
> > > > --
> > > > 2.17.1
> > > >
> > >
> > > --
> > > Alexandre Belloni, Bootlin
> > > Embedded Linux and Kernel engineering
> > > https://bootlin.com
> > >
>
>
>
> --
> Best Regards
> Masahiro Yamada



-- 
Best Regards
Masahiro Yamada


Re: linux-next: build failure after merge of the at91 tree

2019-04-22 Thread Masahiro Yamada
On Tue, Apr 23, 2019 at 10:33 AM Stephen Rothwell  wrote:
>
> Hi all,
>
> After merging the at91 tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
>
> arch/arm/mach-at91/pm_suspend.S:17:10: fatal error: pm_data-offsets.h: No 
> such file or directory
>  #include "pm_data-offsets.h"
>   ^~~
>
> Caused by commit
>
>   ab690fa1eb4b ("ARM: at91: move platform-specific asm-offset.h to 
> arch/arm/mach-at91")
>
> I used the version of the at91 tree from next-20190418 for today.


Sorry, I failed to test the out-of-tree build.

-I $(srctree)/$(src) is not added
when checked-in assembly files include a generated header.
(I think Kbuild should take care of this automatically, though.)


Ludovic,

Could you drop this patch for now?




-- 
Best Regards
Masahiro Yamada


Re: kernel BUG at kernel/cred.c:434!

2019-04-22 Thread Yang Yingliang




On 2019/4/23 3:48, Paul Moore wrote:

On Sat, Apr 20, 2019 at 3:39 AM Yang Yingliang  wrote:

I'm not sure you got my point.

I went back and looked at your previous emails again to try and
understand what you are talking about, and I'm a little confused by
some of the output ...


--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -481,6 +481,7 @@ static void do_acct_process(struct bsd_acct_struct
*acct)
  flim = current->signal->rlim[RLIMIT_FSIZE].rlim_cur;
  current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
  /* Perform file operations on behalf of whoever enabled
accounting */
+   pr_info("task:%px new cred:%px real cred:%px cred:%px\n",
current, file->f_cred, current->real_cred, current->cred);
  orig_cred = override_creds(file->f_cred);

Okay, with this patch applied we should see the task/cred info when
do_acct_process is called.  Got it.


Messages:
[   56.643298] task:88841a9595c0 new cred:88841ae450c0 real
cred:88841ae450c0 cred:88841ae450c0//They are same.

Okay, it looks like do_acct_process() was called and f_cred,
real_cred, and cred are all the same.

This is an original message, without the patch applied.



[   56.646609] Process accounting resumed

It looks like do_acct_process() has called check_free_space() now.  So
far so good.


[   56.649943] task:88841a9595c0 new cred:88841ae450c0 real
cred:88841c96c300 cred:88841ae450c0

Wait a minute ... why are we seeing this again?  Looking at the task
pointer and the timestamp, this is the same task exiting and trying to
write to the accounting file, yes?  This output is particularly
curious since it appears that real_cred has changed; where is this
happening?

This is the message when the BUG_ON was triggered without applying any
fix patch.


If we apply the patch "proc: prevent changes to overridden
credentials", the program runs like this:

1. As the printed message shows, before the override the pointers have
the following values:

real_cred=cred=0x88841ae450c0, f_cred=0x88841ae450c0
override_creds() is called in do_acct_process():
...
/* Perform file operations on behalf of whoever enabled accounting */
orig_cred = override_creds(file->f_cred);
...


2. After override_creds(), the check
if (current_cred() != current_real_cred()) does not catch this case,
so we will still call commit_creds() in security_setprocattr().
...
/* Prevent changes to overridden credentials. */
if (current_cred() != current_real_cred()) {
rcu_read_unlock();
return -EBUSY;
}
...


3. After commit_creds(), we have new cred and real_cred.
security_setprocattr()//commit_creds is called here

4. revert_creds() is called in do_acct_process(), and cred
is reverted to the old value (0x88841ae450c0)
...
current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim;
revert_creds(orig_cred);

5. After reverting, cred and real_cred are not equal.
Is there a risk of triggering the BUG_ON when doing another
commit_creds()?
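The five steps above can be replayed in a small user-space model. This is only a sketch: struct task, struct cred and the helpers below are simplified stand-ins for illustration, not the kernel's definitions, and the BUG_ON in commit_creds() is modeled as a false return. It assumes, as in the log above, that f_cred is the same object as the task's own credentials.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the kernel structures carry far more state. */
struct cred { int id; };

struct task {
	const struct cred *real_cred;	/* objective credentials */
	const struct cred *cred;	/* effective, possibly overridden */
};

/* Temporarily switch the effective credentials, returning the old ones. */
static const struct cred *override_creds(struct task *t, const struct cred *new)
{
	const struct cred *old = t->cred;

	t->cred = new;
	return old;
}

/* Undo a previous override_creds(). */
static void revert_creds(struct task *t, const struct cred *old)
{
	t->cred = old;
}

/* Model of the guard from "proc: prevent changes to overridden credentials". */
static bool setprocattr_guard_blocks(const struct task *t)
{
	return t->cred != t->real_cred;
}

/* Returns false where this model assumes the kernel would hit its BUG_ON. */
static bool commit_creds(struct task *t, const struct cred *new)
{
	if (t->cred != t->real_cred)
		return false;
	t->real_cred = new;
	t->cred = new;
	return true;
}

/* Replay steps 1-4; returns true if the guard never fired. */
static bool replay_acct_sequence(struct task *t, const struct cred *f_cred,
				 const struct cred *new_cred)
{
	const struct cred *orig = override_creds(t, f_cred);	/* step 1 */

	if (setprocattr_guard_blocks(t)) {			/* step 2 */
		revert_creds(t, orig);
		return false;
	}
	if (!commit_creds(t, new_cred)) {			/* step 3 */
		revert_creds(t, orig);
		return false;
	}
	revert_creds(t, orig);					/* step 4 */
	return true;
}
```

In this model, starting from a task whose cred and real_cred both equal f_cred, the guard in step 2 sees equal pointers and lets the commit through. After step 4 the task is left with cred pointing at the old object and real_cred at the new one, which is the state described in step 5.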





Re: linux-next: Fixes tag needs some work in the crypto tree

2019-04-22 Thread Herbert Xu
On Tue, Apr 23, 2019 at 07:30:36AM +1000, Stephen Rothwell wrote:
> 
> In commit
> 
>   f5a2aeb8b254 ("crypto: ccp - Do not free psp_master when PLATFORM_INIT 
> fails")
> 
> Fixes tag
> 
>   Fixes: 200664d5237f ("crypto: ccp: Add SEV support")
> 
> has these problem(s):
> 
>   - Subject does not match target commit subject
> Just use
>   git log -1 --format='Fixes: %h ("%s")'

Thanks for the heads up Stephen.

I have taken a look and it seems to be a rephrasing of the original
commit subject which seems to make sense.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[Question] Should direct reclaim time be bounded?

2019-04-22 Thread Mike Kravetz
I was looking into an issue on our distro kernel where allocation of huge
pages via "echo X > /proc/sys/vm/nr_hugepages" was taking a LONG time.
In this particular case, we were actually allocating huge pages VERY slowly
at the rate of about one every 30 seconds.  I don't want to talk about the
code in our distro kernel, but the situation that caused this issue exists
upstream and appears to be worse there.

One thing to note is that hugetlb page allocation can really stress the
page allocator.  The routine alloc_pool_huge_page is of special concern.

/*
 * Allocates a fresh page to the hugetlb allocator pool in the node interleaved
 * manner.
 */
static int alloc_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
{
	struct page *page;
	int nr_nodes, node;
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
		page = alloc_fresh_huge_page(h, gfp_mask, node, nodes_allowed);
		if (page)
			break;
	}

	if (!page)
		return 0;

	put_page(page); /* free it into the hugepage allocator */

	return 1;
}

This routine is called for each huge page the user wants to allocate.  If
they do "echo 4096 > nr_hugepages", this is called 4096 times.
alloc_fresh_huge_page() will eventually call __alloc_pages_nodemask with
__GFP_COMP|__GFP_RETRY_MAYFAIL|__GFP_NOWARN in addition to __GFP_THISNODE.
That for_each_node_mask_to_alloc() macro is hugetlbfs specific and attempts
to allocate huge pages in a round robin fashion.  When asked to allocate a
huge page, it first tries the 'next_nid_to_alloc'.  If that fails, it goes
to the next allowed node.  This is 'documented' in kernel docs as:

"On a NUMA platform, the kernel will attempt to distribute the huge page pool
 over all the set of allowed nodes specified by the NUMA memory policy of the
 task that modifies nr_hugepages.  The default for the allowed nodes--when the
 task has default memory policy--is all on-line nodes with memory.  Allowed
 nodes with insufficient available, contiguous memory for a huge page will be
 silently skipped when allocating persistent huge pages.  See the discussion
 below of the interaction of task memory policy, cpusets and per node attributes
 with the allocation and freeing of persistent huge pages.

 The success or failure of huge page allocation depends on the amount of
 physically contiguous memory that is present in system at the time of the
 allocation attempt.  If the kernel is unable to allocate huge pages from
 some nodes in a NUMA system, it will attempt to make up the difference by
 allocating extra pages on other nodes with sufficient available contiguous
 memory, if any."

However, consider the case of a 2 node system where:
node 0 has 2GB memory
node 1 has 4GB memory

Now, if one wants to allocate 4GB of huge pages they may be tempted to simply,
"echo 2048 > nr_hugepages".  At first this will go well until node 0 is out
of memory.  When this happens, alloc_pool_huge_page() will continue to be
called.  Because of that for_each_node_mask_to_alloc() macro, it will likely
attempt to first allocate a page from node 0.  It will call direct reclaim and
compaction until it fails.  Then, it will successfully allocate from node 1.
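To make the cost of this pattern concrete, here is a rough user-space model of the round-robin behaviour. It is a sketch only: struct pool and its helpers are invented for illustration, the per-node capacities passed in are made-up numbers standing in for "how many huge pages a node can really supply", and a failed attempt is just counted rather than modeling reclaim or compaction time.

```c
#include <stddef.h>

#define NR_NODES 2

/* Hypothetical model state; not the kernel's struct hstate. */
struct pool {
	long free_hpages[NR_NODES];	/* huge pages each node can still supply */
	long allocated[NR_NODES];	/* huge pages taken from each node */
	long failed_attempts[NR_NODES];	/* attempts that found the node empty */
	int next_nid;			/* stand-in for next_nid_to_alloc */
};

/* Return the node to try now and advance the round-robin cursor. */
static int next_node_to_alloc(struct pool *p)
{
	int nid = p->next_nid;

	p->next_nid = (nid + 1) % NR_NODES;
	return nid;
}

/* One allocation: try each node once, starting at the cursor. */
static int alloc_one(struct pool *p)
{
	for (int tries = NR_NODES; tries > 0; tries--) {
		int nid = next_node_to_alloc(p);

		if (p->free_hpages[nid] > 0) {
			p->free_hpages[nid]--;
			p->allocated[nid]++;
			return 1;
		}
		/* In the real allocator this is the expensive part. */
		p->failed_attempts[nid]++;
	}
	return 0;
}

/* Allocate up to 'want' huge pages; returns how many were obtained. */
static long fill_pool(struct pool *p, long want)
{
	long got = 0;

	while (got < want && alloc_one(p))
		got++;
	return got;
}
```

With assumed capacities of 900 pages on node 0 and 1800 on node 1 and a request for 2048 pages, the first 1800 allocations alternate between the nodes. Each of the remaining 248 allocations then starts on the exhausted node 0 before succeeding on node 1, because the cursor wraps back to node 0 after every success on node 1.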

In our distro kernel, I am thinking about making allocations try "less hard"
on nodes where we start to see failures.  less hard == NORETRY/NORECLAIM.
I was going to try something like this on an upstream kernel when I noticed
that it seems like direct reclaim may never end/exit.  It 'may' exit, but I
instrumented __alloc_pages_slowpath() and saw it take well over an hour
before I 'tricked' it into exiting.

[ 5916.248341] hpage_slow_alloc: jiffies 5295742  tries 2   node 0 success
[ 5916.249271]   reclaim 5295741  compact 1

This is where it stalled after "echo 4096 > nr_hugepages" on a little VM
with 8GB total memory.

I have not started looking at the direct reclaim code to see exactly where
we may be stuck, or trying really hard.  My question is, "Is this expected
or should direct reclaim be somewhat bounded?"  With __alloc_pages_slowpath
getting 'stuck' in direct reclaim, the documented behavior for huge page
allocation is not going to happen.
-- 
Mike Kravetz


Re: [PATCH] KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size

2019-04-22 Thread Peter Xu
On Wed, Apr 17, 2019 at 03:42:41PM +0200, Paolo Bonzini wrote:
> If a memory slot's size is not a multiple of 64 pages (256K), then
> the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
> either requires the requested page range to go beyond memslot->npages,
> or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
> requires log->num_pages to be both in range and aligned.
> 
> To allow this case, allow log->num_pages not to be a multiple of 64 if
> it ends exactly on the last page of the slot.
> 
> Reported-by: Peter Xu 
> Fixes: 98938aa8edd6 ("KVM: validate userspace input in 
> kvm_clear_dirty_log_protect()", 2019-01-02)
> Signed-off-by: Paolo Bonzini 
> ---
>  Documentation/virtual/kvm/api.txt| 5 +++--
>  tools/testing/selftests/kvm/dirty_log_test.c | 4 ++--
>  virt/kvm/kvm_main.c  | 7 ---
>  3 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index b62ad0d94234..de97369ad30d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3829,8 +3829,9 @@ The ioctl clears the dirty status of pages in a memory 
> slot, according to
>  the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap
>  field.  Bit 0 of the bitmap corresponds to page "first_page" in the
>  memory slot, and num_pages is the size in bits of the input bitmap.
> -Both first_page and num_pages must be a multiple of 64.  For each bit
> -that is set in the input bitmap, the corresponding page is marked "clean"
> +first_page must be a multiple of 64; num_pages must also be a multiple of
> +64 unless first_page + num_pages is the size of the memory slot.  For each
> +bit that is set in the input bitmap, the corresponding page is marked "clean"
>  in KVM's dirty bitmap, and dirty tracking is re-enabled for that page
>  (for example via write-protection, or by clearing the dirty bit in
>  a page table entry).
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c 
> b/tools/testing/selftests/kvm/dirty_log_test.c
> index 4715cfba20dc..052fb5856df4 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -289,7 +289,7 @@ static void run_test(enum vm_guest_mode mode, unsigned 
> long iterations,
>   max_gfn = (1ul << (guest_pa_bits - guest_page_shift)) - 1;
>   guest_page_size = (1ul << guest_page_shift);
>   /* 1G of guest page sized pages */
> - guest_num_pages = (1ul << (30 - guest_page_shift));
> + guest_num_pages = (1ul << (30 - guest_page_shift)) + 3;

Could you add a comment explaining why an arbitrary number is added here?

>   host_page_size = getpagesize();
>   host_num_pages = (guest_num_pages * guest_page_size) / host_page_size +
>!!((guest_num_pages * guest_page_size) % 
> host_page_size);
> @@ -359,7 +359,7 @@ static void run_test(enum vm_guest_mode mode, unsigned 
> long iterations,
>   kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
>  #ifdef USE_CLEAR_DIRTY_LOG
>   kvm_vm_clear_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap, 0,
> -DIV_ROUND_UP(host_num_pages, 64) * 64);
> +host_num_pages);
>  #endif
>   vm_dirty_log_verify(bmap);
>   iteration++;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f4da53321161..ace23d8a309f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1269,7 +1269,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
>   if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
>   return -EINVAL;
>  
> - if ((log->first_page & 63) || (log->num_pages & 63))
> + if (log->first_page & 63)
>   return -EINVAL;
>  
>   slots = __kvm_memslots(kvm, as_id);
> @@ -1282,8 +1282,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
>   n = kvm_dirty_bitmap_bytes(memslot);
>  
>   if (log->first_page > memslot->npages ||
> - log->num_pages > memslot->npages - log->first_page)
> - return -EINVAL;
> + log->num_pages > memslot->npages - log->first_page ||
> + (log->num_pages < memslot->npages - log->first_page && 
> (log->num_pages & 63)))
> + return -EINVAL;

There seems to be an indentation issue and an overlong line; other than
that, the patch content looks good to me.
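For reference, the range rule after this patch can be restated as a standalone predicate. This is a sketch for illustration only: the function name and the plain integer parameters are assumptions, not the kernel's code, which works on struct kvm_clear_dirty_log and the memslot.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative restatement of the checks in the hunk above.
 * 'npages' is the size of the memory slot in pages.
 */
static bool clear_dirty_log_range_ok(uint64_t first_page, uint64_t num_pages,
				     uint64_t npages)
{
	/* first_page must always be a multiple of 64. */
	if (first_page & 63)
		return false;

	/* The range must lie inside the slot. */
	if (first_page > npages || num_pages > npages - first_page)
		return false;

	/* num_pages may be unaligned only if the range ends at the slot end. */
	if (num_pages < npages - first_page && (num_pages & 63))
		return false;

	return true;
}
```

For a slot of 262147 pages (1G of 4K pages plus 3, as in the selftest change), clearing the tail with first_page = 262144 and num_pages = 3 is accepted, while num_pages = 2 at the same offset is rejected because it is unaligned and stops short of the slot end.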

Reviewed-by: Peter Xu 

Thanks,

-- 
Peter Xu


Re: [PATCH v2] Input: uinput: Avoid Object-Already-Free with a global lock

2019-04-22 Thread dmitry.torok...@gmail.com
On Fri, Apr 19, 2019 at 02:13:48PM +0530, Mukesh Ojha wrote:
> 
> On 4/19/2019 12:41 PM, dmitry.torok...@gmail.com wrote:
> > Hi Mukesh,
> > 
> > On Fri, Apr 19, 2019 at 12:17:44PM +0530, Mukesh Ojha wrote:
> > > For some reason my last mail did not get delivered,  sending it again.
> > > 
> > > 
> > > On 4/18/2019 11:55 AM, Mukesh Ojha wrote:
> > > > 
> > > > On 4/18/2019 7:13 AM, dmitry.torok...@gmail.com wrote:
> > > > > Hi Mukesh,
> > > > > 
> > > > > On Mon, Apr 15, 2019 at 03:35:51PM +0530, Mukesh Ojha wrote:
> > > > > > Hi Dmitry,
> > > > > > 
> > > > > > > Can you please have a look at this patch? This seems to be
> > > > > > > reproducing quite frequently.
> > > > > > 
> > > > > > Thanks,
> > > > > > Mukesh
> > > > > > 
> > > > > > On 4/10/2019 1:29 PM, Mukesh Ojha wrote:
> > > > > > > uinput_destroy_device() gets called from two places. In one place,
> > > > > > > uinput_ioctl_handler() where it is protected under a lock
> > > > > > > udev->mutex but there is no protection on udev device from freeing
> > > > > > > inside uinput_release().
> > > > > uinput_release() should be called when last file handle to the uinput
> > > > > instance is being dropped, so there should be no other users and thus 
> > > > > we
> > > > > can't be racing with anyone.
> > > > Let's take an example where I am creating input devices quite frequently
> > > > 
> > > > [   97.836603] input: syz0 as /devices/virtual/input/input262
> > > > [   97.845589] input: syz0 as /devices/virtual/input/input261
> > > > [   97.849415] input: syz0 as /devices/virtual/input/input263
> > > > [   97.856479] input: syz0 as /devices/virtual/input/input264
> > > > [   97.936128] input: syz0 as /devices/virtual/input/input265
> > > > 
> > > > e.g input265
> > > > 
> > > > While input265 is being created [1] and handlers are being registered on
> > > > that device, *fput* gets called on that device, because user space
> > > > already knows that input265 has been created, while its reference
> > > > count is still 1 (rare but possible).
> > Are you saying that there are 2 threads sharing the same file
> > descriptor, one issuing the registration ioctl while the other closing
> > the same fd?
> 
> Dmitry,
> 
> I don't have an exact view inside the app here, but this looks like the
> same thing, since it is able to do
> fput on the uinput device.
> 
> FYI:
> The Syzkaller app (a syscall fuzzer) is running in userspace on a
> kernel that is built
> with various debug and fault-injection config options: FAULT_INJECTION,
> FAIL_SLAB, FAIL_PAGEALLOC, KASAN, etc.

Mukesh,

We need to understand the failure mode exactly. I do not think that
introducing another mutex into uinput actually fixes the issue, as we do
not order mutex acquisition, so I think it is still possible for the
release function to acquire the mutex and run first, and then the ioctl
would run with a freed object.

My guess is that this needs to be fixed in the VFS layer.

Thanks.

-- 
Dmitry


Re: [PATCH 2/4] ARM: ep93xx: keypad: stop using mach/platform.h

2019-04-22 Thread Dmitry Torokhov
On Mon, Apr 15, 2019 at 09:25:24PM +0200, Arnd Bergmann wrote:
> We can communicate the clock rate using platform data rather than setting
> a flag to use a particular value in the driver, which is cleaner and
> avoids the dependency.
> 
> No platform in the kernel currently defines the ep93xx keypad device
> structure, so this is a rather pointless exercise.  Any out of tree
> users are probably dead now, but if not, they have to change their
> platform code to match the new platform_data structure.
> 
> Signed-off-by: Arnd Bergmann 

Acked-by: Dmitry Torokhov 

Please feel free to merge with the rest of the patches.

> ---
>  drivers/input/keyboard/Kconfig  | 2 +-
>  drivers/input/keyboard/ep93xx_keypad.c  | 5 +
>  include/linux/platform_data/keypad-ep93xx.h | 4 ++--
>  3 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig
> index a878351f1643..b373f3274542 100644
> --- a/drivers/input/keyboard/Kconfig
> +++ b/drivers/input/keyboard/Kconfig
> @@ -194,7 +194,7 @@ config KEYBOARD_LKKBD
>  
>  config KEYBOARD_EP93XX
>   tristate "EP93xx Matrix Keypad support"
> - depends on ARCH_EP93XX
> + depends on ARCH_EP93XX || COMPILE_TEST
>   select INPUT_MATRIXKMAP
>   help
> Say Y here to enable the matrix keypad on the Cirrus EP93XX.
> diff --git a/drivers/input/keyboard/ep93xx_keypad.c 
> b/drivers/input/keyboard/ep93xx_keypad.c
> index f77b295e0123..71472f6257c0 100644
> --- a/drivers/input/keyboard/ep93xx_keypad.c
> +++ b/drivers/input/keyboard/ep93xx_keypad.c
> @@ -137,10 +137,7 @@ static void ep93xx_keypad_config(struct ep93xx_keypad 
> *keypad)
>   struct ep93xx_keypad_platform_data *pdata = keypad->pdata;
>   unsigned int val = 0;
>  
> - if (pdata->flags & EP93XX_KEYPAD_KDIV)
> - clk_set_rate(keypad->clk, EP93XX_KEYTCHCLK_DIV4);
> - else
> - clk_set_rate(keypad->clk, EP93XX_KEYTCHCLK_DIV16);
> + clk_set_rate(keypad->clk, pdata->clk_rate);
>  
>   if (pdata->flags & EP93XX_KEYPAD_DISABLE_3_KEY)
>   val |= KEY_INIT_DIS3KY;
> diff --git a/include/linux/platform_data/keypad-ep93xx.h 
> b/include/linux/platform_data/keypad-ep93xx.h
> index 0e36818e3680..3054fced8509 100644
> --- a/include/linux/platform_data/keypad-ep93xx.h
> +++ b/include/linux/platform_data/keypad-ep93xx.h
> @@ -9,8 +9,7 @@ struct matrix_keymap_data;
>  #define EP93XX_KEYPAD_DIAG_MODE  (1<<1)  /* diagnostic mode */
>  #define EP93XX_KEYPAD_BACK_DRIVE (1<<2)  /* back driving mode */
>  #define EP93XX_KEYPAD_TEST_MODE  (1<<3)  /* scan only column 0 */
> -#define EP93XX_KEYPAD_KDIV   (1<<4)  /* 1/4 clock or 1/16 clock */
> -#define EP93XX_KEYPAD_AUTOREPEAT (1<<5)  /* enable key autorepeat */
> +#define EP93XX_KEYPAD_AUTOREPEAT (1<<4)  /* enable key autorepeat */
>  
>  /**
>   * struct ep93xx_keypad_platform_data - platform specific device structure
> @@ -24,6 +23,7 @@ struct ep93xx_keypad_platform_data {
>   unsigned intdebounce;
>   unsigned intprescale;
>   unsigned intflags;
> + unsigned intclk_rate;
>  };
>  
>  #define EP93XX_MATRIX_ROWS   (8)
> -- 
> 2.20.0
> 

-- 
Dmitry


Re: [PATCH v3] proc/sysctl: add shared variables for range check

2019-04-22 Thread Matteo Croce
On April 19, 2019 10:07:14 AM GMT+09:00, Matthew Wilcox  
wrote:
> On Fri, Apr 19, 2019 at 09:17:17AM +0900, Matteo Croce wrote:
> > > extern const int sysctl_zero;
> > > /* comment goes here */
> > > #define SYSCTL_ZERO ((void *)&sysctl_zero)
> > > 
> > > and then use SYSCTL_ZERO everywhere.  That centralizes the ugliness
> > > and makes it easier to switch over if/when extra1&2 are constified.
> > > 
> > > But it's all a bit sad and lame :( 
> > 
> > No, we didn't decide yet. I need to check all the extra1,2
> > assignments. Not an impossible task, anyway.
> > 
> > I agree that the casts are ugly. Your suggested macro moves the
> > ugliness to a single point, which is good. Or maybe we can do a
> > single macro like:
> > 
> > #define SYSCTL_VAL(x) ((void *)&sysctl_##x)
> > 
> > to avoid defining one for every value. And when we decide that
> > everything can be const, we just update the macro.
> 
> If we're going to do that, we can save two EXPORTs and do:
> 
> const int sysctl_vals[] = { 0, 1, -1 };
> EXPORT_SYMBOL(sysctl_vals);
> 
> #define SYSCTL_ZERO   ((void *)&sysctl_vals[0])

Hi Matthew,

I like this approach, regardless of the const or not const extra1.

I'll be AFK for a few days, then I will investigate if extra1,2 can be made 
const and then prepare a v4 with the single export.

Thanks,
-- 
Matteo Croce
per aspera ad upstream
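
For readers following the thread, the single-array idea can be tried out in user space. This is only a minimal sketch, not the eventual kernel patch: sysctl_vals and SYSCTL_ZERO come from the discussion above, while SYSCTL_ONE, SYSCTL_NEG_ONE and the in_range() helper are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One shared table instead of one exported variable per constant. */
const int sysctl_vals[] = { 0, 1, -1 };

/*
 * The casts live in one place. SYSCTL_ZERO is the macro from the
 * thread; SYSCTL_ONE and SYSCTL_NEG_ONE are hypothetical, by analogy.
 */
#define SYSCTL_ZERO	((void *)&sysctl_vals[0])
#define SYSCTL_ONE	((void *)&sysctl_vals[1])
#define SYSCTL_NEG_ONE	((void *)&sysctl_vals[2])

/*
 * Hypothetical stand-in for a min/max range check: extra1 is the lower
 * bound, extra2 the upper bound, and either may be NULL (no bound).
 * Returns 1 if val is acceptable, 0 otherwise.
 */
static int in_range(int val, void *extra1, void *extra2)
{
	const int *min = extra1;
	const int *max = extra2;

	/* A table entry with min > max would be a caller bug. */
	assert(!min || !max || *min <= *max);

	if (min && val < *min)
		return 0;
	if (max && val > *max)
		return 0;
	return 1;
}
```

A table entry would then pass SYSCTL_ZERO and SYSCTL_ONE as its bounds instead of pointing at private "zero" and "one" variables. If extra1 and extra2 are later made const, only the three macros need to change.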


RE: [PATCH] drivers: hid: Add a module description line

2019-04-22 Thread Joseph Salisbury
Thanks for the feedback.  I'll probably update each patch subject with the 
module names as well.  I'll send a v2 for all three.

Thanks,

Joe


-----Original Message-----
From: Michael Kelley  
Sent: Monday, April 22, 2019 11:16 PM
To: Joseph Salisbury ; KY Srinivasan 
; Haiyang Zhang ; Stephen Hemminger 
; sas...@kernel.org; ji...@kernel.org; 
benjamin.tissoi...@redhat.com
Cc: linux-hyp...@vger.kernel.org; linux-in...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: RE: [PATCH] drivers: hid: Add a module description line

From: Joseph Salisbury  Sent: Monday, April 22, 
2019 2:31 PM
> 
> Signed-off-by: Joseph Salisbury 
> ---
>  drivers/hid/hid-hyperv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
> index 704049e62d58..d3311d714d35 100644
> --- a/drivers/hid/hid-hyperv.c
> +++ b/drivers/hid/hid-hyperv.c
> @@ -614,5 +614,7 @@ static void __exit mousevsc_exit(void)
>  }
> 
>  MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic HID Driver");
> +
>  module_init(mousevsc_init);
>  module_exit(mousevsc_exit);
> --
> 2.17.1

Even though it will likely be redundant with the commit title, there
probably needs to be a short commit message.   (And also with the
other two similar patches.)

Michael



Re: [PATCH] KVM: x86: Add Intel CPUID.1F cpuid emulation support

2019-04-22 Thread Like Xu

On 2019/4/23 2:35, Sean Christopherson wrote:

On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:

Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
host system has multiple software-visible die within each package.

Signed-off-by: Like Xu 
---
  arch/x86/kvm/cpuid.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd39516..9fc14f2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
return xcr0;
  }
  
+/* We need to check if the host cpu has multi-chip packaging technology. */

+static bool kvm_supported_intel_mcp(void)
+{
+   u32 eax, ignored;
+
+   cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
+
+   return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
+}
+
  #define F(x) bit(X86_FEATURE_##x)
  
  int kvm_update_cpuid(struct kvm_vcpu *vcpu)

@@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
switch (function) {
case 0:
entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
+   entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;


This all seems unnecessary.  And by 'all', I mean the existing Intel PT
and XSAVE leaf checks, as well as the new mcp check.  entry->eax comes
directly from hardware, and unless I missed something, PT and XSAVE are
only exposed to the guest when they're supported in hardware.  In other
words, KVM will never need to adjust entry->eax to expose PT or XSAVE.


We call this function for both the KVM_GET_SUPPORTED_CPUID and the
KVM_GET_EMULATED_CPUID cases, although a KVM user can reconfigure them
via the KVM_SET_CPUID* path.




The original min() check was added by commit 0771671749b5 ("KVM: Enhance
guest cpuid management"), which doesn't provide any explicit information
on why KVM does min() in the first place.  


Exposing cpuid.0.eax blindly (just because the host hardware supports it)
is not good practice for guest migration and raises compatibility
requirements.



Given that the original code
was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
idea was to always report "Extended Topology Enumeration Leaf" as
supported so that userspace can enumerate the VM's topology to the guest
even when hardware itself doesn't do so.


If the host CPU model is too old to support 0xb, it certainly won't
report 0xb. The host cpuid.0.eax has been above 0xb for a long
time and reaches 0x1f in the latest SDM.


AFAICT, the original code keeps cpuid.0.eax at the minimum needed for the
features the guest uses, or at least claims to use.




Assuming we want to allow userspace to use "V2 Extended Topology
Enumeration Leaf" regardless of hardware support, then this can simply be:

   entry->eax = min(entry->eax, (u32)0x1f);

Or am I completely missing something?


break;
case 1:
entry->edx &= kvm_cpuid_1_edx_x86_features;
@@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
entry->edx = edx.full;
break;
}
+   /* function 0x1f has additional index. */


The original comment is rather useless, it's obvious from the code that
it has additional indices.  No need to repeat its sins.  A more useful
comment would be to explain that 0x1f and 0xb have identical formats and
thus can be handled by common code.


I agree; let me fix it in the next version.



Which begs the question, why does leaf 0x1f exist?  AFAICT the only
difference is that 0x1f supports additional "level types", but 0x1f's
types are backwards compatible.  Any idea why leaf 0xb wasn't simply
extended for the new types?


It's not just about backwards compatibility in numerical parsing.

A lot of software (OSes and applications alike) uses 0xb
to get the CPU topology. In most cases this legacy code assumes that
the level above CORE is the package (at least for Intel), not the die, so
reusing 0xb would create a semantic conflict.


As stated in the SDM, Intel recommends first checking for the existence of
leaf 1FH and using it if available.





+   case 0x1f:
/* function 0xb has additional index. */
case 0xb: {
int i, level_type;
--
1.8.3.1
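
The two points argued in this thread (capping the maximum basic leaf reported in CPUID.0:EAX, and preferring leaf 0x1f over 0xb when it is usable) can be sketched as plain functions. This is an illustration under stated assumptions, not KVM code: the function and macro names are hypothetical, and the caller is assumed to have already decided whether leaf 0x1f is populated on the machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOPO_LEAF_V1	0x0bu	/* Extended Topology Enumeration Leaf */
#define TOPO_LEAF_V2	0x1fu	/* V2 Extended Topology Enumeration Leaf */

/*
 * Cap the maximum basic leaf, in the spirit of
 * "entry->eax = min(entry->eax, (u32)0x1f)" suggested above.
 */
static uint32_t clamp_max_basic_leaf(uint32_t host_eax, uint32_t cap)
{
	/* Sketch-only precondition: a cap below 0xb would hide both topology leaves. */
	assert(cap >= TOPO_LEAF_V1);
	return host_eax < cap ? host_eax : cap;
}

/*
 * Choose which topology leaf to parse: 0x1f if it exists and is
 * populated, otherwise 0xb, otherwise 0 (no extended topology leaf).
 * How "populated" is determined is left to the caller here.
 */
static uint32_t pick_topology_leaf(uint32_t max_basic_leaf, bool leaf_1f_populated)
{
	if (max_basic_leaf >= TOPO_LEAF_V2 && leaf_1f_populated)
		return TOPO_LEAF_V2;
	if (max_basic_leaf >= TOPO_LEAF_V1)
		return TOPO_LEAF_V1;
	return 0;
}
```

Legacy software that only knows leaf 0xb keeps working unchanged, while newer software takes the 0x1f branch and can see the additional level types.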







RE: [PATCH] drivers: hid: Add a module description line

2019-04-22 Thread Michael Kelley
From: Joseph Salisbury  Sent: Monday, April 22, 
2019 2:31 PM
> 
> Signed-off-by: Joseph Salisbury 
> ---
>  drivers/hid/hid-hyperv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
> index 704049e62d58..d3311d714d35 100644
> --- a/drivers/hid/hid-hyperv.c
> +++ b/drivers/hid/hid-hyperv.c
> @@ -614,5 +614,7 @@ static void __exit mousevsc_exit(void)
>  }
> 
>  MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic HID Driver");
> +
>  module_init(mousevsc_init);
>  module_exit(mousevsc_exit);
> --
> 2.17.1

Even though it will likely be redundant with the commit title, there
probably needs to be a short commit message.   (And also with the
other two similar patches.)

Michael



Re: [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency

2019-04-22 Thread Daniel Drake
On Mon, Apr 22, 2019 at 8:04 PM Ingo Molnar  wrote:
> Minor style nit: the parentheses are unnecessary, integer expressions
> like this are evaluated left to right and multiplication and division has
> the same precedence.

Fair point, although the same could be said for cpu_khz_from_msr().

> But it might also make sense to actually store crystal_mhz instead of
> crystal_khz, because both CPUID 15H and 16H provides MHz values.
>
> That way the above expression would simplify to:
>
> lapic_timer_frequency = crystal_mhz / HZ;

Wouldn't it be
lapic_timer_frequency = crystal_mhz * 1000000 / HZ;
?

Thanks
Daniel
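
As a quick units check on the expressions discussed here: kHz to Hz is a factor of 1000 and MHz to Hz is a factor of 1000000, so both forms should give the same number of timer ticks per jiffy. The sketch below is user-space arithmetic only; the function names are hypothetical and HZ is passed in as a parameter rather than taken from the kernel configuration.

```c
#include <assert.h>

/* Timer ticks per jiffy from a crystal frequency given in kHz. */
static unsigned int ticks_per_jiffy_from_khz(unsigned int crystal_khz,
					     unsigned int hz)
{
	assert(hz != 0);
	return crystal_khz * 1000u / hz;
}

/* The same quantity from a crystal frequency given in whole MHz. */
static unsigned int ticks_per_jiffy_from_mhz(unsigned int crystal_mhz,
					     unsigned int hz)
{
	assert(hz != 0);
	return crystal_mhz * 1000000u / hz;
}
```

With a 24 MHz crystal and HZ=250 both functions return 96000. One design point to keep in mind: a rate that is not a whole number of MHz (19200 kHz, say) cannot be passed to the MHz variant without losing precision, which may be an argument for keeping kHz.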


Re: [PATCH v3 14/28] userfaultfd: wp: handle COW properly for uffd-wp

2019-04-22 Thread Peter Xu
On Mon, Apr 22, 2019 at 10:54:02AM -0400, Jerome Glisse wrote:
> On Mon, Apr 22, 2019 at 08:20:10PM +0800, Peter Xu wrote:
> > On Fri, Apr 19, 2019 at 11:02:53AM -0400, Jerome Glisse wrote:
> > 
> > [...]
> > 
> > > > > > +   if (uffd_wp_resolve) {
> > > > > > +   /* If the fault is resolved already, 
> > > > > > skip */
> > > > > > +   if (!pte_uffd_wp(*pte))
> > > > > > +   continue;
> > > > > > +   page = vm_normal_page(vma, addr, 
> > > > > > oldpte);
> > > > > > +   if (!page || page_mapcount(page) > 1) {
> > > > > > +   struct vm_fault vmf = {
> > > > > > +   .vma = vma,
> > > > > > +   .address = addr & 
> > > > > > PAGE_MASK,
> > > > > > +   .page = page,
> > > > > > +   .orig_pte = oldpte,
> > > > > > +   .pmd = pmd,
> > > > > > +   /* pte and ptl not 
> > > > > > needed */
> > > > > > +   };
> > > > > > +   vm_fault_t ret;
> > > > > > +
> > > > > > +   if (page)
> > > > > > +   get_page(page);
> > > > > > +   arch_leave_lazy_mmu_mode();
> > > > > > +   pte_unmap_unlock(pte, ptl);
> > > > > > +   ret = wp_page_copy(&vmf);
> > > > > > +   /* PTE is changed, or OOM */
> > > > > > +   if (ret == 0)
> > > > > > +   /* It's done by others 
> > > > > > */
> > > > > > +   continue;
> > > > > 
> > > > > This is wrong if ret == 0 you still need to remap the pte before
> > > > > continuing as otherwise you will go to next pte without the page
> > > > > table lock for the directory. So 0 case must be handled after
> > > > > arch_enter_lazy_mmu_mode() below.
> > > > > 
> > > > > Sorry, I should have caught that in the previous review.
> > > > 
> > > > My fault to not have noticed it since the very beginning... thanks for
> > > > spotting that.
> > > > 
> > > > I'm squashing below changes into the patch:
> > > 
> > > 
> > > Well, thinking about this some more, I think you should use do_wp_page()
> > > and not wp_page_copy(). It would avoid a bunch of the code above, and
> > > you are also not properly handling KSM pages or pages in the swap cache.
> > > Instead of duplicating the code that is in do_wp_page(), it would be
> > > better to call it here.
> > 
> > Yeah it makes sense to me.  Then here's my plan:
> > 
> > - I'll need to drop the previous patch "export wp_page_copy" since it
> >   will no longer be needed
> > 
> > - I'll introduce another patch to split the current do_wp_page() and
> >   introduce a function "do_wp_page_cont" (better suggestions on the
> >   naming would be welcome) which contains most of the wp handling
> >   that'll be needed for change_pte_range() in this patch and isolates
> >   the uffd handling:
> > 
> > static vm_fault_t do_wp_page(struct vm_fault *vmf)
> > __releases(vmf->ptl)
> > {
> > struct vm_area_struct *vma = vmf->vma;
> > 
> > if (userfaultfd_pte_wp(vma, *vmf->pte)) {
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > return handle_userfault(vmf, VM_UFFD_WP);
> > }
> > 
> > return do_wp_page_cont(vmf);
> > }
> > 
> > Then I can probably use do_wp_page_cont() in this patch.
> 
> Instead i would keep the do_wp_page name and do:
> static vm_fault_t do_userfaultfd_wp_page(struct vm_fault *vmf) {
> ... // what you have above
> return do_wp_page(vmf);
> }
> 
> Naming wise i think it would be better to keep do_wp_page() as
> is.

In case I misunderstood... what I've proposed will be simply:

diff --git a/mm/memory.c b/mm/memory.c
index 64bd8075f054..ab98a1eb4702 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2497,6 +2497,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
return handle_userfault(vmf, VM_UFFD_WP);
}

+   return do_wp_page_cont(vmf);
+}
+
+vm_fault_t do_wp_page_cont(struct vm_fault *vmf)
+   __releases(vmf->ptl)
+{
+   struct vm_area_struct *vma = vmf->vma;
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*

And the other proposal is:

diff --git a/mm/memory.c b/mm/memory.c
index 64bd8075f054..a73792127553 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2469,6 +2469,8 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf)
return VM_FAULT_WRITE;
 }

+static vm_fault_t do_wp_page(struct vm_fault *vmf);
+
 /*
  * This routine handles present 

Re: [PATCH] fs/proc/proc_sysctl.c: Fix a NULL pointer dereference

2019-04-22 Thread YueHaibing
Friendly ping...

On 2019/4/9 23:36, Yue Haibing wrote:
> From: YueHaibing 
> 
> Syzkaller reported this:
> 
> sysctl could not get directory: /net//bridge -12
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN PTI
> CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G C5.1.0-rc3+ 
> #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 
> 04/01/2014
> RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
> RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
> RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
> RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
> Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f 
> c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 
> 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
> RSP: 0018:8881bb507778 EFLAGS: 00010206
> RAX: dc00 RBX: 8881f224b5b8 RCX: 818f3f6a
> RDX: 000a RSI: 0050 RDI: 8881f224b568
> RBP:  R08: ed10376a0ef4 R09: ed10376a0ef4
> R10: 0001 R11: ed10376a0ef4 R12: 8881f224b558
> R13:  R14:  R15: 
> FS:  7f3e7ce13700() GS:8881f730() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fd60fbe9398 CR3: 0001cb55c001 CR4: 007606e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
>  erase_entry fs/proc/proc_sysctl.c:178 [inline]
>  erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
>  start_unregistering fs/proc/proc_sysctl.c:331 [inline]
>  drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
>  get_subdir fs/proc/proc_sysctl.c:1022 [inline]
>  __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
>  ? 0xc1a88000
>  br_netfilter_init+0x68/0x1000 [br_netfilter]
>  do_one_initcall+0xbc/0x47d init/main.c:901
>  do_init_module+0x1b5/0x547 kernel/module.c:3456
>  load_module+0x6405/0x8c10 kernel/module.c:3804
>  __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
>  do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x462e99
> Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f3e7ce12c58 EFLAGS: 0246 ORIG_RAX: 0139
> RAX: ffda RBX: 0073bf00 RCX: 00462e99
> RDX:  RSI: 2280 RDI: 0003
> RBP: 7f3e7ce12c70 R08:  R09: 
> R10:  R11: 0246 R12: 7f3e7ce136bc
> R13: 004bcefa R14: 006f6fb0 R15: 0004
> Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 
> ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell 
> nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport 
> snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 
> videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops 
> rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid 
> rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c 
> ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm 
> drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid 
> extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio 
> adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 
> snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd 
> soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy 
> mdio_cavium iptable_security iptable_raw iptable_mangle
>  iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter 
> bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim 
> vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan 
> bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm 
> kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel 
> ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd 
> cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw 
> ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos 
> sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: 
> br_netfilter]
> Dumping ftrace buffer:
>(ftrace buffer empty)
> ---[ end trace 68741688d5fbfe85 ]---
> 
> commit 23da9588037e forgot to handle the start_unregistering() case,
> 

[PATCH] staging: vchiq_arm: Fix misuse of %x

2019-04-22 Thread Fuqian Huang
Pointers should be printed with %p or %px rather than
cast to unsigned long type and printed with %lx.
Change %lx to %pK to print the pointers.

Signed-off-by: Fuqian Huang 
---
 drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c 
b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index 064d0db..c2c9fae 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -1486,16 +1486,16 @@ vchiq_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
if ((status == VCHIQ_SUCCESS) && (ret < 0) && (ret != -EINTR) &&
(ret != -EWOULDBLOCK))
vchiq_log_info(vchiq_arm_log_level,
-   "  ioctl instance %lx, cmd %s -> status %d, %ld",
-   (unsigned long)instance,
+   "  ioctl instance %pK, cmd %s -> status %d, %ld",
+   instance,
(_IOC_NR(cmd) <= VCHIQ_IOC_MAX) ?
ioctl_names[_IOC_NR(cmd)] :
"",
status, ret);
else
vchiq_log_trace(vchiq_arm_log_level,
-   "  ioctl instance %lx, cmd %s -> status %d, %ld",
-   (unsigned long)instance,
+   "  ioctl instance %pK, cmd %s -> status %d, %ld",
+   instance,
(_IOC_NR(cmd) <= VCHIQ_IOC_MAX) ?
ioctl_names[_IOC_NR(cmd)] :
"",
-- 
2.11.0



[PATCH v6] arm64: dts: ls1088a: add one more thermal zone node

2019-04-22 Thread Yuantian Tang
The LS1088A has two thermal sensors: core cluster and SoC platform. The
core cluster sensor monitors the temperature of the cores, and the SoC
platform sensor monitors the platform. The current dts only supports the
first sensor.
This patch adds the second sensor node to the dts to enable it.

Signed-off-by: Yuantian Tang 
---
v6:
- add cooling device map to cpu0-7 in platform node.
 arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi |   43 +--
 1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
index 661137f..a697a82 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
@@ -129,19 +129,19 @@
};
 
thermal-zones {
-   cpu_thermal: cpu-thermal {
+   core-cluster {
polling-delay-passive = <1000>;
polling-delay = <5000>;
thermal-sensors = < 0>;
 
trips {
-   cpu_alert: cpu-alert {
+   core_cluster_alert: core-cluster-alert {
temperature = <85000>;
hysteresis = <2000>;
type = "passive";
};
 
-   cpu_crit: cpu-crit {
+   core_cluster_crit: core-cluster-crit {
temperature = <95000>;
hysteresis = <2000>;
type = "critical";
@@ -150,7 +150,42 @@
 
cooling-maps {
map0 {
-   trip = <&cpu_alert>;
+   trip = <&core_cluster_alert>;
+   cooling-device =
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
+   < THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>;
+   };
+   };
+   };
+
+   platform {
+   polling-delay-passive = <1000>;
+   polling-delay = <5000>;
+   thermal-sensors = < 1>;
+
+   trips {
+   platform_alert: platform-alert {
+   temperature = <85000>;
+   hysteresis = <2000>;
+   type = "passive";
+   };
+
+   platform_crit: platform-crit {
+   temperature = <95000>;
+   hysteresis = <2000>;
+   type = "critical";
+   };
+   };
+
+   cooling-maps {
+   map0 {
+   trip = <&platform_alert>;
cooling-device =
< THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
< THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>,
-- 
1.7.1



Re: scripts/selinux build error in 4.14 after glibc update

2019-04-22 Thread Nathan Chancellor
On Mon, Apr 22, 2019 at 09:59:47PM -0400, Paul Moore wrote:
> On Mon, Apr 22, 2019 at 5:00 PM Nathan Chancellor
>  wrote:
> > Hi all,
> >
> > After a glibc update to 2.29, my 4.14 builds started failing like so:
> 
> ...
> 
> >   HOSTCC  scripts/selinux/genheaders/genheaders
> > In file included from scripts/selinux/genheaders/genheaders.c:19:
> > ./security/selinux/include/classmap.h:245:2: error: #error New address 
> > family defined, please update secclass_map.
> >  #error New address family defined, please update secclass_map.
> >   ^
> 
> This is a known problem that has a fix in the selinux/next branch and
> will be going up to Linus during the next merge window.  The fix is
> quite small and should be relatively easy for you to backport to your
> kernel build if you are interested; the patch can be found at the
> archive link below:
> 
> https://lore.kernel.org/selinux/20190225005528.28371-1-pa...@paulo.ac
> 
> -- 
> paul moore
> www.paul-moore.com

Awesome, thank you! I will apply that for now and wait for it to get
backported to stable after the next merge window.

I appreciate the quick response,
Nathan


[PATCH v2 2/2] ASoC: sprd: Add Spreadtrum multi-channel data transfer support

2019-04-22 Thread Baolin Wang
On the Spreadtrum platform, the audio subsystem uses the multi-channel
data transfer controller to transfer sound streams between the audio
subsystem and other AP/CP subsystems.

It supports 10 DAC channels and 10 ADC channels, and each channel has a
data FIFO 512 bytes deep. Moreover, each channel can use DMA mode or
interrupt mode to transfer data.

Signed-off-by: Baolin Wang 
---
Changes from v1:
 - Move the driver from drivers/soc/sprd to sound/soc/sprd/, since it
 is only used by the audio driver.
 - Rename the driver file and header file.
---
 sound/soc/sprd/Kconfig |8 +
 sound/soc/sprd/Makefile|2 +
 sound/soc/sprd/sprd-mcdt.c | 1011 
 sound/soc/sprd/sprd-mcdt.h |  107 +
 4 files changed, 1128 insertions(+)
 create mode 100644 sound/soc/sprd/sprd-mcdt.c
 create mode 100644 sound/soc/sprd/sprd-mcdt.h

diff --git a/sound/soc/sprd/Kconfig b/sound/soc/sprd/Kconfig
index 3b1eb32..21f9cc3 100644
--- a/sound/soc/sprd/Kconfig
+++ b/sound/soc/sprd/Kconfig
@@ -5,3 +5,11 @@ config SND_SOC_SPRD
help
  Say Y or M if you want to add support for codecs attached to
  the Spreadtrum SoCs' Audio interfaces.
+
+config SND_SOC_SPRD_MCDT
+   bool "Spreadtrum multi-channel data transfer support"
+   depends on SND_SOC_SPRD
+   help
+ Say y here to enable multi-channel data transfer support. It
+ is used for sound stream transmission between audio subsystem
+ and other AP/CP subsystem.
diff --git a/sound/soc/sprd/Makefile b/sound/soc/sprd/Makefile
index e6c2606..a95fa56 100644
--- a/sound/soc/sprd/Makefile
+++ b/sound/soc/sprd/Makefile
@@ -4,3 +4,5 @@
 snd-soc-sprd-platform-objs := sprd-pcm-dma.o sprd-pcm-compress.o
 
 obj-$(CONFIG_SND_SOC_SPRD) += snd-soc-sprd-platform.o
+
+obj-$(CONFIG_SND_SOC_SPRD_MCDT) += sprd-mcdt.o
diff --git a/sound/soc/sprd/sprd-mcdt.c b/sound/soc/sprd/sprd-mcdt.c
new file mode 100644
index 000..28f5e64
--- /dev/null
+++ b/sound/soc/sprd/sprd-mcdt.c
@@ -0,0 +1,1011 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2019 Spreadtrum Communications Inc.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sprd-mcdt.h"
+
+/* MCDT registers definition */
+#define MCDT_CH0_TXD   0x0
+#define MCDT_CH0_RXD   0x28
+#define MCDT_DAC0_WTMK 0x60
+#define MCDT_ADC0_WTMK 0x88
+#define MCDT_DMA_EN0xb0
+
+#define MCDT_INT_EN0   0xb4
+#define MCDT_INT_EN1   0xb8
+#define MCDT_INT_EN2   0xbc
+
+#define MCDT_INT_CLR0  0xc0
+#define MCDT_INT_CLR1  0xc4
+#define MCDT_INT_CLR2  0xc8
+
+#define MCDT_INT_RAW1  0xcc
+#define MCDT_INT_RAW2  0xd0
+#define MCDT_INT_RAW3  0xd4
+
+#define MCDT_INT_MSK1  0xd8
+#define MCDT_INT_MSK2  0xdc
+#define MCDT_INT_MSK3  0xe0
+
+#define MCDT_DAC0_FIFO_ADDR_ST 0xe4
+#define MCDT_ADC0_FIFO_ADDR_ST 0xe8
+
+#define MCDT_CH_FIFO_ST0   0x134
+#define MCDT_CH_FIFO_ST1   0x138
+#define MCDT_CH_FIFO_ST2   0x13c
+
+#define MCDT_INT_MSK_CFG0  0x140
+#define MCDT_INT_MSK_CFG1  0x144
+
+#define MCDT_DMA_CFG0  0x148
+#define MCDT_FIFO_CLR  0x14c
+#define MCDT_DMA_CFG1  0x150
+#define MCDT_DMA_CFG2  0x154
+#define MCDT_DMA_CFG3  0x158
+#define MCDT_DMA_CFG4  0x15c
+#define MCDT_DMA_CFG5  0x160
+
+/* Channel water mark definition */
+#define MCDT_CH_FIFO_AE_SHIFT  16
+#define MCDT_CH_FIFO_AE_MASK   GENMASK(24, 16)
+#define MCDT_CH_FIFO_AF_MASK   GENMASK(8, 0)
+
+/* DMA channel select definition */
+#define MCDT_DMA_CH0_SEL_MASK  GENMASK(3, 0)
+#define MCDT_DMA_CH0_SEL_SHIFT 0
+#define MCDT_DMA_CH1_SEL_MASK  GENMASK(7, 4)
+#define MCDT_DMA_CH1_SEL_SHIFT 4
+#define MCDT_DMA_CH2_SEL_MASK  GENMASK(11, 8)
+#define MCDT_DMA_CH2_SEL_SHIFT 8
+#define MCDT_DMA_CH3_SEL_MASK  GENMASK(15, 12)
+#define MCDT_DMA_CH3_SEL_SHIFT 12
+#define MCDT_DMA_CH4_SEL_MASK  GENMASK(19, 16)
+#define MCDT_DMA_CH4_SEL_SHIFT 16
+#define MCDT_DAC_DMA_SHIFT 16
+
+/* DMA channel ACK select definition */
+#define MCDT_DMA_ACK_SEL_MASK  GENMASK(3, 0)
+
+/* Channel FIFO definition */
+#define MCDT_CH_FIFO_ADDR_SHIFT16
+#define MCDT_CH_FIFO_ADDR_MASK GENMASK(9, 0)
+#define MCDT_ADC_FIFO_SHIFT16
+#define MCDT_FIFO_LENGTH   512
+
+#define MCDT_ADC_CHANNEL_NUM   10
+#define MCDT_DAC_CHANNEL_NUM   10
+#define MCDT_CHANNEL_NUM   (MCDT_ADC_CHANNEL_NUM + MCDT_DAC_CHANNEL_NUM)
+
+enum sprd_mcdt_fifo_int {
+   MCDT_ADC_FIFO_AE_INT,
+   MCDT_ADC_FIFO_AF_INT,
+   MCDT_DAC_FIFO_AE_INT,
+   MCDT_DAC_FIFO_AF_INT,
+   MCDT_ADC_FIFO_OV_INT,
+   MCDT_DAC_FIFO_OV_INT
+};
+
+enum sprd_mcdt_fifo_sts {
+   MCDT_ADC_FIFO_REAL_FULL,
+   MCDT_ADC_FIFO_REAL_EMPTY,
+   MCDT_ADC_FIFO_AF,
+   MCDT_ADC_FIFO_AE,
+   MCDT_DAC_FIFO_REAL_FULL,
+   MCDT_DAC_FIFO_REAL_EMPTY,
+   MCDT_DAC_FIFO_AF,
+   
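
To make the water mark field definitions in this patch concrete, here is a minimal user-space sketch of how two thresholds could be packed into one water mark register value. It assumes, as the mask names suggest, that the almost-empty (AE) threshold sits in bits 24:16 and the almost-full (AF) threshold in bits 8:0. The helper name mcdt_pack_watermark() is hypothetical, and GENMASK() is redefined locally because the kernel macro is not available in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Local 32-bit equivalent of the kernel's GENMASK(h, l). */
#define GENMASK(h, l)	(((~0u) << (l)) & (~0u >> (31 - (h))))

/* Field definitions copied from the patch. */
#define MCDT_CH_FIFO_AE_SHIFT	16
#define MCDT_CH_FIFO_AE_MASK	GENMASK(24, 16)
#define MCDT_CH_FIFO_AF_MASK	GENMASK(8, 0)

/*
 * Hypothetical helper: combine an almost-full and an almost-empty
 * threshold into a single water mark register value.
 */
static uint32_t mcdt_pack_watermark(uint32_t full, uint32_t empty)
{
	uint32_t val;

	/* Both thresholds must fit into their 9-bit fields. */
	assert(full <= MCDT_CH_FIFO_AF_MASK);
	assert(empty <= (MCDT_CH_FIFO_AE_MASK >> MCDT_CH_FIFO_AE_SHIFT));

	val = (empty << MCDT_CH_FIFO_AE_SHIFT) & MCDT_CH_FIFO_AE_MASK;
	val |= full & MCDT_CH_FIFO_AF_MASK;
	return val;
}
```

The result would then be written to the per-channel water mark register (MCDT_DAC0_WTMK or MCDT_ADC0_WTMK plus a channel offset); that register access is outside the scope of this sketch.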

[PATCH v2 1/2] dt-bindings: ASoC: Add Spreadtrum multi-channel data transfer support

2019-04-22 Thread Baolin Wang
On the Spreadtrum platform, the audio subsystem uses the multi-channel
data transfer controller to transfer sound streams between the audio
subsystem and other AP/CP subsystems.

It supports 10 DAC channels and 10 ADC channels, and each channel has a
data FIFO 512 bytes deep. Moreover, each channel can use DMA mode or
interrupt mode to transfer data.

Signed-off-by: Baolin Wang 
---
Changes from v1:
 - Move the documentation into sound/.
---
 .../devicetree/bindings/sound/sprd-mcdt.txt|   19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/sprd-mcdt.txt

diff --git a/Documentation/devicetree/bindings/sound/sprd-mcdt.txt 
b/Documentation/devicetree/bindings/sound/sprd-mcdt.txt
new file mode 100644
index 000..274ba0a
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/sprd-mcdt.txt
@@ -0,0 +1,19 @@
+Spreadtrum Multi-Channel Data Transfer Binding
+
+The Multi-channel data transfer controller is used for sound stream
+transmission between audio subsystem and other AP/CP subsystem. It
+supports 10 DAC channel and 10 ADC channel, and each channel can be
+configured with DMA mode or interrupt mode.
+
+Required properties:
+- compatible: Should be "sprd,sc9860-mcdt".
+- reg: Should contain registers address and length.
+- interrupts: Should contain one interrupt shared by all channel.
+
+Example:
+
+mcdt@4149 {
+   compatible = "sprd,sc9860-mcdt";
+   reg = <0 0x4149 0 0x170>;
+   interrupts = ;
+};
-- 
1.7.9.5



[PATCH] staging: most: protect potential string overflow

2019-04-22 Thread Bo YU
Using strcpy() without checking the length may cause a string
overflow.

Detected by CoverityScan, CID# 1444760

Fixes: 131ac62253dba ("staging: most: core: use device description as name")
Signed-off-by: Bo YU 
---
 drivers/staging/most/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/most/core.c b/drivers/staging/most/core.c
index 956daf8c3bd2..0f26cebac91a 100644
--- a/drivers/staging/most/core.c
+++ b/drivers/staging/most/core.c
@@ -1431,7 +1431,7 @@ int most_register_interface(struct most_interface *iface)
 
INIT_LIST_HEAD(&iface->p->channel_list);
iface->p->dev_id = id;
-   strcpy(iface->p->name, iface->description);
+   strlcpy(iface->p->name, iface->description, sizeof(iface->p->name));
iface->dev.init_name = iface->p->name;
iface->dev.bus = 
iface->dev.parent = 
-- 
2.11.0
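
The behaviour this patch relies on (copy at most size - 1 bytes, always NUL-terminate, report the source length so truncation can be detected) can be demonstrated in user space. The sketch below is not the kernel's strlcpy(): copy_bounded() is a hypothetical stand-in written for illustration, since strlcpy() is not available in every C library.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for a bounded string copy: copies at most
 * size - 1 bytes from src, always NUL-terminates dst, and returns
 * strlen(src). A return value >= size means the copy was truncated.
 */
static size_t copy_bounded(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	/* This sketch requires room for at least the terminator. */
	assert(size > 0);

	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
	} else {
		memcpy(dst, src, len + 1);
	}
	return len;
}
```

With an 8-byte destination, copying "a-very-long-description" leaves "a-very-" in the buffer and returns 23, whereas an unbounded strcpy() would write past the end of the buffer.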



Re: [RFC PATCH 0/5] NUMA Balancer Suite

2019-04-22 Thread 王贇
On 2019/4/22 10:34 PM, 禹舟键 wrote:
> Hi, Michael
> I really want to know how you resolve the conflict between the NUMA
> balancer and the load balancer. Maybe you gain a NUMA bonus by migrating
> some tasks to the node holding most of their cache, but then the CPU load
> balance is broken, so how do you handle that?

The trick here is to allow migration when load balancing keeps failing,
which means there are no better tasks to move.

However, since the idea here is scheduling cgroup workloads, it can be
hard to keep the load balanced, for example with only two cgroups that
have different workloads and are placed on different nodes.

That's why we make this a module rather than changing the kernel logic.
At this moment not every situation benefits from the NUMA balancer,
but in some situations a balanced load brings no benefit while the NUMA
balancer does.

We are also improving the module to give it an overall view, so it will
know whether a decision breaks the load balance, but this introduces a
big lock and more per-CPU/per-node counters; we need more testing to know
whether this is really helpful.

Anyway, if you have any scenario that may benefit, please give it a try
and let me know what the problems are; we'll try to address them :-)

Regards,
Michael Wang

> 
> Thanks
> Wind
> 
> 
> 王贇 <yun.w...@linux.alibaba.com> wrote on Mon, Apr 22, 2019 at 10:13 AM:
> 
> We have the NUMA Balancing feature, which always tries to move the pages
> of a task to the node it executes on most, but there are still issues:
> 
> * page cache can't be handled
> * no cgroup level balancing
> 
> Suppose we have a box with 4 cpus and two cgroups A & B each running 4 tasks;
> the scenario below can easily be observed:
> 
> NODE0                   |       NODE1
>                         |
> CPU0            CPU1    |       CPU2            CPU3
> task_A0         task_A1 |       task_A2         task_A3
> task_B0         task_B1 |       task_B2         task_B3
> 
> and usually with equal memory consumption on each node, when the tasks
> have similar behavior.
> 
> In this case numa balancing tries to move the pages of task_A0,1 & task_B0,1
> to node 0 and the pages of task_A2,3 & task_B2,3 to node 1, but the page cache
> will be located randomly, depending on the CPU that first reads/writes it.
> 
> Let's suppose another scenario:
> 
> NODE0                   |       NODE1
>                         |
> CPU0            CPU1    |       CPU2            CPU3
> task_A0         task_A1 |       task_B0         task_B1
> task_A2         task_A3 |       task_B2         task_B3
> 
> By switching the cpu & memory resources of task_A0,1 and task_B0,1, the
> workloads of cgroup A are now all on node 0 and those of cgroup B all on
> node 1. Resource consumption is the same, but related tasks can share a
> closer cpu cache, while the page cache is still randomly located.
> 
> Now what if the workloads generate lots of page cache, and most of the
> memory accesses are page cache writes?
> 
> Page cache generated by task_A0 on NODE1 won't follow it to NODE0, but if
> task_A0 was already on NODE0 before it read/wrote files, the caches will be
> there, so how do we make sure this happens?
> 
> Usually we could solve this problem by binding workloads to a single node:
> if cgroup A is bound to CPU0,1, then all the caches it generates will be on
> NODE0, and the numa bonus will be at its maximum.
> 
> However, this requires very careful administration of the specific
> workloads. Suppose in our case A & B have a CPU requirement that changes
> from 0% to 400%; then binding to a single node would be a bad idea.
> 
> So what we need is a way to detect memory topology at the cgroup level, and
> to try to migrate cpu/mem resources to the node holding most of the caches,
> as long as resources are plentiful on that node.
> 
> This patch set introduces:
>   * advanced per-cgroup numa statistic
>   * numa preferred node feature
>   * Numa Balancer module
> 
> Which helps to achieve an easy and flexible numa resource assignment, to
> gain as much numa bonus as possible.
> 
> Michael Wang (5):
>   numa: introduce per-cgroup numa balancing locality statistic
>   numa: append per-node execution info in memory.numa_stat
>   numa: introduce per-cgroup preferred numa node
>   numa: introduce numa balancer infrastructure
>   numa: numa balancer
> 
>  drivers/Makefile             |   1 +
>  drivers/numa/Makefile        |   1 +
>  drivers/numa/numa_balancer.c | 715 +++
>  include/linux/memcontrol.h   |  99 ++
>  include/linux/sched.h        |   9 +-
>  kernel/sched/debug.c         |   8 +
>  kernel/sched/fair.c          |  41 +++
>  mm/huge_memory.c             |   7 +-
>  mm/memcontrol.c              | 246 +++
>  mm/memory.c                  |  

Re: linux-next: build failure after merge of the imx-mxs tree

2019-04-22 Thread Shawn Guo
Hi Stephen,

On Tue, Apr 23, 2019 at 08:45:01AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the imx-mxs tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> arch/arm/boot/dts/imx7d-zii-rpu2.dts:46.12-50.4: Warning 
> (io_channels_property): /iio-hwmon: Missing property '#io-channel-cells' in 
> node /soc/aips-bus@3040/adc@3061 or bad phandle (referred from 
> io-channels[0])
> 
> Caused by commit
> 
>   69ab5392f517 ("ARM: dts: Add support for ZII i.MX7 RPU2 board")

Fixed.  Thanks for reporting.

Shawn


Re: scripts/selinux build error in 4.14 after glibc update

2019-04-22 Thread Paul Moore
On Mon, Apr 22, 2019 at 5:00 PM Nathan Chancellor
 wrote:
> Hi all,
>
> After a glibc update to 2.29, my 4.14 builds started failing like so:

...

>   HOSTCC  scripts/selinux/genheaders/genheaders
> In file included from scripts/selinux/genheaders/genheaders.c:19:
> ./security/selinux/include/classmap.h:245:2: error: #error New address family 
> defined, please update secclass_map.
>  #error New address family defined, please update secclass_map.
>   ^

This is a known problem that has a fix in the selinux/next branch and
will be going up to Linus during the next merge window.  The fix is
quite small and should be relatively easy for you to backport to your
kernel build if you are interested; the patch can be found at the
archive link below:

https://lore.kernel.org/selinux/20190225005528.28371-1-pa...@paulo.ac

-- 
paul moore
www.paul-moore.com


Re: [PATCH v3 1/4] dt-bindings: iio: imx7d-adc: Add #io-channel-cells to required

2019-04-22 Thread Shawn Guo
On Sun, Apr 14, 2019 at 11:34:00AM -0700, Andrey Smirnov wrote:
> Add #io-channel-cells to list of required properties. Needed to be
> able to reference that node by phandle.
> 
> Signed-off-by: Andrey Smirnov 
> Cc: Shawn Guo 
> Cc: Chris Healy 
> Cc: Andrew Lunn 
> Cc: Fabio Estevam 
> Cc: Rob Herring 
> Cc: linux-kernel@vger.kernel.org
> Cc: devicet...@vger.kernel.org

Applied, thanks.


Re: [PATCH v3 2/4] ARM: dts: imx7s: Specify #io-channel-cells in ADC nodes

2019-04-22 Thread Shawn Guo
On Sun, Apr 14, 2019 at 11:34:01AM -0700, Andrey Smirnov wrote:
> Specify #io-channel-cells in ADC nodes. Needed to be able to reference
> them by phandle.
> 
> Signed-off-by: Andrey Smirnov 
> Cc: Shawn Guo 
> Cc: Chris Healy 
> Cc: Andrew Lunn 
> Cc: Fabio Estevam 
> Cc: Rob Herring 
> Cc: linux-kernel@vger.kernel.org
> Cc: devicet...@vger.kernel.org

Applied, thanks.


Re: [RFC PATCH v1 3/3] selftests/x86: Augment SGX selftest to test new __vdso_sgx_enter_enclave() and its callback interface

2019-04-22 Thread Sean Christopherson
On Mon, Apr 22, 2019 at 06:29:06PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 22, 2019 at 5:37 PM Cedric Xing  wrote:
> >
> > Given the changes to __vdso_sgx_enter_enclave(), the selftest is augmented to
> > test the newly added callback interface. This additional test marks the whole
> > enclave range as PROT_READ, and calls mprotect() upon #PFs to add necessary PTE
> > permissions per PFEC (#PF Error Code) until the enclave finishes.
> 
> Nifty.
> 
> What's not tested here is running this code with EFLAGS.TF set and
> making sure that it unwinds correctly.  Also, Jarkko, unless I missed
> something, the vDSO extable code likely has a bug.  If you run the
> instruction right before ENCLU with EFLAGS.TF set, then do_debug()
> will eat the SIGTRAP and skip to the exception handler.  Similarly, if
> you put an instruction breakpoint on ENCLU, it'll get skipped.  Or is
> the code actually correct and am I just remembering wrong?

My money would be on the code being broken as opposed to you remembering
wrong.  I'll take a look at it tomorrow.


[PATCH] rcu/srcutree: make __call_srcu static

2019-04-22 Thread Jiang Biao
__call_srcu() is only used in the current file, so make it static.

Signed-off-by: Jiang Biao 
---
 kernel/rcu/srcutree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index a60b8ba9e1ac..a2ade0c6cd87 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -835,7 +835,7 @@ static void srcu_leak_callback(struct rcu_head *rhp)
  * srcu_read_lock(), and srcu_read_unlock() that are all passed the same
  * srcu_struct structure.
  */
-void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
+static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
 rcu_callback_t func, bool do_norm)
 {
unsigned long flags;
-- 
2.17.2 (Apple Git-113)



Re: [RFC PATCH v1 3/3] selftests/x86: Augment SGX selftest to test new __vdso_sgx_enter_enclave() and its callback interface

2019-04-22 Thread Andy Lutomirski
On Mon, Apr 22, 2019 at 5:37 PM Cedric Xing  wrote:
>
> Given the changes to __vdso_sgx_enter_enclave(), the selftest is augmented to
> test the newly added callback interface. This additional test marks the whole
> enclave range as PROT_READ, and calls mprotect() upon #PFs to add necessary PTE
> permissions per PFEC (#PF Error Code) until the enclave finishes.

Nifty.

What's not tested here is running this code with EFLAGS.TF set and
making sure that it unwinds correctly.  Also, Jarkko, unless I missed
something, the vDSO extable code likely has a bug.  If you run the
instruction right before ENCLU with EFLAGS.TF set, then do_debug()
will eat the SIGTRAP and skip to the exception handler.  Similarly, if
you put an instruction breakpoint on ENCLU, it'll get skipped.  Or is
the code actually correct and am I just remembering wrong?

--Andy


Re: [RFC PATCH v1 2/3] x86/vdso: Modify __vdso_sgx_enter_enclave() to allow parameter passing on untrusted stack

2019-04-22 Thread Andy Lutomirski
On Mon, Apr 22, 2019 at 5:37 PM Cedric Xing  wrote:
>
> The previous __vdso_sgx_enter_enclave() requires enclaves to preserve %rsp,
> which prohibits enclaves from allocating and passing parameters for
> untrusted function calls (aka. o-calls).
>
> This patch addresses the problem above by introducing a new ABI that preserves
> %rbp instead of %rsp. Then __vdso_sgx_enter_enclave() can anchor its frame
> using %rbp so that enclaves are allowed to allocate space on the untrusted
> stack by decrementing %rsp. Please note that the stack space allocated in such
> way will be part of __vdso_sgx_enter_enclave()'s frame so will be freed after
> __vdso_sgx_enter_enclave() returns. Therefore, __vdso_sgx_enter_enclave() has
> been changed to take a callback function as an optional parameter, which if
> supplied, will be invoked upon enclave exits (both AEX (Asynchronous Enclave
> eXit) and normal exits), with the value of %rsp left
> off by the enclave as a parameter to the callback.
>
> Here's the summary of API/ABI changes in this patch. More details could be
> found in arch/x86/entry/vdso/vsgx_enter_enclave.S.
> * 'struct sgx_enclave_exception' is renamed to 'struct sgx_enclave_exinfo'
>   because it is filled upon both AEX (i.e. exceptions) and normal enclave
>   exits.
> * __vdso_sgx_enter_enclave() anchors its frame using %rbp (instead of %rsp in
>   the previous implementation).
> * __vdso_sgx_enter_enclave() takes one more parameter - a callback function to
>   be invoked upon enclave exits. This callback is optional, and if not
>   supplied, will cause __vdso_sgx_enter_enclave() to return upon enclave exits
>   (same behavior as previous implementation).
> * The callback function is given as a parameter the value of %rsp at enclave
>   exit to address data "pushed" by the enclave. A positive value returned by
>   the callback will be treated as an ENCLU leaf for re-entering the enclave,
>   while a zero or negative value will be passed through as the return
>   value of __vdso_sgx_enter_enclave() to its caller. It's also safe to
>   leave callback by longjmp() or by throwing a C++ exception.
>
> Signed-off-by: Cedric Xing 
> ---
>  arch/x86/entry/vdso/vsgx_enter_enclave.S | 156 ++-
>  arch/x86/include/uapi/asm/sgx.h  |  14 +-
>  2 files changed, 100 insertions(+), 70 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S 
> b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> index fe0bf6671d6d..210f4366374a 100644
> --- a/arch/x86/entry/vdso/vsgx_enter_enclave.S
> +++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> @@ -14,88 +14,118 @@
>  .code64
>  .section .text, "ax"
>
> -#ifdef SGX_KERNEL_DOC
>  /**
>   * __vdso_sgx_enter_enclave() - Enter an SGX enclave
>   *
>   * @leaf:  **IN \%eax** - ENCLU leaf, must be EENTER or ERESUME
> - * @tcs:   **IN \%rbx** - TCS, must be non-NULL
> - * @ex_info:   **IN \%rcx** - Optional 'struct sgx_enclave_exception' pointer
> + * @tcs:   **IN 0x08(\%rsp)** - TCS, must be non-NULL
> + * @ex_info:   **IN 0x10(\%rsp)** - Optional 'struct sgx_enclave_exinfo'
> + *  pointer
> + * @callback:  **IN 0x18(\%rsp)** - Optional callback function to be called 
> on
> + *  enclave exit or exception
>   *
>   * Return:
>   *  **OUT \%eax** -
> - *  %0 on a clean entry/exit to/from the enclave, %-EINVAL if ENCLU leaf is
> - *  not allowed or if TCS is NULL, %-EFAULT if ENCLU or the enclave faults
> + *  %0 on a clean entry/exit to/from the enclave, %-EINVAL if ENCLU leaf is 
> not
> + *  allowed, %-EFAULT if ENCLU or the enclave faults, or a non-positive value
> + *  returned from ``callback`` (if one is supplied).
>   *
>   * **Important!**  __vdso_sgx_enter_enclave() is **NOT** compliant with the
> - * x86-64 ABI, i.e. cannot be called from standard C code.   As noted above,
> - * input parameters must be passed via ``%eax``, ``%rbx`` and ``%rcx``, with
> - * the return value passed via ``%eax``.  All registers except ``%rsp`` must
> - * be treated as volatile from the caller's perspective, including but not
> - * limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc...  Conversely, the enclave
> - * being run **must** preserve the untrusted ``%rsp`` and stack.
> + * x86-64 ABI, i.e. cannot be called from standard C code. As noted above,
> + * input parameters must be passed via ``%eax``, ``8(%rsp)``, ``0x10(%rsp)`` 
> and
> + * ``0x18(%rsp)``, with the return value passed via ``%eax``. All other 
> registers
> + * will be passed through to the enclave as is. All registers except ``%rbp``
> + * must be treated as volatile from the caller's perspective, including but 
> not
> + * limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc... Conversely, the enclave 
> being
> + * run **must** preserve the untrusted ``%rbp``.
> + *
> + * ``callback`` has the following signature:
> + * int callback(long rdi, long rsi, long rdx,
> + * struct sgx_enclave_exinfo *ex_info, long r8, long r9,
> + *   
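
The callback return convention described in the patch summary (a positive
value is used as the ENCLU leaf for re-entering the enclave, zero or a
negative value is passed through to the caller) can be sketched as a plain C
loop. This is only an illustration under stated assumptions, not the vDSO
assembly: enter_enclave_sim(), exit_cb_t, enclu_sim_t and the demo helpers
are hypothetical names, the callback signature is simplified compared to the
one in the patch, and the "enclave" is a stub function.

```c
#include <stddef.h>

/* ENCLU leaf numbers as defined by the SGX architecture. */
#define EENTER  2
#define ERESUME 3

/* Simplified, hypothetical callback type; the one in the patch takes more
 * arguments. It receives the %rsp value left by the enclave. */
typedef int (*exit_cb_t)(long rsp, void *ctx);

/* Hypothetical stand-in for executing ENCLU once; returns %rsp at exit. */
typedef long (*enclu_sim_t)(int leaf, void *ctx);

int enter_enclave_sim(int leaf, enclu_sim_t run, exit_cb_t cb, void *ctx)
{
	for (;;) {
		long rsp = run(leaf, ctx);

		if (!cb)
			return 0;	/* no callback: return on enclave exit */

		int ret = cb(rsp, ctx);

		if (ret <= 0)
			return ret;	/* zero/negative: passed through to caller */
		leaf = ret;		/* positive: re-enter with this ENCLU leaf */
	}
}

/* Demo state: counts enclave entries and records the last leaf used. */
struct demo_ctx {
	int entries;
	int last_leaf;
};

/* Stub "enclave": records the entry and returns a made-up stack pointer. */
long demo_run(int leaf, void *ctx)
{
	struct demo_ctx *d = ctx;

	d->entries++;
	d->last_leaf = leaf;
	return 0x1000 - 16L * d->entries;
}

/* Demo callback: resume twice, then stop with a negative value. */
int demo_cb(long rsp, void *ctx)
{
	struct demo_ctx *d = ctx;

	(void)rsp;
	return d->entries < 3 ? ERESUME : -1;
}
```

Calling enter_enclave_sim(EENTER, demo_run, demo_cb, &ctx) enters the stub
three times and returns -1, while passing NULL as the callback returns after
the first exit, matching the "same behavior as previous implementation" case.
The sketch leaves out the -EINVAL and -EFAULT paths mentioned in the kernel
doc comment.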

Re: [PATCH v2] binfmt_elf: Move brk out of mmap when doing direct loader exec

2019-04-22 Thread Guenter Roeck

On 4/22/19 3:57 PM, Kees Cook wrote:

Commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"),
made changes in the rare case when the ELF loader was directly invoked
(e.g to set a non-inheritable LD_LIBRARY_PATH, testing new versions of
the loader), by moving into the mmap region to avoid both ET_EXEC and PIE
binaries. This had the effect of also moving the brk region into mmap,
which could lead to the stack and brk being arbitrarily close to each
other. An unlucky process wouldn't get its requested stack size and stack
allocations could end up scribbling on the heap.

This is illustrated here. In the case of using the loader directly, brk
(so helpfully identified as "[heap]") is allocated with the _loader_
not the binary. For example, with ASLR entirely disabled, you can see
this more clearly:

$ /bin/cat /proc/self/maps
4000-c000 r-xp  ... /bin/cat
5575b000-5575c000 r--p 7000 ... /bin/cat
5575c000-5575d000 rw-p 8000 ... /bin/cat
5575d000-5577e000 rw-p  ... [heap]
...
77ff7000-77ffa000 r--p  ... [vvar]
77ffa000-77ffc000 r-xp  ... [vdso]
77ffc000-77ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffd000-77ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffe000-77fff000 rw-p  ...
7ffde000-7000 rw-p  ... [stack]

$ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps
...
77bcc000-77bd4000 r-xp  ... /bin/cat
77bd4000-77dd3000 ---p 8000 ... /bin/cat
77dd3000-77dd4000 r--p 7000 ... /bin/cat
77dd4000-77dd5000 rw-p 8000 ... /bin/cat
77dd5000-77dfc000 r-xp  ... /lib/x86_64-linux-gnu/ld-2.27.so
77fb2000-77fd6000 rw-p  ...
77ff7000-77ffa000 r--p  ... [vvar]
77ffa000-77ffc000 r-xp  ... [vdso]
77ffc000-77ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffd000-77ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffe000-7802 rw-p  ... [heap]
7ffde000-7000 rw-p  ... [stack]

The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since
nothing is there in the direct loader case (and ET_EXEC is still far
away at 0x40). Anything that ran before should still work (i.e. the
ultimately-launched binary already had the brk very far from its text, so
this should be no different from a COMPAT_BRK standpoint). The only risk
I see here is that if someone started to suddenly depend on the entire
memory space lower than the mmap region being available when launching
binaries via direct loader execs, which seems highly unlikely, I'd hope:
this would mean a binary would _not_ work when exec()ed normally.

(Note that this is only done under CONFIG_ARCH_HAS_ELF_RANDOMIZATION when
randomization is turned on.)

Reported-by: Ali Saidi 
Link: https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=omfhx_p0nckp...@mail.gmail.com
Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Signed-off-by: Kees Cook 
---
v2: limit effect to only architectures that are expecting it! (Guenter)


This patch applies cleanly on top of next-20190418, and the crashes
observed with my xtensa boot tests no longer occur. I didn't test
any other architectures.

I don't know the base of Andrew's modification - the code snippet he says
isn't there anymore is still present in next-20190418.

Guenter


[PATCH] rcu/tree_exp: cleanup initialized but not used rdp

2019-04-22 Thread Jiang Biao
rdp is initialized but never used in synchronize_rcu_expedited(),
so remove it.

Signed-off-by: Jiang Biao 
---
 kernel/rcu/tree_exp.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 4c2a0189e748..5772612379e4 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -733,7 +733,6 @@ static void sync_sched_exp_online_cleanup(int cpu)
  */
 void synchronize_rcu_expedited(void)
 {
-   struct rcu_data *rdp;
struct rcu_exp_work rew;
struct rcu_node *rnp;
unsigned long s;
@@ -770,7 +769,6 @@ void synchronize_rcu_expedited(void)
}
 
/* Wait for expedited grace period to complete. */
-   rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
rnp = rcu_get_root();
wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
   sync_exp_work_done(s));
-- 
2.17.2 (Apple Git-113)



Re: [PATCH v2] clk: rockchip: undo several noc and special clocks as critical on rk3288

2019-04-22 Thread elaine.zhang

hi,

On 2019/4/22 11:23 PM, Doug Anderson wrote:

Elaine,

On Fri, Apr 12, 2019 at 9:18 AM Douglas Anderson  wrote:

This is mostly a revert of commit 55bb6a633c33 ("clk: rockchip: mark
noc and some special clk as critical on rk3288") except that we're
keeping "pmu_hclk_otg0" as critical still.

NOTE: turning these clocks off doesn't seem to do a whole lot in terms
of power savings (checking the power on the logic rail).  It appears
to save maybe 1-2mW.  ...but still it seems like we should turn the
clocks off if they aren't needed.

About "pmu_hclk_otg0" (the one clock from the original commit we're
still keeping critical) from an email thread:


pmu ahb clock

Function: Clock to pmu module when hibernation and/or ADP is
enabled. Must be greater than or equal to 30 MHz.

If the SOC design does not support hibernation/ADP function, only have
hclk_otg, this clk can be switched according to the usage of otg.
If the SOC design support hibernation/ADP, has two clocks, hclk_otg and
pmu_hclk_otg0.
Hclk_otg belongs to the closed part of otg logic, which can be switched
according to the use of otg.

pmu_hclk_otg0 belongs to the always on part.

As for whether pmu_hclk_otg0 can be turned off when otg is not in use,
we have not tested it. IC suggests keeping pmu_hclk_otg0 always on.

For the rest of the clocks:

atclk: No documentation about this clock other than that it goes to
the CPU.  CPU functions fine without it on.  Maybe needed for JTAG?

jtag: Presumably this clock is only needed if you're debugging with
JTAG.  It doesn't seem like it makes sense to waste power for every
rk3288 user.  In any case to do JTAG you'd need private patches to
adjust the pinctrl to mux the JTAG out anyway.

pclk_dbg, pclk_core_niu: On veyron Chromebooks we turn these two
clocks on only during kernel panics in order to access some coresight
registers.  Since nothing in the upstream kernel does this we should
be able to leave them off safely.  Maybe also needed for JTAG?

hsicphy12m_xin12m: There is no indication of why this clock would need
to be turned on for boards that don't use HSIC.

pclk_ddrupctl[0-1], pclk_publ0[0-1]: On veyron Chromebooks we turn
these 4 clocks on only when doing DDR transitions and they are off
otherwise.  I see no reason why they'd need to be on in the upstream
kernel which doesn't support DDRFreq.

Signed-off-by: Douglas Anderson 
---

Changes in v2:
- Now keep pmu_hclk_otg0 as critical.
- Updated description since this isn't a clean revert.
- PWM patches have landed, so just one patch in the series now.

  drivers/clk/rockchip/clk-rk3288.c | 13 -
  1 file changed, 4 insertions(+), 9 deletions(-)

From previous discussions I think you're all happy with this patch
now, right?  Care to give it an official Reviewed-by tag?


Yes.

Reviewed-by: Elaine Zhang 



-Doug








Re: [PATCH v3] signal: trace_signal_deliver when signal_group_exit

2019-04-22 Thread kbuild test robot
Hi Zhenliang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.1-rc6 next-20190418]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Zhenliang-Wei/signal-trace_signal_deliver-when-signal_group_exit/20190423-062107
config: parisc-allyesconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


All errors (new ones prefixed by >>):

   In file included from arch/parisc/include/asm/signal.h:5:0,
from include/uapi/linux/signal.h:5,
from include/linux/signal_types.h:10,
from include/linux/sched.h:28,
from include/linux/sched/mm.h:7,
from kernel/signal.c:16:
   kernel/signal.c: In function 'get_signal':
>> arch/parisc/include/uapi/asm/signal.h:77:17: error: passing argument 3 of 
>> 'trace_signal_deliver' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
#define SIG_DFL ((__sighandler_t)0) /* default signal handling */
^
   kernel/signal.c:2444:50: note: in expansion of macro 'SIG_DFL'
  trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
 ^~~
   In file included from include/trace/syscall.h:5:0,
from include/linux/syscalls.h:86,
from kernel/signal.c:29:
   include/linux/tracepoint.h:235:21: note: expected 'struct k_sigaction *' but 
argument is of type 'void (*)(int)'
 static inline void trace_##name(proto)\
^
   include/linux/tracepoint.h:398:2: note: in expansion of macro 
'__DECLARE_TRACE'
 __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
 ^~~
   include/linux/tracepoint.h:534:2: note: in expansion of macro 'DECLARE_TRACE'
 DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
 ^
   include/trace/events/signal.h:96:1: note: in expansion of macro 'TRACE_EVENT'
TRACE_EVENT(signal_deliver,
^~~
   cc1: some warnings being treated as errors

vim +/trace_signal_deliver +77 arch/parisc/include/uapi/asm/signal.h

70c1674f6 David Howells 2012-10-16  76  
70c1674f6 David Howells 2012-10-16 @77  #define SIG_DFL ((__sighandler_t)0) 
/* default signal handling */
70c1674f6 David Howells 2012-10-16  78  #define SIG_IGN ((__sighandler_t)1) 
/* ignore signal */
70c1674f6 David Howells 2012-10-16  79  #define SIG_ERR ((__sighandler_t)-1)
/* error return from signal */
70c1674f6 David Howells 2012-10-16  80  

:: The code at line 77 was first introduced by commit
:: 70c1674f62026e455c0c821fb7f4baf24d2d1139 UAPI: (Scripted) Disintegrate 
arch/parisc/include/asm

:: TO: David Howells 
:: CC: David Howells 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: clk/clk-next boot bisection: v5.1-rc1-142-ga55b079c961b on panda

2019-04-22 Thread Stephen Boyd
Quoting kernelci.org bot (2019-04-22 17:16:44)
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> clk/clk-next boot bisection: v5.1-rc1-142-ga55b079c961b on panda
> 
> Summary:
>   Start:      a55b079c961b Merge branch 'clk-hisi' into clk-next
>   Details:    https://kernelci.org/boot/id/5cbe3cdb59b514fd22fe6025
>   Plain log:  https://storage.kernelci.org//clk/clk-next/v5.1-rc1-142-ga55b079c961b/arm/omap2plus_defconfig/gcc-7/lab-baylibre/boot-omap4-panda.txt
>   HTML log:   https://storage.kernelci.org//clk/clk-next/v5.1-rc1-142-ga55b079c961b/arm/omap2plus_defconfig/gcc-7/lab-baylibre/boot-omap4-panda.html
>   Result:     ecbf3f1795fd clk: fixed-factor: Let clk framework find parent
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:       clk
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
>   Branch:     clk-next
>   Target:     panda
>   CPU arch:   arm
>   Lab:        lab-baylibre
>   Compiler:   gcc-7
>   Config:     omap2plus_defconfig
>   Test suite: boot
> 
> Breaking commit found:

Awesome! I LOVE IT!!!

> 
> diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
> index 241b3f8c61a9..5b09f2cdb7de 100644
> --- a/drivers/clk/clk-fixed-factor.c
> +++ b/drivers/clk/clk-fixed-factor.c
> @@ -64,12 +64,14 @@ const struct clk_ops clk_fixed_factor_ops = {
>  };
>  EXPORT_SYMBOL_GPL(clk_fixed_factor_ops);
>  
> -struct clk_hw *clk_hw_register_fixed_factor(struct device *dev,
> -   const char *name, const char *parent_name, unsigned long 
> flags,
> -   unsigned int mult, unsigned int div)
> +static struct clk_hw *
> +__clk_hw_register_fixed_factor(struct device *dev, struct device_node *np,
> +   const char *name, const char *parent_name, int index,
> +   unsigned long flags, unsigned int mult, unsigned int div)
>  {
> struct clk_fixed_factor *fix;
> struct clk_init_data init;
> +   struct clk_parent_data pdata = { .index = index };
> struct clk_hw *hw;
> int ret;
>  
> @@ -85,11 +87,17 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device 
> *dev,
> init.name = name;
> init.ops = &clk_fixed_factor_ops;
> init.flags = flags | CLK_IS_BASIC;
> -   init.parent_names = &parent_name;
> +   if (parent_name)
> +   init.parent_names = &parent_name;
> +   else
> +   init.parent_data = &pdata;

Ick. I realized that 'init.parent_names' here can be full of junk! Let's
initialize it properly. Maybe that makes this all better?

8<
diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
index 5b09f2cdb7de..2d988a7585d5 100644
--- a/drivers/clk/clk-fixed-factor.c
+++ b/drivers/clk/clk-fixed-factor.c
@@ -70,7 +70,7 @@ __clk_hw_register_fixed_factor(struct device *dev, struct 
device_node *np,
unsigned long flags, unsigned int mult, unsigned int div)
 {
struct clk_fixed_factor *fix;
-   struct clk_init_data init;
+   struct clk_init_data init = { };
struct clk_parent_data pdata = { .index = index };
struct clk_hw *hw;
int ret;


Re: [PATCH] x86_64: uninline TASK_SIZE

2019-04-22 Thread Andy Lutomirski
On Mon, Apr 22, 2019 at 3:09 PM Alexey Dobriyan  wrote:
>
> On Mon, Apr 22, 2019 at 07:30:40AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Apr 22, 2019, at 3:34 AM, Ingo Molnar  wrote:
> > >
> > >
> > > * Alexey Dobriyan  wrote:
> > >
> > > +++ b/arch/x86/kernel/task_size_64.c
> > > @@ -0,0 +1,9 @@
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +unsigned long _task_size(void)
> > > +{
> > > +return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET :
> >  TASK_SIZE_MAX;
> > > +}
> > > +EXPORT_SYMBOL(_task_size);
> > 
> >  Good idea - but instead of adding yet another compilation unit, why not
> > 
> >  stick _task_size() into arch/x86/kernel/process_64.c, which is the
> >  canonical place for process management related arch functions?
> > 
> >  Thanks,
> > 
> > Ingo
> > >>>
> > >>> Better yet... since TIF_ADDR32 isn't something that changes randomly,
> > >>> perhaps this should be a separate variable?
> > >>
> > >> Maybe. I only thought about putting every 32-bit related flag under
> > >> CONFIG_COMPAT to further eradicate bloat (and force everyone else to
> > >> keep an eye on it, ha-ha).
> > >
> > > Basically TIF_ADDR32 is only set for a task if set_personality_ia32() is
> > > called, which function is called in the following circumstances:
> > >
> > > - arch/x86/ia32/ia32_aout.c:load_aout_binary()
> > >
> > >   This is in exec(), when a new binary is loaded for the current task,
> > >   via search_binary_handler() and exec_binprm(). Ordering is
> > >   synchronous, AFAICS there can be no race between TASK_SIZE users and
> > >   the set_personality_ia32() call which is always for the current task.
> > >
> > > - in COMPAT_SET_PERSONALITY(), which through macro detours ends up being
> > >   in SET_PERSONALITY2(), which is used in fs/compat_binfmt_elf.c's
> > >   load_elf_binary(), used in a similar fashion in exec() as the AOUT
> > >   case above. One particular macro detour of note is that
> > >   fs/compat_binfmt_elf.c #includes fs/binfmt_elf.c and re-defines the
> > >   personality setting method to map to set_personality_ia32().
> > >
> > > When set_personality_ia32() is called then TIF_ADDR32 is set
> > > unconditionally, without any Kconfig variations.
> > >
> > > TIF_ADDR32 is cleared:
> > >
> > > - In set_personality_64bit(), when a 64-bit binary is loaded via
> > >   fs/binfmt_elf.c.
> > >
> > > - It also defaults to clear in the init task, which is inherited by the
> > >   initial kernel threads and any user-space task they might end up
> > >   executing.
> > >
> > > So the conclusion is that IMO we can safely put TASK_SIZE into a new
> > > thread_info()->task_size field, and:
> > >
> > > - change ->task_size to the 32-bit address space in
> > >   set_personality_ia32()
> > >
> > > - change ->task_size to the 64-bit address space in the init task and in
> > >   set_personality_64bit().
> > >
> > > This should cover it I think, unless I missed something.
> > >
> >
> > Are there really enough TASK_SIZE users to justify any of this?
>
> Saving 2KB on a defconfig is quite a lot.

Saving 2kB of text by adding 8 bytes to thread_info seems rather
dubious to me.  You only need 256 tasks before you lose.  My
not-particularly-loaded laptop has 865 tasks right now.
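
The break-even point quoted above is simple arithmetic, spelled out here as
a tiny sketch. The function names are hypothetical, and the inputs (2 kB of
text saved, 8 bytes per task, 865 tasks) are the figures from this thread,
with 2 kB taken as 2048 bytes.

```c
#include <stddef.h>

/* Number of tasks at which the per-task overhead equals the text saved. */
size_t break_even_tasks(size_t text_saved_bytes, size_t bytes_per_task)
{
	return text_saved_bytes / bytes_per_task;
}

/* Net memory change in bytes; positive means more memory used overall. */
long net_cost_bytes(size_t tasks, size_t text_saved_bytes,
		    size_t bytes_per_task)
{
	return (long)(tasks * bytes_per_task) - (long)text_saved_bytes;
}
```

break_even_tasks(2048, 8) gives 256, and with 865 tasks the extra field costs
865 * 8 = 6920 bytes against 2048 bytes saved, i.e. a net loss of 4872 bytes.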

As a general principle, the mere existence of TIF_ADDR32 is a bug.
The value of that flag is *wrong* under the 32-bit variant of CRIU.
How about instead making some more progress toward getting rid of
dubious TASK_SIZE users?  I'm working on a little series to get rid of
most of them.  Meanwhile: it sure looks like a large fraction of the
users are confused as to whether TASK_SIZE is the highest user address
or the lowest non-user address.
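
To make the two readings concrete, here is a hedged sketch with two
hypothetical range checks, one written for an exclusive bound (lowest
non-user address) and one for an inclusive bound (highest user address). The
helper names and the 0x1000 example limit are invented for illustration;
none of this is existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 'limit' is the lowest NON-user address (exclusive bound). */
static bool range_ok_exclusive(uint64_t addr, uint64_t len, uint64_t limit)
{
	/* Overflow-safe form of: addr + len <= limit */
	return len <= limit && addr <= limit - len;
}

/* 'max' is the highest user address (inclusive bound). */
static bool range_ok_inclusive(uint64_t addr, uint64_t len, uint64_t max)
{
	if (len == 0)
		return addr <= max;
	/* Overflow-safe form of: addr + len - 1 <= max */
	return len - 1 <= max && addr <= max - (len - 1);
}

/* Shows what goes wrong when a caller passes the other kind of bound. */
static void convention_mixup_demo(void)
{
	const uint64_t limit = 0x1000;  /* hypothetical: first non-user byte */
	const uint64_t max = limit - 1; /* same boundary, inclusive form */

	/* Consistent use: both accept the last 16 user bytes. */
	assert(range_ok_exclusive(0xff0, 0x10, limit));
	assert(range_ok_inclusive(0xff0, 0x10, max));

	/* Inclusive check fed an exclusive bound: accepts one byte too many. */
	assert(range_ok_inclusive(0xff0, 0x11, limit));

	/* Exclusive check fed an inclusive bound: rejects a valid range. */
	assert(!range_ok_exclusive(0xff0, 0x10, max));
}
```

Either convention works on its own; the off-by-one only appears when a
caller and a check disagree about which one the constant represents.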


Re: [PATCH v1] clk: Probe defer clk_get() on orphans

2019-04-22 Thread Stephen Boyd
Quoting Jeffrey Hugo (2019-02-11 10:57:47)
> If a parent to a clock comes from outside that clock's provider, the parent
> may not be present at the time the clock is registered (ie the parent comes
> from another driver that has not yet probed).  The clock can still be
> registered, and a reference to it obtained, however that clock may not be
> fully functional - ie get_rate might return an invalid value.
> 
> This has been a problem that has resulted in the UART console breaking on
> some Qualcomm SoCs, as the UART baud rate is based on a clock that is the
> child of XO.  Due to the large chain of dependencies, it's possible that the
> RPM has not provided XO by the time that the UART driver probes, gets the
> baud rate clock, and calls get_rate - which returns 0 and results in a bad
> configuration.
> 
> An orphan clock is a clock that is missing a parent or some other ancestor.
> Since the parent is defined, we can assume that it is expected to appear at
> some point in a properly configured system (all bets are off if a required
> driver is not compiled, etc), and it is unlikely that the clock can be
> properly consumed during the time the clock is an orphan.  Therefore,
> return EPROBE_DEFER for orphan clocks so that consumers wait until the
> parent chain is established, and proper clock operation can occur.
> 
> Signed-off-by: Jeffrey Hugo 
> ---
> 
> This is based upon the "Rewrite clk parent handling" series at [1], and assumes
> that the suspected missing line commented on at [2] is added.
> 
> The idea for this solution came from [3] and [4].
> 
> [1] https://lore.kernel.org/lkml/20190129061021.94775-1-sb...@kernel.org/T/#u
> [2] https://lkml.org/lkml/2019/2/11/1634
> [3] https://lkml.org/lkml/2019/2/6/382
> [4] https://lkml.org/lkml/2015/12/27/209

There have been multiple attempts over the years to support probe defer
for clks that don't have parents. If you search the kernel mailing list
archives I'm sure you'll come across them (for example
https://patchwork.kernel.org/patch/6313051/). That's why we have the
first part of the code to indicate if a clk is an orphan or not, i.e.
commit e6500344edbb ("clk: track the orphan status of clocks and their
children"), but not this patch that you've sent.

There are a couple of requirements that we need to make sure we don't break
first.

 1. clk_get() should work for clks on the orphan list if that clk is
 parented to something that will never be registered with the framework

 2. We need a way for drivers to express that the parent of a clk
 won't exist

 3. Critical clks need to turn clks on even if they'll never get
 parents registered
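
Requirements 1 and 2 boil down to a decision rule: defer only when the clk is
an orphan and its parent is still expected to show up. A rough sketch under
that reading follows; struct ex_clk, its fields, and EX_EPROBE_DEFER are
hypothetical stand-ins, not the clk framework's actual data structures or
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel-internal EPROBE_DEFER errno value. */
#define EX_EPROBE_DEFER 517

/* Hypothetical, simplified view of a clk for this example only. */
struct ex_clk {
	bool orphan;          /* some ancestor is not registered yet */
	bool parent_expected; /* provider says the parent will appear */
};

/* Returns 0 if the clk may be handed out, -EX_EPROBE_DEFER otherwise. */
static int ex_clk_get_status(const struct ex_clk *clk)
{
	assert(clk != NULL);

	/* Defer only while a missing parent is still expected to register. */
	if (clk->orphan && clk->parent_expected)
		return -EX_EPROBE_DEFER;

	/* Non-orphans, and orphans whose parent will never exist, proceed. */
	return 0;
}
```

The parent_expected flag is the piece of information that string names and a
u8 index cannot carry by themselves, which is why the sketch treats it as
input supplied by the driver or firmware description.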

We've had problems in the past
(http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/343007.html)
where bootloaders configure clks in certain ways that the kernel doesn't
care to even consider as possible. In these cases we either need to let
clk_set_rate() reparent them when consumers are ready or we need to
convert drivers that are forcing on clks early to use the
CLK_IS_CRITICAL flag to turn on clks even if they're never going to find
their parents.

Last time we tried to do this in 2015 (wow so many years ago!) we were
blocked on not having the critical clks infrastructure and the legacy
sunxi clk driver needed to convert to DT and critical clks flag to keep
working. I think we could have done it a year or two ago, because sunxi
moved to a new design, but then we got more use-cases where clks may
never get the parent they're currently configured for in the bootloader
and then the kernel would never hand out the clk to consumers and the
clk_set_rate() case would fail.

To fix that last part, I'm proposing we introduce the .get_parent_hw()
op and then rely on drivers to tell the framework that the parent is
there either with a direct pointer reference or by knowing that the
DT/firmware is telling us the parent is valid. If we just rely on string
names and a u8 to indicate parents then we don't have enough information
to figure out how the parent is provided and if it will ever appear at
some point in the future. Once we have a way to describe this through
DT/firmware then we're able to indicate the clk is an orphan when that's
actually the case vs. when the clk is configured in hardware for
something that we won't know about. You can see this work in the
clk-parent-rewrite series in clk.git.

There's also one more problem, which is what we do with clks that we've
handed out to consumers and then the driver for the parent of that clk
is removed and the parent is unregistered. Right now, we move these clks
to the orphan list and set the clk_nodrv_ops on the parent that's
unregistered. We probably need to set clk_nodrv_ops on all the children
that get orphaned, and remove the cached clk_core pointer in all the
clk_core::parents members (even ones that aren't currently using it!),
and stash away the original clk_ops so we can restore them later when
the clk is properly reparented if the parent comes back. 

linux-next: manual merge of the v4l-dvb-next tree with the v4l-dvb tree

2019-04-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the v4l-dvb-next tree got a conflict in:

  drivers/media/platform/Kconfig

between commit:

  63604a143fe1 ("media: seco-cec: fix building with RC_CORE=m")

from the v4l-dvb tree and commit:

  81527254e151 ("media: seco: depend on CONFIG_RC_CORE=y when not a module")

from the v4l-dvb-next tree.

I fixed it up (I just used the v4l-dvb tree version) and can carry the
fix as necessary. This is now fixed as far as linux-next is concerned,
but any non-trivial conflicts should be mentioned to your upstream
maintainer when your tree is submitted for merging.  You may also want
to consider cooperating with the maintainer of the conflicting tree to
minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



[RFC PATCH v1 1/3] selftests/x86: Fixed Makefile for SGX selftest

2019-04-22 Thread Cedric Xing
The original x86/sgx/Makefile doesn't work when 'x86/sgx' is specified as the
test target. This patch fixes that problem, along with minor changes to the
dependencies between 'x86' and 'x86/sgx' in selftests/x86/Makefile.

Signed-off-by: Cedric Xing 
---
 tools/testing/selftests/x86/Makefile | 12 +++
 tools/testing/selftests/x86/sgx/Makefile | 45 +---
 2 files changed, 22 insertions(+), 35 deletions(-)

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 4fc9a42f56ea..1294c5f5b6ca 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -70,11 +70,11 @@ all_32: $(BINARIES_32)
 
 all_64: $(BINARIES_64)
 
-all_64: $(SUBDIRS_64)
-   @for DIR in $(SUBDIRS_64); do   \
-   BUILD_TARGET=$(OUTPUT)/$$DIR;   \
-   mkdir $$BUILD_TARGET  -p;   \
-   make OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+all_64: | $(SUBDIRS_64)
+   @for DIR in $|; do  \
+   BUILD_TARGET=$(OUTPUT)/$$DIR;   \
+   mkdir $$BUILD_TARGET  -p;   \
+   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@;  \
done
 
 EXTRA_CLEAN := $(BINARIES_32) $(BINARIES_64)
@@ -90,7 +90,7 @@ ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
 all: warn_32bit_failure
 
 warn_32bit_failure:
-   @echo "Warning: you seem to have a broken 32-bit build" 2>&1;   \
+   @echo "Warning: you seem to have a broken 32-bit build" 2>&1;   \
echo "environment.  This will reduce test coverage of 64-bit" 2>&1; \
echo "kernels.  If you are using a Debian-like distribution," 2>&1; \
echo "try:"; 2>&1; \
diff --git a/tools/testing/selftests/x86/sgx/Makefile b/tools/testing/selftests/x86/sgx/Makefile
index 1fd6f2708e81..3af15d7c8644 100644
--- a/tools/testing/selftests/x86/sgx/Makefile
+++ b/tools/testing/selftests/x86/sgx/Makefile
@@ -2,47 +2,34 @@ top_srcdir = ../../../../..
 
 include ../../lib.mk
 
-HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC
-ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
+ifeq ($(shell $(CC) -dumpmachine | cut --delimiter=- -f1),x86_64)
+all: all_64
+endif
+
+HOST_CFLAGS := -Wall -Werror -g $(INCLUDES)
+ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIE \
   -fno-stack-protector -mrdrnd $(INCLUDES)
 
 TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
 all_64: $(TEST_CUSTOM_PROGS)
 
-$(TEST_CUSTOM_PROGS): $(OUTPUT)/main.o $(OUTPUT)/sgx_call.o \
- $(OUTPUT)/encl_piggy.o
+$(TEST_CUSTOM_PROGS): main.c sgx_call.S $(OUTPUT)/encl_piggy.o
$(CC) $(HOST_CFLAGS) -o $@ $^
 
-$(OUTPUT)/main.o: main.c
-   $(CC) $(HOST_CFLAGS) -c $< -o $@
+$(OUTPUT)/encl_piggy.o: encl_piggy.S $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
+   $(CC) $(HOST_CFLAGS) -I$(OUTPUT) -c $< -o $@
 
-$(OUTPUT)/sgx_call.o: sgx_call.S
-   $(CC) $(HOST_CFLAGS) -c $< -o $@
-
-$(OUTPUT)/encl_piggy.o: $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
-   $(CC) $(HOST_CFLAGS) -c encl_piggy.S -o $@
-
-$(OUTPUT)/encl.bin: $(OUTPUT)/encl.elf $(OUTPUT)/sgxsign
+$(OUTPUT)/encl.bin: $(OUTPUT)/encl.elf
objcopy --remove-section=.got.plt -O binary $< $@
 
-$(OUTPUT)/encl.elf: $(OUTPUT)/encl.o $(OUTPUT)/encl_bootstrap.o
-   $(CC) $(ENCL_CFLAGS) -T encl.lds -o $@ $^
+$(OUTPUT)/encl.elf: encl.lds encl.c encl_bootstrap.S
+   $(CC) $(ENCL_CFLAGS) -T $^ -o $@
 
-$(OUTPUT)/encl.o: encl.c
-   $(CC) $(ENCL_CFLAGS) -c $< -o $@
-
-$(OUTPUT)/encl_bootstrap.o: encl_bootstrap.S
-   $(CC) $(ENCL_CFLAGS) -c $< -o $@
-
-$(OUTPUT)/encl.ss: $(OUTPUT)/encl.bin  $(OUTPUT)/sgxsign
-   $(OUTPUT)/sgxsign signing_key.pem $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
+$(OUTPUT)/encl.ss: $(OUTPUT)/sgxsign signing_key.pem $(OUTPUT)/encl.bin
+   $^ $@
 
 $(OUTPUT)/sgxsign: sgxsign.c
$(CC) -o $@ $< -lcrypto
 
-EXTRA_CLEAN := $(OUTPUT)/sgx-selftest $(OUTPUT)/sgx-selftest.o \
-  $(OUTPUT)/sgx_call.o $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss \
-  $(OUTPUT)/encl.elf $(OUTPUT)/encl.o $(OUTPUT)/encl_bootstrap.o \
-  $(OUTPUT)/sgxsign
-
-.PHONY: clean
+EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(addprefix $(OUTPUT)/,\
+   encl.elf encl.bin encl.ss encl_piggy.o sgxsign)
-- 
2.17.1



[RFC PATCH v1 0/3] An alternative __vdso_sgx_enter_enclave() to allow enclave/host parameter passing using untrusted stack

2019-04-22 Thread Cedric Xing
The current proposed __vdso_sgx_enter_enclave() requires enclaves to preserve
%rsp, which prohibits enclaves from allocating space on the untrusted stack.
However, there are existing enclaves (e.g. those built with current Intel SGX
SDK libraries) relying on the untrusted stack for passing parameters to
untrusted functions (aka. o-calls), which requires allocating space on the
untrusted stack by enclaves. And given its simplicity and convenience, it could
be desired by future SGX applications as well.

This patchset introduces a new ABI for __vdso_sgx_enter_enclave() to anchor its
stack frame on %rbp (instead of %rsp), so as to allow enclaves to "push" onto
the untrusted stack by decrementing the untrusted %rsp. Additionally, this new
__vdso_sgx_enter_enclave() will take one more parameter - a callback function,
to be invoked upon all enclave exits (both AEX and normal exits). The
callback function will be given the value of %rsp left off by the enclave,
so that data "pushed" by the enclave (if any) could be addressed/accessed.
Please note that the callback function is optional, and if not supplied
(i.e. null), __vdso_sgx_enter_enclave() will just return (i.e. behave the
same as the current implementation) after the enclave exits (or AEX
due to exceptions).

The SGX selftest is augmented to test out the new callback interface, and to
serve as a simple example to showcase how to use the callback interface in
practice.

Reference:
* This patchset is based upon SGX1 patch v20
  (https://lkml.org/lkml/2019/4/17/344) by Jarkko Sakkinen

Cedric Xing (3):
  selftests/x86: Fixed Makefile for SGX selftest
  x86/vdso: Modify __vdso_sgx_enter_enclave() to allow parameter passing
on untrusted stack
  selftests/x86: Augment SGX selftest to test new
__vdso_sgx_enter_enclave() and its callback interface

 arch/x86/entry/vdso/vsgx_enter_enclave.S   | 156 -
 arch/x86/include/uapi/asm/sgx.h            |  14 +-
 tools/testing/selftests/x86/Makefile       |  12 +-
 tools/testing/selftests/x86/sgx/Makefile   |  45 +++---
 tools/testing/selftests/x86/sgx/main.c     | 123 +---
 tools/testing/selftests/x86/sgx/sgx_call.S |  40 +-
 6 files changed, 264 insertions(+), 126 deletions(-)

-- 
2.17.1



[RFC PATCH v1 3/3] selftests/x86: Augment SGX selftest to test new __vdso_sgx_enter_enclave() and its callback interface

2019-04-22 Thread Cedric Xing
Given the changes to __vdso_sgx_enter_enclave(), the selftest is augmented to
test the newly added callback interface. This additional test marks the whole
enclave range as PROT_READ, and calls mprotect() upon #PFs to add necessary PTE
permissions per PFEC (#PF Error Code) until the enclave finishes.
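
The PFEC-to-permission step can be sketched in isolation as below. This is
only an illustration: prot_for_pfec() and page_base() are hypothetical helper
names, and the bit values (0x2 for a write access, 0x10 for an instruction
fetch) follow the x86 page-fault error code layout.

```c
#include <assert.h>
#include <sys/mman.h>

#define EX_PFEC_WRITE 0x02u /* fault was caused by a write access */
#define EX_PFEC_INSTR 0x10u /* fault was caused by an instruction fetch */

/* Translate a page-fault error code into the mprotect() flags to grant. */
static int prot_for_pfec(unsigned int pfec)
{
	int prot = PROT_READ; /* the enclave range starts out read-only */

	if (pfec & EX_PFEC_WRITE)
		prot |= PROT_WRITE;
	if (pfec & EX_PFEC_INSTR)
		prot |= PROT_EXEC;
	return prot;
}

/* Round a faulting address down to the start of its page. */
static unsigned long page_base(unsigned long addr, unsigned long page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}
```

A fault handler would then call mprotect() on page_base(addr, 0x1000) with
the flags from prot_for_pfec() and retry the enclave entry.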

Signed-off-by: Cedric Xing 
---
 tools/testing/selftests/x86/sgx/main.c | 123 ++---
 tools/testing/selftests/x86/sgx/sgx_call.S |  40 ++-
 2 files changed, 142 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/x86/sgx/main.c b/tools/testing/selftests/x86/sgx/main.c
index e2265f841fb0..234cfbad14a5 100644
--- a/tools/testing/selftests/x86/sgx/main.c
+++ b/tools/testing/selftests/x86/sgx/main.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -18,6 +19,10 @@
 #include "../../../../../arch/x86/kernel/cpu/sgx/arch.h"
 #include "../../../../../arch/x86/include/uapi/asm/sgx.h"
 
+#define _Q(x)  __Q(x)
+#define __Q(x) #x
+#define ERRLN  "Line " _Q(__LINE__)
+
 static const uint64_t MAGIC = 0x1122334455667788ULL;
 
 struct vdso_symtab {
@@ -138,7 +143,7 @@ static bool encl_create(int dev_fd, unsigned long bin_size,
base = mmap(NULL, secs->size, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_SHARED, dev_fd, 0);
if (base == MAP_FAILED) {
-   perror("mmap");
+   perror(ERRLN);
return false;
}
 
@@ -224,24 +229,113 @@ static bool encl_load(struct sgx_secs *secs, unsigned long bin_size)
return false;
 }
 
-void sgx_call(void *rdi, void *rsi, void *tcs,
- struct sgx_enclave_exception *exception,
- void *eenter);
+int sgx_call(void *rdi, void *rsi, long rdx, void *rcx, void *r8, void *r9,
+void *tcs, struct sgx_enclave_exinfo *ei, void *cb, void *eenter);
+
+static void show_enclave_exinfo(const struct sgx_enclave_exinfo *exinfop,
+   const char *header)
+{
+   printf("%s: leaf:%d", header, exinfop->leaf);
+   if (exinfop->leaf != 4)
+   printf(" trap#:%d ec:%d addr:0x%llx\n", exinfop->trapnr,
+   exinfop->error_code, exinfop->address);
+   else printf("\n");
+}
+
+static void test1(void *eenter, struct sgx_secs *secs)
+{
+   uint64_t result = 0;
+   struct sgx_enclave_exinfo exinfo;
+
+   printf("[1] Entering the enclave without callback.\n");
+
+   printf("Input: 0x%lx\n Expect: Same as input\n", MAGIC);
+   sgx_call((void *)&MAGIC, &result, 0, NULL, NULL, NULL,
+(void *)secs->base, &exinfo, NULL, eenter);
+   if (result != MAGIC) {
+   fprintf(stderr, "0x%lx != 0x%lx\n", result, MAGIC);
+   exit(1);
+   }
+   printf(" Output: 0x%lx\n", result);
+
+   printf("Input: Null TCS\n Expect: #PF at EENTER\n");
+   sgx_call((void *)&MAGIC, &result, 0, NULL, NULL, NULL,
+NULL, &exinfo, NULL, eenter);
+   show_enclave_exinfo(&exinfo, " Exit");
+   if (exinfo.leaf != 2 /*EENTER*/ || exinfo.trapnr != 14 /*#PF*/)
+   exit(1);
+}
+
+static int enclave_ex_callback(long rdi, long rsi, long rdx,
+   struct sgx_enclave_exinfo *ei, long r8, long r9, void *tcs, long ursp)
+{
+   show_enclave_exinfo(ei, "  callback");
+
+   switch (ei->leaf)
+   {
+   case 4:
+   return 0;
+   case 3:
+   case 2:
+   if (ei->trapnr != 14 /*#PF*/ || (ei->error_code & 1) == 0) {
+   fprintf(stderr, ERRLN ": Unexpected exception\n");
+   exit(1);
+   }
+
+   if (mprotect((void*)(ei->address & -0x1000), 0x1000,
+((ei->error_code & 2) ? PROT_WRITE : 0) |
+((ei->error_code & 0x10) ? PROT_EXEC : 0) |
+PROT_READ)) {
+   perror(ERRLN);
+   exit(1);
+   }
+
+   return ei->leaf == 2 ? -EAGAIN : ei->leaf;
+   }
+   return -EINVAL;
+}
+
+static void test2(void *eenter, struct sgx_secs *secs)
+{
+   uint64_t result = 0;
+   struct sgx_enclave_exinfo exinfo;
+
+   printf("[2] Entering the enclave with callback.\n");
+
+   printf("Input: 0x%lx\n Expect: Same as input\n", MAGIC);
+   sgx_call((void *)&MAGIC, &result, 0, NULL, NULL, NULL,
+(void *)secs->base, &exinfo, enclave_ex_callback, eenter);
+   if (result != MAGIC) {
+   fprintf(stderr, "0x%lx != 0x%lx\n", result, MAGIC);
+   exit(1);
+   }
+   printf(" Output: 0x%lx\n", result);
+
+   printf("Input: Read-only enclave (0x%lx-0x%lx)\n"
+  " Expect: #PFs to be fixed by callback\n",
+  secs->base, secs->base + (encl_bin_end - encl_bin) - 1);
+   if (mprotect((void*)secs->base, encl_bin_end - encl_bin, PROT_READ)) {
+   perror(ERRLN);
+   exit(1);
+   }
+   while (sgx_call((void *)&MAGIC, &result, 0, NULL,

[RFC PATCH v1 2/3] x86/vdso: Modify __vdso_sgx_enter_enclave() to allow parameter passing on untrusted stack

2019-04-22 Thread Cedric Xing
The previous __vdso_sgx_enter_enclave() requires enclaves to preserve %rsp,
which prohibits enclaves from allocating and passing parameters for
untrusted function calls (aka. o-calls).

This patch addresses the problem above by introducing a new ABI that preserves
%rbp instead of %rsp. Then __vdso_sgx_enter_enclave() can anchor its frame
using %rbp so that enclaves are allowed to allocate space on the untrusted
stack by decrementing %rsp. Please note that the stack space allocated in such a
way will be part of __vdso_sgx_enter_enclave()'s frame, so it will be freed after
__vdso_sgx_enter_enclave() returns. Therefore, __vdso_sgx_enter_enclave() has
been changed to take a callback function as an optional parameter, which, if
supplied, will be invoked upon enclave exits (both AEX (Asynchronous Enclave
eXit) and normal exits), with the value of %rsp left off by the enclave passed
as a parameter to the callback.

Here's the summary of API/ABI changes in this patch. More details can be
found in arch/x86/entry/vdso/vsgx_enter_enclave.S.
* 'struct sgx_enclave_exception' is renamed to 'struct sgx_enclave_exinfo'
  because it is filled upon both AEX (i.e. exceptions) and normal enclave
  exits.
* __vdso_sgx_enter_enclave() anchors its frame using %rbp (instead of %rsp in
  the previous implementation).
* __vdso_sgx_enter_enclave() takes one more parameter - a callback function to
  be invoked upon enclave exits. This callback is optional, and if not
  supplied, will cause __vdso_sgx_enter_enclave() to return upon enclave exits
  (same behavior as previous implementation).
* The callback function is given as a parameter the value of %rsp at enclave
  exit to address data "pushed" by the enclave. A positive value returned by
  the callback will be treated as an ENCLU leaf for re-entering the enclave,
  while a zero or negative value will be passed through as the return
  value of __vdso_sgx_enter_enclave() to its caller. It's also safe to
  leave the callback via longjmp() or by throwing a C++ exception.
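
The return-value convention in the last bullet can be sketched as a small
dispatch helper. The names below (ex_next_step() and the EX_* constants) are
made up for illustration and are not part of the vDSO code; the numbers 2
(EENTER) and 3 (ERESUME) are the ENCLU leaf values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_ENCLU_EENTER  2
#define EX_ENCLU_ERESUME 3

/*
 * Interpret a callback's return value.
 *
 * Returns true if the enclave should be re-entered, in which case *value
 * holds the ENCLU leaf to use. Returns false if the vDSO function should
 * return to its caller, in which case *value holds that return value.
 */
static bool ex_next_step(int callback_ret, int *value)
{
	assert(value != NULL);

	*value = callback_ret;
	return callback_ret > 0;
}
```

A callback that fixed up a fault would return EX_ENCLU_ERESUME to continue
the interrupted enclave, while 0 or a negative errno ends the call.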

Signed-off-by: Cedric Xing 
---
 arch/x86/entry/vdso/vsgx_enter_enclave.S | 156 ++-
 arch/x86/include/uapi/asm/sgx.h  |  14 +-
 2 files changed, 100 insertions(+), 70 deletions(-)

diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S b/arch/x86/entry/vdso/vsgx_enter_enclave.S
index fe0bf6671d6d..210f4366374a 100644
--- a/arch/x86/entry/vdso/vsgx_enter_enclave.S
+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
@@ -14,88 +14,118 @@
 .code64
 .section .text, "ax"
 
-#ifdef SGX_KERNEL_DOC
 /**
  * __vdso_sgx_enter_enclave() - Enter an SGX enclave
  *
  * @leaf:  **IN \%eax** - ENCLU leaf, must be EENTER or ERESUME
- * @tcs:   **IN \%rbx** - TCS, must be non-NULL
- * @ex_info:   **IN \%rcx** - Optional 'struct sgx_enclave_exception' pointer
+ * @tcs:   **IN 0x08(\%rsp)** - TCS, must be non-NULL
+ * @ex_info:   **IN 0x10(\%rsp)** - Optional 'struct sgx_enclave_exinfo'
+ *  pointer
+ * @callback:  **IN 0x18(\%rsp)** - Optional callback function to be called on
+ *  enclave exit or exception
  *
  * Return:
  *  **OUT \%eax** -
- *  %0 on a clean entry/exit to/from the enclave, %-EINVAL if ENCLU leaf is
- *  not allowed or if TCS is NULL, %-EFAULT if ENCLU or the enclave faults
+ *  %0 on a clean entry/exit to/from the enclave, %-EINVAL if ENCLU leaf is not
+ *  allowed, %-EFAULT if ENCLU or the enclave faults, or a non-positive value
+ *  returned from ``callback`` (if one is supplied).
  *
  * **Important!**  __vdso_sgx_enter_enclave() is **NOT** compliant with the
- * x86-64 ABI, i.e. cannot be called from standard C code.   As noted above,
- * input parameters must be passed via ``%eax``, ``%rbx`` and ``%rcx``, with
- * the return value passed via ``%eax``.  All registers except ``%rsp`` must
- * be treated as volatile from the caller's perspective, including but not
- * limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc...  Conversely, the enclave
- * being run **must** preserve the untrusted ``%rsp`` and stack.
+ * x86-64 ABI, i.e. cannot be called from standard C code. As noted above,
+ * input parameters must be passed via ``%eax``, ``8(%rsp)``, ``0x10(%rsp)`` and
+ * ``0x18(%rsp)``, with the return value passed via ``%eax``. All other registers
+ * will be passed through to the enclave as is. All registers except ``%rbp``
+ * must be treated as volatile from the caller's perspective, including but not
+ * limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc... Conversely, the enclave being
+ * run **must** preserve the untrusted ``%rbp``.
+ *
+ * ``callback`` has the following signature:
+ * int callback(long rdi, long rsi, long rdx,
+ * struct sgx_enclave_exinfo *ex_info, long r8, long r9,
+ * void *tcs, long ursp);
+ * ``callback`` **shall** follow x86_64 ABI. All GPRs **except** ``%rax``, ``%rbx``
+ * and ``%rcx`` are passed through to ``callback``. ``%rdi``, ``%rsi``, ``%rdx``,
+ * ``%r8``, ``%r9``, along with the 

Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-22 Thread Alex Williamson
On Mon, 22 Apr 2019 19:05:57 -0500
Alex G  wrote:

> On 4/22/19 5:43 PM, Alex Williamson wrote:
> > [  329.725607] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > [  708.151488] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > [  718.262959] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > [ 1138.124932] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > 
> > What is the value of this nagging?  
> 
> Good! The bandwidth notification service is working as intended. If this 
> bothers you, you can unbind the device from the bandwidth notification 
> driver:
> 
> echo :07:00.0:pcie010 |
> sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind

That's a bad solution for users; this is meaningless tracking of a
device whose driver is actively managing the link bandwidth for power
purposes.  There is nothing wrong happening here that needs to fill
logs.  I thought maybe if I enabled notification of autonomous
bandwidth changes that it might categorize these as something we could
ignore, but it doesn't.  How can we identify only cases where this is
an erroneous/noteworthy situation?  Thanks,

Alex

> > diff --git a/drivers/pci/pcie/portdrv_core.c 
> > b/drivers/pci/pcie/portdrv_core.c
> > index 7d04f9d087a6..1b330129089f 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
> >  * 7.8.2, 7.10.10, 7.31.2.
> >  */
> >   
> > -   if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> > +   if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> > +   PCIE_PORT_SERVICE_BWNOTIF)) {
> > pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
> > *pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
> > nvec = *pme + 1;  
> 
> Good catch!
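
For reference, the field extraction in the quoted hunk can be tried out in
user space. A small sketch, assuming the Interrupt Message Number sits in
bits 13:9 of the PCI Express Capabilities register (mask 0x3e00, as
PCI_EXP_FLAGS_IRQ is defined in pci_regs.h); the EX_* macro and the function
names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Same mask value as PCI_EXP_FLAGS_IRQ; renamed to mark it as example code. */
#define EX_PCI_EXP_FLAGS_IRQ 0x3e00u

/* Extract the 5-bit Interrupt Message Number from the capabilities word. */
static unsigned int ex_irq_message_number(uint16_t flags)
{
	unsigned int n = (flags & EX_PCI_EXP_FLAGS_IRQ) >> 9;

	assert(n <= 31); /* a 5-bit field cannot exceed 31 */
	return n;
}

/* Message numbers are zero-based, so the vector count is one higher. */
static unsigned int ex_vectors_needed(uint16_t flags)
{
	return ex_irq_message_number(flags) + 1;
}
```

For example, a capabilities word of 0x0600 carries message number 3, so at
least 4 vectors have to be requested for that service to get its interrupt.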



Re: [PATCH] riscv: Support non-coherency memory model

2019-04-22 Thread kbuild test robot
Hi,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.1-rc6 next-20190418]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/guoren-kernel-org/riscv-Support-non-coherency-memory-model/20190423-075013
config: riscv-tinyconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=riscv 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


All errors (new ones prefixed by >>):

   arch/riscv/mm/dma-mapping.c: In function 'arch_dma_prep_coherent':
>> arch/riscv/mm/dma-mapping.c:15:2: error: implicit declaration of function 
>> 'sbi_fence_dma' [-Werror=implicit-function-declaration]
 sbi_fence_dma(page_to_phys(page), size, DMA_BIDIRECTIONAL);
 ^
   cc1: some warnings being treated as errors

vim +/sbi_fence_dma +15 arch/riscv/mm/dma-mapping.c

10  
11  void arch_dma_prep_coherent(struct page *page, size_t size)
12  {
13  memset(page_address(page), 0, size);
14  
  > 15  sbi_fence_dma(page_to_phys(page), size, DMA_BIDIRECTIONAL);
16  }
17  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH] kexec_buffer measure

2019-04-22 Thread Mimi Zohar
[Cc'ing LSM mailing list]

On Fri, 2019-04-19 at 17:30 -0700, prakhar srivastava wrote:

> 2) Adding a LSM hook
> We are doing both the command line and kernel version measurement in IMA.
> Can you please elaborate on how this can be used outside of the scenario?
> That will help me come back with a better design and code. I am
> neutral about this.

As I said previously, initially you might want to only measure the
kexec boot command line, but will you ever want to verify or audit log
the boot command line hash?  Perhaps LSMs would be interested in the
boot command line.  Should this be an LSM hook?

Mimi



Testing the recent RISC-V DT patchsets

2019-04-22 Thread Paul Walmsley


I've heard from two separate people who have had trouble getting started 
with BBL & open-source FSBL test flows with arbitrary DT files on the 
Freedom Unleashed board.  The following instructions should help get 
people started.

The core issue, aside from general unfamiliarity, is that multiple parts 
of the pre-kernel software stack try to parse and/or modify the kernel DT.
We wish to avoid this as much as possible.

Testing with U-Boot and OpenSBI is currently left as an exercise for the 
reader, for a similar reason and because those ports are still quite new.

The following instructions are provided with no warranty whatsoever, and 
assume knowledge of the shell and Linux.  If followed carelessly, they may 
trash your filesystems or do other horrible things.


- Paul


These instructions assume that bare metal and Linux RV64 cross-toolchains 
are installed.  If not, consider using crosstool-ng with the 
"riscv64-unknown-elf" and "riscv64-unknown-linux-gnu" experimental sample 
configurations.  You will need both.

1. Put the location of the temporary build tree into the BASE
   environment variable, and set up some initial directories:
   export BASE=~/riscv-test; mkdir -p ${BASE}/work

2. Partition a microSD card with (at least) two GPT partitions.
   Here is a sample sfdisk dump:

label: gpt
label-id: 074689DB-0440-411C-91DB-440DFE5BA0B6
device: /dev/sda
unit: sectors
first-lba: 34
last-lba: 62333918

/dev/sda1 : start=2048, size=2048, type=5B193300-FC78-40CD-8002-E86C45580B47, uuid=DEAD9378-45FF-44FB-B2E3-F3FEA45ADC9E, name="fsbl"
/dev/sda2 : start=4096, size=65536, type=2E54B353-1271-4842-806F-E436D6AF6985, uuid=1B48DE68-8004-444D-BA47-AAA8DBEBFA60, name="bbl"
/dev/sda3 : start=69632, size=62264287, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, uuid=D672F1FC-3E45-4CC1-835A-E6384A26C395, name="rootfs"


3. Download the open-source FSBL:
   cd ${BASE}
   git clone https://github.com/sifive/freedom-u540-c000-bootloader

4. Build the open-source FSBL:
   cd freedom-u540-c000-bootloader
   CROSSCOMPILE=/opt/rv64gc-mmu-elf/bin/riscv64-unknown-elf- make

5. Write the open-source FSBL to the first partition of the SD card with
   something like:
   sudo dd if=fsbl.bin of=/dev/SD-CARD-DEVICE1 conv=nocreat

6. Copy an initramfs sysroot into ${BASE}/work/buildroot_initramfs_sysroot.
   A reasonable one to start with is the sysroot built by
   freedom-u-sdk, in work/buildroot_initramfs_sysroot.

7. Set the CROSS_COMPILE environment variable to point to your
   cross-compiler, in the Linux kernel form:
   export CROSS_COMPILE=/opt/rv64gc-mmu-linux-8.2.0/bin/riscv64-unknown-linux-gnu-

8. Put something like this into a script and run it:

if [ ! -d "${BASE}" ]; then
    echo "Base build directory must be set in the BASE environment variable"
    exit 1
fi
if [ ! -x "${CROSS_COMPILE}gcc" ]; then
    echo "Path to cross-compiler must be set in the CROSS_COMPILE environment variable"
    exit 1
fi

export ARCH=riscv
export OBJCOPY=${CROSS_COMPILE}objcopy
export CC=${CROSS_COMPILE}gcc

CORES=$(getconf _NPROCESSORS_ONLN)

#
#

cd ${BASE}
git clone -b dev/paulw/reduce-dt-load-v1 https://github.com/sifive/riscv-pk
git clone -b dev/paulw/dts-v5.1-rc6-experimental https://github.com/sifive/riscv-linux

cd riscv-linux
make -j${CORES} defconfig dtbs vmlinux

${CROSS_COMPILE}strip -o ${BASE}/work/vmlinux-stripped ${BASE}/riscv-linux/vmlinux

rm -rf ${BASE}/work/riscv-pk
mkdir -p ${BASE}/work/riscv-pk
cd ${BASE}/work/riscv-pk
ln -sf ${BASE}/riscv-linux/arch/riscv/boot/dts/sifive/hifive-unleashed-a00-fu540.dtb \
    ${BASE}/riscv-pk/linux.dtb
${BASE}/riscv-pk/configure \
--host=riscv64-unknown-linux-gnu \
--enable-print-device-tree --with-payload=../vmlinux-stripped
CFLAGS="-mabi=lp64d -march=rv64imafdc" make

$OBJCOPY -S -O binary --change-addresses -0x8000 bbl ../bbl.bin


9.  Write ${BASE}/work/bbl.bin to the second partition of your microSD 
card with something like:
sudo dd if=${BASE}/work/bbl.bin of=/dev/SD-CARD-DEVICE2 bs=64k conv=nocreat

10. Boot the microSD card on your Unleashed board.




clk/clk-next boot bisection: v5.1-rc1-142-ga55b079c961b on panda

2019-04-22 Thread kernelci.org bot
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has  *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.  *
* Hope this helps!  *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

clk/clk-next boot bisection: v5.1-rc1-142-ga55b079c961b on panda

Summary:
  Start:      a55b079c961b Merge branch 'clk-hisi' into clk-next
  Details:    https://kernelci.org/boot/id/5cbe3cdb59b514fd22fe6025
  Plain log:  https://storage.kernelci.org//clk/clk-next/v5.1-rc1-142-ga55b079c961b/arm/omap2plus_defconfig/gcc-7/lab-baylibre/boot-omap4-panda.txt
  HTML log:   https://storage.kernelci.org//clk/clk-next/v5.1-rc1-142-ga55b079c961b/arm/omap2plus_defconfig/gcc-7/lab-baylibre/boot-omap4-panda.html
  Result:     ecbf3f1795fd clk: fixed-factor: Let clk framework find parent

Checks:
  revert: PASS
  verify: PASS

Parameters:
  Tree:       clk
  URL:        https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
  Branch:     clk-next
  Target:     panda
  CPU arch:   arm
  Lab:        lab-baylibre
  Compiler:   gcc-7
  Config:     omap2plus_defconfig
  Test suite: boot

Breaking commit found:

---
commit ecbf3f1795fda56122632c1d024cfd0d3f4fe353
Author: Stephen Boyd 
Date:   Fri Apr 12 11:31:50 2019 -0700

clk: fixed-factor: Let clk framework find parent

Convert this driver to a more modern way of specifying parents now that
we have a way to specify clk parents by DT index. This lets us nicely
avoid a problem where a parent clk name isn't known because the parent
clk hasn't been registered yet.

Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Tested-by: Jeffrey Hugo 
Signed-off-by: Stephen Boyd 

diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
index 241b3f8c61a9..5b09f2cdb7de 100644
--- a/drivers/clk/clk-fixed-factor.c
+++ b/drivers/clk/clk-fixed-factor.c
@@ -64,12 +64,14 @@ const struct clk_ops clk_fixed_factor_ops = {
 };
 EXPORT_SYMBOL_GPL(clk_fixed_factor_ops);
 
-struct clk_hw *clk_hw_register_fixed_factor(struct device *dev,
-   const char *name, const char *parent_name, unsigned long flags,
-   unsigned int mult, unsigned int div)
+static struct clk_hw *
+__clk_hw_register_fixed_factor(struct device *dev, struct device_node *np,
+   const char *name, const char *parent_name, int index,
+   unsigned long flags, unsigned int mult, unsigned int div)
 {
struct clk_fixed_factor *fix;
struct clk_init_data init;
+   struct clk_parent_data pdata = { .index = index };
struct clk_hw *hw;
int ret;
 
@@ -85,11 +87,17 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device *dev,
init.name = name;
init.ops = &clk_fixed_factor_ops;
init.flags = flags | CLK_IS_BASIC;
-   init.parent_names = &parent_name;
+   if (parent_name)
+   init.parent_names = &parent_name;
+   else
+   init.parent_data = &pdata;
init.num_parents = 1;
 
hw = &fix->hw;
-   ret = clk_hw_register(dev, hw);
+   if (dev)
+   ret = clk_hw_register(dev, hw);
+   else
+   ret = of_clk_hw_register(np, hw);
if (ret) {
kfree(fix);
hw = ERR_PTR(ret);
@@ -97,6 +105,14 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device *dev,
 
return hw;
 }
+
+struct clk_hw *clk_hw_register_fixed_factor(struct device *dev,
+   const char *name, const char *parent_name, unsigned long flags,
+   unsigned int mult, unsigned int div)
+{
+   return __clk_hw_register_fixed_factor(dev, NULL, name, parent_name, -1,
+ flags, mult, div);
+}
 EXPORT_SYMBOL_GPL(clk_hw_register_fixed_factor);
 
 struct clk *clk_register_fixed_factor(struct device *dev, const char *name,
@@ -143,11 +159,10 @@ static const struct of_device_id set_rate_parent_matches[] = {
{ /* Sentinel */ },
 };
 
-static struct clk *_of_fixed_factor_clk_setup(struct device_node *node)
+static struct clk_hw *_of_fixed_factor_clk_setup(struct device_node *node)
 {
-   struct clk *clk;
+   struct clk_hw *hw;
const char *clk_name = node->name;
-   const char *parent_name;
unsigned long flags = 0;
u32 div, mult;
int ret;
@@ -165,30 +180,28 @@ static struct clk *_of_fixed_factor_clk_setup(struct device_node *node)
}
 
of_property_read_string(node, "clock-output-names", &clk_name);
-   parent_name = of_clk_get_parent_name(node, 0);
 
if 

[PATCH] driver core: platform: Fix the usage of platform device name(pdev->name)

2019-04-22 Thread Venkata Narendra Kumar Gutta
Platform core is using pdev->name as the platform device name to do
the binding of the devices with the drivers. But, when the platform
driver overrides the platform device name with dev_set_name(),
the pdev->name is pointing to a location which is freed and becomes
an invalid parameter to do the binding match.

use-after-free instance:

[   33.325013] BUG: KASAN: use-after-free in strcmp+0x8c/0xb0
[   33.330646] Read of size 1 at addr ffc10beae600 by task modprobe
[   33.339068] CPU: 5 PID: 518 Comm: modprobe Tainted: G S  W  O  4.19.30+ #3
[   33.346835] Hardware name: MTP (DT)
[   33.350419] Call trace:
[   33.352941]  dump_backtrace+0x0/0x3b8
[   33.356713]  show_stack+0x24/0x30
[   33.360119]  dump_stack+0x160/0x1d8
[   33.363709]  print_address_description+0x84/0x2e0
[   33.368549]  kasan_report+0x26c/0x2d0
[   33.372322]  __asan_report_load1_noabort+0x2c/0x38
[   33.377248]  strcmp+0x8c/0xb0
[   33.380306]  platform_match+0x70/0x1f8
[   33.384168]  __driver_attach+0x78/0x3a0
[   33.388111]  bus_for_each_dev+0x13c/0x1b8
[   33.392237]  driver_attach+0x4c/0x58
[   33.395910]  bus_add_driver+0x350/0x560
[   33.399854]  driver_register+0x23c/0x328
[   33.403886]  __platform_driver_register+0xd0/0xe0

So, use dev_name(&pdev->dev), which fetches the platform device name from
the kobject (dev->kobj.name) of the device instead of pdev->name.

Signed-off-by: Venkata Narendra Kumar Gutta 
---
 drivers/base/platform.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index dab0a5a..0e23aa2 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -888,7 +888,7 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *a,
if (len != -ENODEV)
return len;
 
-   len = snprintf(buf, PAGE_SIZE, "platform:%s\n", pdev->name);
+   len = snprintf(buf, PAGE_SIZE, "platform:%s\n", dev_name(&pdev->dev));
 
return (len >= PAGE_SIZE) ? (PAGE_SIZE - 1) : len;
 }
@@ -964,7 +964,7 @@ static int platform_uevent(struct device *dev, struct kobj_uevent_env *env)
return rc;
 
add_uevent_var(env, "MODALIAS=%s%s", PLATFORM_MODULE_PREFIX,
-   pdev->name);
+   dev_name(&pdev->dev));
return 0;
 }
 
@@ -973,7 +973,7 @@ static const struct platform_device_id *platform_match_id(
struct platform_device *pdev)
 {
while (id->name[0]) {
-   if (strcmp(pdev->name, id->name) == 0) {
+   if (strcmp(dev_name(&pdev->dev), id->name) == 0) {
pdev->id_entry = id;
return id;
}
@@ -1017,7 +1017,7 @@ static int platform_match(struct device *dev, struct device_driver *drv)
return platform_match_id(pdrv->id_table, pdev) != NULL;
 
/* fall-back to driver name match */
-   return (strcmp(pdev->name, drv->name) == 0);
+   return (strcmp(dev_name(&pdev->dev), drv->name) == 0);
 }
 
 #ifdef CONFIG_PM_SLEEP
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
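
For readers following the use-after-free described above, here is a rough
userspace sketch of the idea, not kernel code. It assumes a device whose name
string lives on the heap and is freed when the name is replaced; all demo_*
names are hypothetical stand-ins. The match helper reads the name through an
accessor at match time, which is the pattern the patch proposes, instead of
holding on to a pointer taken before the rename.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a device with a heap-allocated name. */
struct demo_device {
	char *kobj_name;	/* owned; replaced by demo_set_name() */
};

/* Replace the device name. The previous string is freed, so any pointer
 * cached from before the call would be left dangling. */
static int demo_set_name(struct demo_device *dev, const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, name, len);
	free(dev->kobj_name);
	dev->kobj_name = copy;
	return 0;
}

/* Always returns the current name rather than a cached pointer. */
static const char *demo_dev_name(const struct demo_device *dev)
{
	return dev->kobj_name;
}

/* Fallback driver-name match, done through the accessor. */
static int demo_match(const struct demo_device *dev, const char *drv_name)
{
	return strcmp(demo_dev_name(dev), drv_name) == 0;
}
```

Because demo_match() looks the name up on every call, a rename between two
match attempts is harmless in this model; whether that is the right fix for
the platform bus itself is a separate question for the thread.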



Re: [PATCH] riscv: Support non-coherency memory model

2019-04-22 Thread Guo Ren
Thx Christoph,

On Mon, Apr 22, 2019 at 06:18:14PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 22, 2019 at 11:44:30PM +0800, guo...@kernel.org wrote:
> >  - Add _PAGE_COHERENCY bit in current page table entry attributes. The bit
> >    designates coherence for this page mapping. Software sets the bit to
> >    tell the hardware that the page's memory region must be kept coherent
> >    with I/O devices in the SoC by PMA settings.
> >    If I/O devices and the CPU are already coherent in the SoC, the CPU
> >    just ignores this bit.
> > 
> >    PTE format:
> >    | XLEN-1  10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
> >         PFN       C  RSW  D   A   G   U   X   W   R   V
> >                   ^
> >    BIT(9): Coherence attribute bit
> >       0: hardware needn't keep the page coherent and software will
> >          maintain the coherence with cache clear/invalidate operations.
> >       1: hardware must keep the page coherent and software needn't
> >          maintain the coherence.
> >    BIT(8): Reserved for software and now it's _PAGE_SPECIAL in linux
> > 
> >    Adding a new hardware bit to the PTE also requires modifying the
> >    Privileged Architecture Supervisor-Level ISA:
> >    https://github.com/riscv/riscv-isa-manual/pull/374
> > 
> >  - Add SBI_FENCE_DMA 9 in riscv-sbi.
> >    sbi_fence_dma(start, size, dir) could synchronize CPU cache data with
> >    DMA device in non-coherency memory model. The third param's definition
> >    is the same as linux's in include/linux/dma-direction.h:
> 
> Please don't make this an SBI call.  We need a proper instruction
> for cache flushing and invalidation.  We'll also need that for pmem
> support for example.  I heard at least one other vendor already
> had an instruction, and we really need to get this into the privileged
> spec ASAP (yesterday in fact).
> 
> If you have your own instructions already we can probably binary
> patch those in using the Linux alternatives mechanism once we have
> a standardized way in the privileged spec.
> 
> We should probably start a working group for this ASAP unless we can
> get another working group to help taking care of it.
Good news, I prefer to use instructions directly instead of SBI_CALL.

Our instruction is "dcache.c/iva %0" (one cache line) and the parameter is
a virtual address in S-state. When we get into M-state by SBI_CALL, we could
let dcache.c/iva use the physical address directly, so it needn't kmap the
page for RV32 with highmem (of course, highmem is not ready in RV32 now).

> 
> > +#define pgprot_noncached pgprot_noncached
> > +static inline pgprot_t pgprot_noncached(pgprot_t _prot)
> > +{
> > +   unsigned long prot = pgprot_val(_prot);
> > +
> > +   prot |= _PAGE_COHERENCY;
> > +
> > +   return __pgprot(prot);
> 
> Nitpick: this can be shortened to
> 
>   return __pgprot(pgprot_val(_prot) | _PAGE_COHERENCY);
Good.

> 
> Also is this really a coherent flag, or an 'uncached' flag like in
> many other architectures?
There are a lot of coherency-related attributes, e.g. cacheable,
bufferable, strong order ..., and coherency is a more abstract name that
covers all of these. In our hardware, coherence = uncached +
unbufferable + (strong order).

But I don't care much about the name; uncached is also OK. My key
point is that page attribute bits are very precious, and this patch
will use the last unused attribute bit in the PTE.

Another point is that we could get more attribute bits by modifying the
riscv spec:
 - Remove the Global bit; I think it duplicates the User bit in linux.
 - Change _PAGE_PFN_SHIFT from 10 to 12, because the huge pfn in RV32 is
   not useful and current RV32 linux doesn't even implement highmem.

And then we could get another three page attribute bits in the PTE.
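
To make the bit positions in this discussion concrete, here is a small
userspace sketch of the PTE layout quoted above, with the proposed C bit at
bit 9 and the PFN starting at bit 10. The DEMO_* and demo_* names are made up
for illustration; only the bit positions come from the quoted patch text.

```c
#include <stdint.h>

/* Bit positions taken from the PTE format quoted in this thread. */
#define DEMO_PAGE_VALID		(UINT64_C(1) << 0)	/* V */
#define DEMO_PAGE_GLOBAL	(UINT64_C(1) << 5)	/* G */
#define DEMO_PAGE_SOFT		(UINT64_C(1) << 8)	/* RSW, _PAGE_SPECIAL */
#define DEMO_PAGE_COHERENCY	(UINT64_C(1) << 9)	/* proposed C bit */
#define DEMO_PFN_SHIFT		10			/* PFN starts at bit 10 */

/* Same shape as the pgprot_noncached() helper in the patch: set the C bit. */
static inline uint64_t demo_prot_noncached(uint64_t prot)
{
	return prot | DEMO_PAGE_COHERENCY;
}

/* Combine a page frame number with attribute bits. */
static inline uint64_t demo_mk_pte(uint64_t pfn, uint64_t prot)
{
	return (pfn << DEMO_PFN_SHIFT) | prot;
}

/* Recover the page frame number from a PTE value. */
static inline uint64_t demo_pte_pfn(uint64_t pte)
{
	return pte >> DEMO_PFN_SHIFT;
}

/* Test whether the proposed coherence bit is set. */
static inline int demo_pte_coherent(uint64_t pte)
{
	return (pte & DEMO_PAGE_COHERENCY) != 0;
}
```

In this model, moving DEMO_PFN_SHIFT from 10 to 12 would open up bits 10 and
11, and dropping the G bit would free bit 5, which is where the "three more
bits" count comes from.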

> 
> > +++ b/arch/riscv/mm/dma-mapping.c
> 
> This should probably be called dma-noncoherent.c
> 
> It should also have a user visible config option so that we don't
> have to build it for fully coherent systems.
Ok, dma-noncoherent.c is clearer.

> 
> > +void arch_dma_prep_coherent(struct page *page, size_t size)
> > +{
> > +   memset(page_address(page), 0, size);
> 
> No need for this memset, the caller takes care of it.
Ok

> 
> > diff --git a/arch/riscv/mm/ioremap.c b/arch/riscv/mm/ioremap.c
> > index bd2f2db..f6aaf1e 100644
> > --- a/arch/riscv/mm/ioremap.c
> > +++ b/arch/riscv/mm/ioremap.c
> > @@ -73,7 +73,7 @@ static void __iomem *__ioremap_caller(phys_addr_t addr, size_t size,
> >   */
> >  void __iomem *ioremap(phys_addr_t offset, unsigned long size)
> >  {
> > -   return __ioremap_caller(offset, size, PAGE_KERNEL,
> > +   return __ioremap_caller(offset, size, PAGE_KERNEL_COHERENCY,
> > __builtin_return_address(0));
> >  }
> >  EXPORT_SYMBOL(ioremap);
> 
> I think ioremap is a different story, and should be a separate patch.
Ok

Best Regards
 Guo Ren


Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-22 Thread Alex G

On 4/22/19 5:43 PM, Alex Williamson wrote:

[  329.725607] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[  708.151488] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[  718.262959] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[ 1138.124932] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)

What is the value of this nagging?


Good! The bandwidth notification service is working as intended. If this 
bothers you, you can unbind the device from the bandwidth notification 
driver:


echo :07:00.0:pcie010 |
sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind




diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 7d04f9d087a6..1b330129089f 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 * 7.8.2, 7.10.10, 7.31.2.
 */
  
-	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {

+   if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
+   PCIE_PORT_SERVICE_BWNOTIF)) {
pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
nvec = *pme + 1;


Good catch!
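
For anyone reading along, the arithmetic in that hunk can be sketched in
plain C as below. This is only an illustration of the two quoted lines: the
mask value 0x3e00 matches PCI_EXP_FLAGS_IRQ (the Interrupt Message Number
field, shifted down by 9 as in the diff), while the demo_* helper names are
hypothetical.

```c
#include <stdint.h>

#define DEMO_EXP_FLAGS_IRQ		0x3e00	/* same value as PCI_EXP_FLAGS_IRQ */
#define DEMO_EXP_FLAGS_IRQ_SHIFT	9	/* shift used in the quoted hunk */

/* Extract the interrupt message number from the PCIe capability flags. */
static inline unsigned int demo_msg_number(uint16_t flags)
{
	return (flags & DEMO_EXP_FLAGS_IRQ) >> DEMO_EXP_FLAGS_IRQ_SHIFT;
}

/* A service signalling on message number N needs at least N + 1 vectors. */
static inline unsigned int demo_min_vectors(uint16_t flags)
{
	return demo_msg_number(flags) + 1;
}
```

The point of the fix is that the bandwidth notification service shares this
message number with PME and hotplug, so it has to take part in the same
vector count.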


[PATCH v2 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region

2019-04-22 Thread Logan Gunthorpe
In some situations (described below), hierarchies of 32-bit resources can
fail to be assigned when the kernel has to attempt to assign a large
64-bit resource. When this happens, lspci will report
some PCI BAR resources as 'ignored' and some PCI bridge windows
as left unset. Sample lspci lines may look like:

  Memory behind bridge: fff0-000f

or

  Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

lspci reports a BAR as 'ignored' when the kernel does not populate
the struct resource and the corresponding entry in either
/proc/bus/pci/devices or /sys/bus/pci/devices/.../resource is all zero.
Any device driver that depends on one of these BARs is likely to fail
initializing and the device will not be usable. Typically when this
happens, the underlying Base Address Registers in the configuration
space are still set to whatever the firmware set them to; it's only
the kernel's view of this that is wrong.

The possible situations where this can happen will be a bit varied and
depend highly on the exact hierarchy, what the firmware has assigned
and what the kernel must do to properly re-assign resources. In the
setup that first hit this bug, it failed only with the 'pci=realloc'
command line parameter. The bug has also been hackily reproduced with
QEMU[1] without the realloc parameter.

The following things are required to hit this bug:

1) A large 64-bit prefetchable BAR that can't be assigned in any
   pass of pci_assign_unassigned_bridge_resources(). The resource must
   be large enough that it will not be able to fit within the 32-bit
   region. This resource may or may not be assignable into the 64-bit
   prefetchable region after additional passes.

2) A victim 32-bit non-prefetchable BAR that is a neighbor of the
   large BAR (so typically it will have to be behind a switch). When
   the bug is hit, this BAR's struct resource will not be assigned and
   lspci will report it as ignored.

3) There must exist a 64-bit prefetchable window for the original large
   BAR to fit in, which generally implies there is no 32-bit
   prefetchable window.

4) The kernel has to have a reason to re-assign the hierarchy that
   contains both BARs.

The cause of this bug is in __pci_bus_size_bridges() which tries to
calculate the total resource space required for each of the bridge windows
(typically IO, 64-bit, and 32-bit / non-prefetchable). The code, as
written, tries to allocate all the 64-bit prefetchable resources
followed by all the remaining resources. It uses three calls to
pbus_size_mem() for this:

  1) If bridge has a 64-bit prefetchable window, find the size of all
 64-bit prefetchable resources below the bridge

  2) If bridge has no 64-bit prefetchable window, find the size
 of all prefetchable resources below the bridge

  3) Find the size of everything else (non-prefetchable resources plus
 any prefetchable ones that couldn't be accommodated above)

By the requirement (3) above, the system has a 64-bit prefetchable
window, so the large 64-bit BAR *should* be assigned to the 64-bit
prefetchable region. However, if the 64-bit bus resource has already
been assigned, then this call to pbus_size_mem() will fail. (See
the find_free_bus_resource() helper). When the first call fails, it falls
to the second call but, by requirement (3) above, there is no 32-bit
prefetchable window so this call also fails. Thus, it falls to the last
call which tries to fit all the resources into the 32-bit
catch-all window. However, because of requirement (1), the large
BAR will overfill this region and cause the victim 32-bit BAR to not
be assignable.

Looking at the first call to pbus_size_mem(): there are only two reasons
for it to fail: if there is no 64-bit/prefetchable bridge window, or if that
window is already assigned. We know the former case can't be true because,
in __pci_bus_size_bridges(), its existence is checked before making the call.
So if the pbus_size_mem() call in question fails, the window must already
be assigned, and in this case, the code should not try to assign
64-bit resources into the 32-bit catch-all window.

Thus, the fix for the bug is to ensure mask, type2 and type3 are set in
cases where a 64-bit resource exists even if pbus_size_mem() fails. Once
we do this, the large BAR resource will never be attempted to be
assigned to the 32-bit catch-all window and the victim BAR will still
be correctly assigned.
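
The intended end result can be pictured with a toy userspace model like the
one below. It is not the setup-bus.c code and makes no claim about how mask,
type2 and type3 are actually used there; every demo_* name is hypothetical.
It only encodes the rule argued for above: when a bridge has a 64-bit
prefetchable window, a 64-bit prefetchable BAR belongs to that window and is
never counted against the 32-bit catch-all, whether or not sizing that window
succeeded.

```c
#include <stdbool.h>

/* Hypothetical labels for the memory windows of a bridge. */
enum demo_window {
	DEMO_WIN_MEM32,		/* 32-bit non-prefetchable catch-all */
	DEMO_WIN_PREF32,	/* 32-bit prefetchable window */
	DEMO_WIN_PREF64,	/* 64-bit prefetchable window */
};

struct demo_bridge {
	bool has_pref_window;	/* bridge has a prefetchable window */
	bool pref_is_64bit;	/* ...and that window is 64-bit capable */
};

struct demo_bar {
	bool prefetchable;
	bool is_64bit;
};

/* Pick the window a BAR should be accounted to. */
static enum demo_window demo_pick_window(const struct demo_bridge *br,
					 const struct demo_bar *bar)
{
	if (br->has_pref_window && br->pref_is_64bit) {
		/* Decided by window type alone, not by whether sizing worked. */
		if (bar->prefetchable && bar->is_64bit)
			return DEMO_WIN_PREF64;
		return DEMO_WIN_MEM32;
	}

	/* No 64-bit window: all prefetchable BARs share the 32-bit one. */
	if (br->has_pref_window && bar->prefetchable)
		return DEMO_WIN_PREF32;

	return DEMO_WIN_MEM32;
}
```

With a 64-bit prefetchable window present, the large BAR and the victim BAR
land in different windows in this model, so the large one cannot overfill the
region the victim needs.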

[1] 
https://lore.kernel.org/lkml/de3e34d8-2ac3-e89b-30f1-a18826ce5...@deltatee.com/T/#u

Reported-by: Kit Chow 
Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Logan Gunthorpe 
Cc: Bjorn Helgaas 
Cc: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ec44a0f3a7ac..0eb40924169b 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1228,21 +1228,20 @@ void 

[PATCH v2 0/2] Fix a pair of setup bus bugs.

2019-04-22 Thread Logan Gunthorpe
Hey,

This is largely a resend to get some more attention. I've attempted to clean
up and expand on the commit message of the first commit because the
bug is a bit of a nightmare to explain and follow. There's a lot more
information on the first commit in the original thread here[1] including
instructions on how to reproduce it in QEMU.

The second patch fixes an unrelated bug, with similar symptoms, in
the same code. It was a lot easier to debug and the reasoning should
hopefully be easier to follow, but I don't think it was reviewed much
during the first posting due to the nightmare in the first patch.

Thanks,

Logan

[1] 
https://lore.kernel.org/lkml/de3e34d8-2ac3-e89b-30f1-a18826ce5...@deltatee.com/T/#m96ba95de4678146ed46b602e8bfd6ac08a588fa2

Logan Gunthorpe (2):
  PCI: Prevent 64-bit resources from being counted in 32-bit bridge
region
  PCI: Fix disabling of bridge BARs when assigning bus resources

 drivers/pci/setup-bus.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

--
2.20.1


[PATCH v2 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources

2019-04-22 Thread Logan Gunthorpe
One odd quirk of PLX switches is that their upstream bridge port has
256K of space allocated behind its BAR0 (most other bridge
implementations do not report any BAR space). The lspci output for such
a device looks like:

  04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
(prog-if 00 [Normal decode])
  Physical Slot: 1
  Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
  Memory at 90a0 (32-bit, non-prefetchable) [size=256K]
  Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
  I/O behind bridge: 2000-3fff
  Memory behind bridge: 9000-909f
  Prefetchable memory behind bridge: 3880-38bf
  Kernel driver in use: pcieport

It's not clear what the purpose of the memory at 0x90a0 is, and
currently the kernel never actually uses it for anything. In most cases,
it's safely ignored and does not cause a problem.

However, when the kernel assigns the resource addresses (with the
pci=realloc command line parameter, for example) it can inadvertently
disable the struct resource corresponding to the BAR. When this happens,
lspci will report this memory as ignored:

   Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

This is because the kernel reports a zero start address and zero flags
in the corresponding sysfs resource file and in /proc/bus/pci/devices.
Investigation with 'lspci -x', however, shows that the BIOS-assigned
address is still programmed in the device's BAR registers.

In many cases, this still isn't a problem. Nothing uses the memory,
so nothing is affected. However, a big problem shows up when an IOMMU
is in use: the IOMMU will not reserve this space in the IOVA because the
kernel no longer thinks the range is valid. (See
dmar_init_reserved_ranges() for the Intel implementation of this.)

Without the proper reserved range, we have a situation where a DMA
mapping may occasionally allocate an IOVA which the PCI bus will actually
route to a BAR in the PLX switch. This will result in some random DMA
writes not actually writing to the RAM they are supposed to, or random
DMA reads returning all FFs from the PLX BAR when it's supposed to have
read from RAM.

The problem is caused in pci_assign_unassigned_root_bus_resources().
When any resource from a bridge device fails to get assigned, the code
sets the resource's flags to zero. This makes sense for bridge resources,
as they will be re-enabled later, but for regular BARs, it disables them
permanently. To fix the problem, we only set the flags to zero for
bridge resources and treat any other resources like non-bridge devices.

Reported-by: Kit Chow 
Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
Signed-off-by: Logan Gunthorpe 
Cc: Bjorn Helgaas 
Cc: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0eb40924169b..7adbd4bedd16 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1784,11 +1784,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
/* restore size and flags */
list_for_each_entry(fail_res, &fail_head, list) {
struct resource *res = fail_res->res;
+   int idx;
 
res->start = fail_res->start;
res->end = fail_res->end;
res->flags = fail_res->flags;
-   if (fail_res->dev->subordinate)
+
+   idx = res - &fail_res->dev->resource[0];
+   if (fail_res->dev->subordinate &&
+   idx >= PCI_BRIDGE_RESOURCES &&
+   idx <= PCI_BRIDGE_RESOURCE_END)
res->flags = 0;
}
free_list(&fail_head);
-- 
2.20.1
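
As a rough illustration of the index check in the hunk above, here is a
userspace sketch. The structures are simplified stand-ins and the DEMO_*
index values are assumptions chosen for the example; the real
PCI_BRIDGE_RESOURCES and PCI_BRIDGE_RESOURCE_END values depend on the kernel
configuration. What it shows is the technique itself: recover the resource
index by pointer subtraction and clear the flags only when that index falls
in the bridge-window range.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NUM_RESOURCES	11
#define DEMO_BRIDGE_FIRST	7	/* stand-in for PCI_BRIDGE_RESOURCES */
#define DEMO_BRIDGE_LAST	10	/* stand-in for PCI_BRIDGE_RESOURCE_END */

struct demo_resource {
	unsigned long start, end, flags;
};

struct demo_dev {
	bool is_bridge;		/* models dev->subordinate != NULL */
	struct demo_resource resource[DEMO_NUM_RESOURCES];
};

/* True only for the window resources of a bridge, not for its own BARs. */
static bool demo_is_bridge_window(const struct demo_dev *dev,
				  const struct demo_resource *res)
{
	ptrdiff_t idx = res - &dev->resource[0];

	return dev->is_bridge &&
	       idx >= DEMO_BRIDGE_FIRST && idx <= DEMO_BRIDGE_LAST;
}

/* Restore a failed resource; only bridge windows get their flags cleared. */
static void demo_restore_failed(struct demo_dev *dev,
				struct demo_resource *res,
				const struct demo_resource *saved)
{
	*res = *saved;
	if (demo_is_bridge_window(dev, res))
		res->flags = 0;
}
```

In this model a bridge's BAR0 (index 0) keeps its saved flags, while a window
resource of the same bridge is still disabled so that it can be re-enabled
later.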



Re: [PATCH v1 1/3] PCI / ACPI: Do not export pci_get_hp_params()

2019-04-22 Thread Alex G

On 4/22/19 3:58 PM, Bjorn Helgaas wrote:

On Fri, Feb 08, 2019 at 10:24:11AM -0600, Alexandru Gagniuc wrote:

This is only used within drivers/pci, and there is no reason to make
it available outside of the PCI core.

Signed-off-by: Alexandru Gagniuc 


Applied the whole series to pci/hotplug for v5.2, thanks!

I dropped the "list" member from struct hpx_type3 because it didn't
seem to be used.


That's a good call. That was a vestigial appendage from when I first 
intended to store a list of registers in memory. I'm glad we didn't end 
up needing a list.


Alex


Re: [PATCH v2] binfmt_elf: Move brk out of mmap when doing direct loader exec

2019-04-22 Thread Andrew Morton
On Mon, 22 Apr 2019 15:57:27 -0700 Kees Cook  wrote:

> Commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"),
> made changes in the rare case when the ELF loader was directly invoked
> (e.g to set a non-inheritable LD_LIBRARY_PATH, testing new versions of
> the loader), by moving into the mmap region to avoid both ET_EXEC and PIE
> binaries. This had the effect of also moving the brk region into mmap,
> which could lead to the stack and brk being arbitrarily close to each
> other. An unlucky process wouldn't get its requested stack size and stack
> allocations could end up scribbling on the heap.
>
> ...
>
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1131,16 +1131,18 @@ static int load_elf_binary(struct linux_binprm *bprm)
>   current->mm->end_data = end_data;
>   current->mm->start_stack = bprm->p;
>  
> - /*
> -  * When executing a loader directly (ET_DYN without Interp), move
> -  * the brk area out of the mmap region (since it grows up, and may
> -  * collide early with the stack growing down), and into the unused
> -  * ELF_ET_DYN_BASE region.
> -  */
> - if (!interpreter)
> - current->mm->brk = current->mm->start_brk = ELF_ET_DYN_BASE;

The above bit isn't there any more.  Here's what I queued:

--- a/fs/binfmt_elf.c~binfmt_elf-move-brk-out-of-mmap-when-doing-direct-loader-exec
+++ a/fs/binfmt_elf.c
@@ -1134,6 +1134,17 @@ out_free_interp:
current->mm->start_stack = bprm->p;
 
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
+   /*
+* For architectures with ELF randomization, when executing
+* a loader directly (i.e. no interpreter listed in ELF
+* headers), move the brk area out of the mmap region
+* (since it grows up, and may collide early with the stack
+* growing down), and into the unused ELF_ET_DYN_BASE region.
+*/
+   if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && !interpreter)
+   current->mm->brk = current->mm->start_brk =
+   ELF_ET_DYN_BASE;
+
current->mm->brk = current->mm->start_brk =
arch_randomize_brk(current->mm);
 #ifdef compat_brk_randomized
_



Re: [PATCH 0/7] introduce cpu.headroom knob to cpu controller

2019-04-22 Thread Song Liu
Hi Vincent,

> On Apr 17, 2019, at 5:56 AM, Vincent Guittot  
> wrote:
> 
> On Wed, 10 Apr 2019 at 21:43, Song Liu  wrote:
>> 
>> Hi Morten,
>> 
>>> On Apr 10, 2019, at 4:59 AM, Morten Rasmussen  
>>> wrote:
>>> 
> 
>>> 
>>> The bit that isn't clear to me, is _why_ adding idle cycles helps your
>>> workload. I'm not convinced that adding headroom gives any latency
>>> improvements beyond watering down the impact of your side jobs. AFAIK,
>> 
>> We think the latency improvements actually come from watering down the
>> impact of side jobs. It is not just statistically improving average
>> latency numbers, but also reduces resource contention caused by the side
>> workload. I don't know whether it is from reducing contention of ALUs,
>> memory bandwidth, CPU caches, or something else, but we saw reduced
>> latencies when headroom is used.
>> 
>>> the throttling mechanism effectively removes the throttled tasks from
>>> the schedule according to a specific duty cycle. When the side job is
>>> not throttled the main workload is experiencing the same latency issues
>>> as before, but by dynamically tuning the side job throttling you can
>>> achieve a better average latency. Am I missing something?
>>> 
>>> Have you looked at your distribution of main job latency and tried to
>>> compare with when throttling is active/not active?
>> 
>> cfs_bandwidth adjusts allowed runtime for each task_group each period
>> (configurable, 100ms by default). cpu.headroom logic applies gentle
>> throttling, so that the side workload gets some runtime in every period.
>> Therefore, if we look at time window equal to or bigger than 100ms, we
>> don't really see "throttling active time" vs. "throttling inactive time".
>> 
>>> 
>>> I'm wondering if the headroom solution is really the right solution for
>>> your use-case or if what you are really after is something which is
>>> lower priority than just setting the weight to 1. Something that
>> 
>> The experiments show that, cpu.weight does proper work for priority: the
>> main workload gets priority to use the CPU; while the side workload only
>> fill the idle CPU. However, this is not sufficient, as the side workload
>> creates big enough contention to impact the main workload.
>> 
>>> (nearly) always gets pre-empted by your main job (SCHED_BATCH and
>>> SCHED_IDLE might not be enough). If your main job consist
>>> of lots of relatively short wake-ups things like the min_granularity
>>> could have significant latency impact.
>> 
>> cpu.headroom gives benefits in addition to optimizations in pre-empt
>> side. By maintaining some idle time, fewer pre-empt actions are
>> necessary, thus the main workload will get better latency.
> 
> I agree with Morten's proposal, SCHED_IDLE should help your latency
> problem because side job will be directly preempted unlike normal cfs
> task even lowest priority.
> In addition to min_granularity, sched_period also has an impact on the
> time that a task has to wait before preempting the running task. Also,
> some sched_feature like GENTLE_FAIR_SLEEPERS can also impact the
> latency of a task.
> 
> It would be nice to know if the latency problem comes from contention
> on cache resources or if it's mainly because you main load waits
> before running on a CPU
> 
> Regards,
> Vincent

Thanks for these suggestions. Here are some more tests to show the impact 
of scheduler knobs and cpu.headroom.

side-load | cpu.headroom | side/cpu.weight | min_gran | cpu-idle | main/latency
----------+--------------+-----------------+----------+----------+-------------
  none    |       0      |       n/a       |   1 ms   |  45.20%  |     1.00
  ffmpeg  |       0      |        1        |  10 ms   |   3.38%  |     1.46
  ffmpeg  |       0      |   SCHED_IDLE    |   1 ms   |   5.69%  |     1.42
  ffmpeg  |      20%     |   SCHED_IDLE    |   1 ms   |  19.00%  |     1.13
  ffmpeg  |      30%     |   SCHED_IDLE    |   1 ms   |  27.60%  |     1.08

In all these cases, the main workload is loaded with the same level of
traffic (requests per second). Main workload latency numbers are normalized
to the baseline (first row).

For the baseline, the main workload runs without any side workload, and the
system has about 45.20% idle CPU.

The next two rows compare the impact of the scheduling knobs cpu.weight and
sched_min_granularity. With cpu.weight of 1 and min_granularity of 10ms,
we see a latency of 1.46; with SCHED_IDLE and min_granularity of 1ms, we
see a latency of 1.42. So SCHED_IDLE and min_granularity help protect
the main workload. However, this is not sufficient, as the latency overhead
is still high (>40%).

The last two rows show the benefit of cpu.headroom. With 20% headroom,
the latency is 1.13; with 30% headroom, the latency is 1.08.

We can also see a clear correlation between latency and global idle CPU:
more idle CPU yields lower latency.
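
For reference, the ">40%" figure is just the normalized latency expressed as
a percentage over the baseline. A trivial sketch of that conversion, with a
hypothetical helper name:

```c
/* Convert a latency normalized to the baseline (1.00) into percent overhead. */
static inline double demo_overhead_pct(double normalized_latency)
{
	return (normalized_latency - 1.0) * 100.0;
}
```

Applied to the table, 1.46 and 1.42 come out at roughly 46% and 42% over the
baseline, while the 20% and 30% headroom rows come out at roughly 13% and 8%.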

Overall, these results show that cpu.headroom provides an effective 
mechanism to control the latency impact of side 

Re: [RFC PATCH 60/62] orangefs: make use of ->free_inode()

2019-04-22 Thread Mike Marshall
Hi Linus and Al...

I just wanted Al to know that I tested his patch and acked it, and that
there would be a conflict if our pagecache code got pulled... I wasn't
suggesting that I should get that one part of Al's patch pulled...

>> I can easily handle any trivial conflicts this causes...

Thanks :-)

-Mike

On Mon, Apr 22, 2019 at 7:10 PM Al Viro  wrote:
>
> On Mon, Apr 22, 2019 at 02:56:57PM -0700, Linus Torvalds wrote:
> > On Mon, Apr 22, 2019 at 2:14 PM Mike Marshall  wrote:
> > >
> > > I applied your "new inode method: ->free_inode()" and
> > > "orangefs: make use of ->free_inode()" to our pagecache
> > > branch (I hope to get it pulled in the next merge window).
> >
> > Actually, please don't.
> >
> > Exactly because this needs that common vfs patch, I'd really prefer to
> > get it all through Al's tree, rather than have individual filesystems
> > apply their own copies of the common infrastructure commit, and then
> > apply their changes on top of that.
> >
> > I can easily handle any trivial conflicts this causes, so that's not a
> > reason to have each filesystem do it either.
> >
> > So if this is at the top of your tree, can you just "git reset" it
> > away and I'll get all the filesystems (and the common infrastructure
> > commit) all together from Al.
>
> What's more, seeing the changes in orangefs tree I would rather have
> static void orangefs_free_inode(struct inode *inode)
> {
> struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
> kmem_cache_free(orangefs_inode_cache, orangefs_inode);
> }
>
> in that series; not only less noise on merge, but with additional
> uses of orangefs_inode in the body from orangefs tree changes
> keeping the local variable clearly makes sense...


Re: [RFC PATCH 60/62] orangefs: make use of ->free_inode()

2019-04-22 Thread Al Viro
On Mon, Apr 22, 2019 at 02:56:57PM -0700, Linus Torvalds wrote:
> On Mon, Apr 22, 2019 at 2:14 PM Mike Marshall  wrote:
> >
> > I applied your "new inode method: ->free_inode()" and
> > "orangefs: make use of ->free_inode()" to our pagecache
> > branch (I hope to get it pulled in the next merge window).
> 
> Actually, please don't.
> 
> Exactly because this needs that common vfs patch, I'd really prefer to
> get it all through Al's tree, rather than have individual filesystems
> apply their own copies of the common infrastructure commit, and then
> apply their changes on top of that.
> 
> I can easily handle any trivial conflicts this causes, so that's not a
> reason to have each filesystem do it either.
> 
> So if this is at the top of your tree, can you just "git reset" it
> away and I'll get all the filesystems (and the common infrastructure
> commit) all together from Al.

What's more, seeing the changes in orangefs tree I would rather have
static void orangefs_free_inode(struct inode *inode)
{
struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
kmem_cache_free(orangefs_inode_cache, orangefs_inode);
}

in that series; not only less noise on merge, but with additional
uses of orangefs_inode in the body from orangefs tree changes
keeping the local variable clearly makes sense...


Re: [PATCH v3] signal: trace_signal_deliver when signal_group_exit

2019-04-22 Thread kbuild test robot
Hi Zhenliang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.1-rc6 next-20190418]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Zhenliang-Wei/signal-trace_signal_deliver-when-signal_group_exit/20190423-062107
config: i386-randconfig-x010-201916 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


All error/warnings (new ones prefixed by >>):

   In file included from arch/x86/include/uapi/asm/signal.h:94:0,
from arch/x86/include/asm/signal.h:36,
from include/uapi/linux/signal.h:5,
from include/linux/signal_types.h:10,
from include/linux/sched.h:28,
from include/linux/sched/mm.h:7,
from kernel/signal.c:16:
   kernel/signal.c: In function 'get_signal':
>> include/uapi/asm-generic/signal-defs.h:24:17: error: passing argument 3 of 
>> 'trace_signal_deliver' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
#define SIG_DFL ((__force __sighandler_t)0) /* default signal handling */
^
>> kernel/signal.c:2444:50: note: in expansion of macro 'SIG_DFL'
  trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
 ^~~
   In file included from include/trace/syscall.h:5:0,
from include/linux/syscalls.h:86,
from kernel/signal.c:29:
   include/linux/tracepoint.h:235:21: note: expected 'struct k_sigaction *' but 
argument is of type 'void (*)(int)'
 static inline void trace_##name(proto)\
^
   include/linux/tracepoint.h:398:2: note: in expansion of macro 
'__DECLARE_TRACE'
 __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
 ^~~
   include/linux/tracepoint.h:534:2: note: in expansion of macro 'DECLARE_TRACE'
 DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
 ^
>> include/trace/events/signal.h:96:1: note: in expansion of macro 'TRACE_EVENT'
TRACE_EVENT(signal_deliver,
^~~
   cc1: some warnings being treated as errors
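
The log above boils down to a type mismatch: the tracepoint's third parameter is a 'struct k_sigaction *', while SIG_DFL is a bare handler value. A minimal userspace sketch of that mismatch and one way to satisfy the prototype follows. Every model_* name is a hypothetical stand-in, not the kernel's definition, and this is not necessarily how the patch author resolved it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types named in the build log. */
typedef void (*model_sighandler_t)(int);
#define MODEL_SIG_DFL ((model_sighandler_t)0)

struct model_k_sigaction {
	model_sighandler_t sa_handler;
};

/* Models a tracepoint whose last argument must be a k_sigaction pointer. */
static int model_trace_signal_deliver(int sig, struct model_k_sigaction *ka)
{
	(void)sig;
	/* Report 1 when the default action applies, 0 otherwise. */
	return ka != NULL && ka->sa_handler == MODEL_SIG_DFL;
}

/* Wrap the bare handler value in the struct the prototype expects. */
static int model_trace_default_action(int sig)
{
	struct model_k_sigaction ka = { .sa_handler = MODEL_SIG_DFL };

	return model_trace_signal_deliver(sig, &ka);
}
```

Passing MODEL_SIG_DFL directly as the second argument of model_trace_signal_deliver() would reproduce the same class of incompatible-pointer diagnostic.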

vim +/trace_signal_deliver +24 include/uapi/asm-generic/signal-defs.h

b1ecb4c3 include/asm-generic/signal.h Al Viro 2005-05-04  23  
b1ecb4c3 include/asm-generic/signal.h Al Viro 2005-05-04 @24  #define 
SIG_DFL   ((__force __sighandler_t)0) /* default signal handling */
b1ecb4c3 include/asm-generic/signal.h Al Viro 2005-05-04  25  #define 
SIG_IGN   ((__force __sighandler_t)1) /* ignore signal */
b1ecb4c3 include/asm-generic/signal.h Al Viro 2005-05-04  26  #define 
SIG_ERR   ((__force __sighandler_t)-1)/* error return from signal */
b1ecb4c3 include/asm-generic/signal.h Al Viro 2005-05-04  27  #endif
ad158879 include/asm-generic/signal.h David Woodhouse 2006-04-27  28  

:: The code at line 24 was first introduced by commit
:: b1ecb4c3a9e33cc8b93ac9cb046b535b72a15f68 [PATCH] asm/signal.h unification

:: TO: Al Viro 
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH v4 07/10] drivers: pinctrl: msm: setup GPIO irqchip hierarchy

2019-04-22 Thread Lina Iyer

On Wed, Apr 17 2019 at 07:59 -0600, Linus Walleij wrote:

On Thu, Mar 21, 2019 at 10:54 PM Stephen Boyd  wrote:

Quoting Marc Zyngier (2019-03-16 04:39:48)
> > On Fri, 15 Mar 2019 09:28:31 -0700
> Stephen Boyd  wrote:
>
> > Quoting Lina Iyer (2019-03-13 14:18:41)
> > > @@ -994,6 +1092,22 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
> > > pctrl->irq_chip.irq_request_resources = msm_gpio_irq_reqres;
> > > pctrl->irq_chip.irq_release_resources = msm_gpio_irq_relres;
> > >
> > > +   chip->irq.chip = &pctrl->irq_chip;
> > > +   chip->irq.domain_ops = &msm_gpio_domain_ops;
> > > +   chip->irq.handler = handle_edge_irq;
> > > +   chip->irq.default_type = IRQ_TYPE_EDGE_RISING;
> >
> > This also changed from v3. It used to be IRQ_TYPE_NONE. Specifying this
> > here seems to cause gpiolib to print a WARN.
> >
> >
> > /*
> >  * Specifying a default trigger is a terrible idea if DT or ACPI is
> >  * used to configure the interrupts, as you may end up with
> >  * conflicting triggers. Tell the user, and reset to NONE.
> >  */
> > if (WARN(np && type != IRQ_TYPE_NONE,
> >  "%s: Ignoring %u default trigger\n", np->full_name, type))
> > type = IRQ_TYPE_NONE;
> >
> >
> > So I guess this change should be dropped. Or at the least, it should be
> > > split out to its own patch and the motivations can be discussed in the
> > commit text.
>
> It is something I requested (although I expected this to be a
> different patch, and even a clarification would have been OK).
>
> One way or another, the default trigger must match the flow handler. If
> we set it up with IRQ_TYPE_NONE, what does it mean? The fact that
> IRQ_TYPE_NONE acts as a wildcard doesn't mean the handle_edge_irq flow
> handler is a good match for all interrupt types (it is rarely OK for
> level interrupts).

I think this is a question for Thierry or Linus. I'm not sure why this
check was put in place in the code. I tried to dig into it really quick
but I didn't find anything obvious and then I gave up.

Maybe with hierarchical irqdomains we can drop this check? I don't think
the gpiolib core ever uses this 'default_type' or 'handler' for anything
once we replace the irqdomain that's used for a particular gpiochip with
a custom irqdomain. The only user I see, gpiochip_irq_map(), won't ever
be called so it really ends up being a thing that the driver specific
irqdomains should check for and reject when parsing the DT and it sees
IRQ_TYPE_NONE come out.

--8<---
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 144af0733581..fe2f7888c473 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1922,7 +1922,7 @@ static int gpiochip_add_irqchip(struct gpio_chip 
*gpiochip,
 * used to configure the interrupts, as you may end up with
 * conflicting triggers. Tell the user, and reset to NONE.
 */
-   if (WARN(np && type != IRQ_TYPE_NONE,
+   if (WARN(!gpiochip->irq.domain_ops && np && type != IRQ_TYPE_NONE,
 "%s: Ignoring %u default trigger\n", np->full_name, type))
type = IRQ_TYPE_NONE;
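
To make the effect of the proposed one-line change concrete, here is a small userspace model of the check. It is only a sketch: the model_* names and the boolean parameters are assumptions standing in for 'np', 'gpiochip->irq.domain_ops' and 'type', and the real gpiolib code does more than this.

```c
#include <stdbool.h>

/* Illustrative values only; they mirror the idea of IRQ_TYPE_*, not the headers. */
enum model_irq_type {
	MODEL_IRQ_TYPE_NONE = 0,
	MODEL_IRQ_TYPE_EDGE_RISING = 1,
};

/*
 * Returns the default trigger that survives the check and sets *warned
 * when the model would have printed the "Ignoring default trigger" WARN.
 */
static unsigned int model_default_trigger(bool has_of_node,
					  bool has_custom_domain_ops,
					  unsigned int type, bool *warned)
{
	*warned = !has_custom_domain_ops && has_of_node &&
		  type != MODEL_IRQ_TYPE_NONE;

	return *warned ? MODEL_IRQ_TYPE_NONE : type;
}
```

With custom domain ops the default trigger passes through untouched; without them, a DT-described chip still gets reset to NONE with a warning.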


Sorry for taking long time to answer... this got lost in some mail
storms.

It's a bit of a question for Marc Z really, but I'll try to answer and
he can correct me.

We are now getting used to ACPI and DT always specifying
the IRQ trigger type on the consumer handle: a device tells
the irqchip what kind of edge or level it wants.

Things weren't always like that.

Some boards in the kernel are still using board files. (Yeah
please help in modernizing them, I am doing my part.)

Old machines with GPIO irqchip jitted to the SoC irq controller
sometimes had a hardcoded behavior such as edge, and the
consumers would only issue something really legacy
like

request_irq(42, myhandler, 0, "myirq", data);

and expect it to work, since 0 means use the default flags,
it might have a platform device with this irq number passed
as a resource, but that is a really dumb platform device still,
and it might not have set any irqflags for the irq number
it passes. It probably doesn't even know that the irq number
is backed by an irq descriptor.

Since the code that e.g. DT has inside drivers/of/platform.c
irq_of_parse_and_map(), will incidentally create an irq
descriptor and set up these flags from the consumer flags in the
device tree and call the irqchip to set up the trigger through
.set_type() whenever the interrupt is requested, this is no
problem for DT. Or ACPI.

But on a board file, the .set_type() will eventually be called
with IRQ_TYPE_NONE, which will cause a bug, or no IRQs
or something like that.

So a bunch of GPIO irqchips are created passing
IRQ_TYPE_EDGE_* or IRQ_TYPE_LEVEL_* to set up a default
trigger, because all the irqs on this chip use the same trigger
anyway, and they only have one flow handler anyway.
Everything is edge, or everything is level or so.
irq_set_irq_type() will be 

[PATCH v2] binfmt_elf: Move brk out of mmap when doing direct loader exec

2019-04-22 Thread Kees Cook
Commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"),
made changes in the rare case when the ELF loader was directly invoked
(e.g. to set a non-inheritable LD_LIBRARY_PATH, testing new versions of
the loader), by moving into the mmap region to avoid both ET_EXEC and PIE
binaries. This had the effect of also moving the brk region into mmap,
which could lead to the stack and brk being arbitrarily close to each
other. An unlucky process wouldn't get its requested stack size and stack
allocations could end up scribbling on the heap.

This is illustrated here. In the case of using the loader directly, brk
(so helpfully identified as "[heap]") is allocated with the _loader_
not the binary. For example, with ASLR entirely disabled, you can see
this more clearly:

$ /bin/cat /proc/self/maps
4000-c000 r-xp  ... /bin/cat
5575b000-5575c000 r--p 7000 ... /bin/cat
5575c000-5575d000 rw-p 8000 ... /bin/cat
5575d000-5577e000 rw-p  ... [heap]
...
77ff7000-77ffa000 r--p  ... [vvar]
77ffa000-77ffc000 r-xp  ... [vdso]
77ffc000-77ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffd000-77ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffe000-77fff000 rw-p  ...
7ffde000-7000 rw-p  ... [stack]

$ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps
...
77bcc000-77bd4000 r-xp  ... /bin/cat
77bd4000-77dd3000 ---p 8000 ... /bin/cat
77dd3000-77dd4000 r--p 7000 ... /bin/cat
77dd4000-77dd5000 rw-p 8000 ... /bin/cat
77dd5000-77dfc000 r-xp  ... /lib/x86_64-linux-gnu/ld-2.27.so
77fb2000-77fd6000 rw-p  ...
77ff7000-77ffa000 r--p  ... [vvar]
77ffa000-77ffc000 r-xp  ... [vdso]
77ffc000-77ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffd000-77ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
77ffe000-7802 rw-p  ... [heap]
7ffde000-7000 rw-p  ... [stack]

The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since
nothing is there in the direct loader case (and ET_EXEC is still far
away at 0x40). Anything that ran before should still work (i.e. the
ultimately-launched binary already had the brk very far from its text, so
this should be no different from a COMPAT_BRK standpoint). The only risk
I see here is that if someone started to suddenly depend on the entire
memory space lower than the mmap region being available when launching
binaries via direct loader execs, which seems highly unlikely, I'd hope:
this would mean a binary would _not_ work when exec()ed normally.

(Note that this is only done under CONFIG_ARCH_HAS_ELF_RANDOMIZE when
randomization is turned on.)

Reported-by: Ali Saidi 
Link: 
https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=omfhx_p0nckp...@mail.gmail.com
Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Signed-off-by: Kees Cook 
---
v2: limit effect to only architectures that are expecting it! (Gunter)
---
 fs/binfmt_elf.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index fe5668a1bbaa..8cec7a97bfb7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1131,16 +1131,18 @@ static int load_elf_binary(struct linux_binprm *bprm)
current->mm->end_data = end_data;
current->mm->start_stack = bprm->p;
 
-   /*
-* When executing a loader directly (ET_DYN without Interp), move
-* the brk area out of the mmap region (since it grows up, and may
-* collide early with the stack growing down), and into the unused
-* ELF_ET_DYN_BASE region.
-*/
-   if (!interpreter)
-   current->mm->brk = current->mm->start_brk = ELF_ET_DYN_BASE;
-
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
+   /*
+* For architectures with ELF randomization, when executing
+* a loader directly (i.e. no interpreter listed in ELF
+* headers), move the brk area out of the mmap region
+* (since it grows up, and may collide early with the stack
+* growing down), and into the unused ELF_ET_DYN_BASE region.
+*/
+   if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && !interpreter)
+   current->mm->brk = current->mm->start_brk =
+   ELF_ET_DYN_BASE;
+
current->mm->brk = current->mm->start_brk =
arch_randomize_brk(current->mm);
 #ifdef compat_brk_randomized
-- 
2.17.1


-- 
Kees Cook
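
For readers following the v2 hunk, the ordering of the two assignments can be modelled in userspace as below. This is a sketch under stated assumptions: the model_* names are hypothetical, the randomization helper uses a fixed offset instead of a random one so the result is deterministic, and the addresses in any usage are made-up numbers rather than real ELF_ET_DYN_BASE values.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_mm {
	uint64_t start_brk;
	uint64_t brk;
};

/* Stand-in for arch_randomize_brk(): offsets from the current brk. */
static uint64_t model_randomize_brk(const struct model_mm *mm, uint64_t offset)
{
	return mm->brk + offset;
}

/* Mirrors the order of operations in the v2 hunk. */
static void model_place_brk(struct model_mm *mm, bool has_interpreter,
			    bool randomize, bool arch_has_elf_randomize,
			    uint64_t et_dyn_base, uint64_t offset)
{
	if (!randomize)
		return;

	/* Direct loader exec: rebase brk before it gets randomized. */
	if (arch_has_elf_randomize && !has_interpreter)
		mm->brk = mm->start_brk = et_dyn_base;

	mm->brk = mm->start_brk = model_randomize_brk(mm, offset);
}
```

The point of the model is that the rebased brk feeds into the randomization step, so a direct loader exec ends up near the base passed in rather than next to the loader's mapping.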


Re: linux-next: build failure after merge of the imx-mxs tree

2019-04-22 Thread Fabio Estevam
On Mon, Apr 22, 2019 at 7:45 PM Stephen Rothwell  wrote:
>
> Hi all,
>
> After merging the imx-mxs tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
>
> arch/arm/boot/dts/imx7d-zii-rpu2.dts:46.12-50.4: Warning 
> (io_channels_property): /iio-hwmon: Missing property '#io-channel-cells' in 
> node /soc/aips-bus@3040/adc@3061 or bad phandle (referred from 
> io-channels[0])
>
> Caused by commit
>
>   69ab5392f517 ("ARM: dts: Add support for ZII i.MX7 RPU2 board")

Andrey has submitted a fix for this issue:
https://lkml.org/lkml/2019/4/14/168


[PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-22 Thread Alex Williamson
On systems that don't support any PCIe services other than bandwidth
notification, pcie_message_numbers() can return zero vectors, causing
the vector reallocation in pcie_port_enable_irq_vec() to retry with
zero, which fails, resulting in fallback to INTx (which might be
broken) for the bandwidth notification service.  This can resolve
spurious interrupt faults due to this service on some systems.

Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
Signed-off-by: Alex Williamson 
---

However, the system is still susceptible to random spew in dmesg
depending on how the root port handles downstream device managed link
speed changes.  For example, GPUs like to scale their link speed for
power management when idle.  A GPU assigned to a VM through vfio-pci
can generate link bandwidth notification every time the link is
scaled down, ex:

[  329.725607] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[  708.151488] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[  718.262959] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[ 1138.124932] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)

What is the value of this nagging?

 drivers/pci/pcie/portdrv_core.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 7d04f9d087a6..1b330129089f 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 * 7.8.2, 7.10.10, 7.31.2.
 */
 
-   if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
+   if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
+   PCIE_PORT_SERVICE_BWNOTIF)) {
pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
nvec = *pme + 1;
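
A reduced userspace model of the branch touched by this hunk may help show why the mask matters. It is only a sketch: the MODEL_* bit values are illustrative assumptions, the flags register is passed in rather than read from config space, and the other services the real function handles are left out.

```c
#include <stdint.h>

/* Illustrative bit assignments; treat them as assumptions, not the headers. */
#define MODEL_SERVICE_PME	(1u << 0)
#define MODEL_SERVICE_HP	(1u << 2)
#define MODEL_SERVICE_BWNOTIF	(1u << 4)
#define MODEL_EXP_FLAGS_IRQ	0x3e00u

/*
 * Models only the PME/HP/BWNOTIF branch: returns the number of vectors
 * needed, or 0 when none of the three services is requested.
 */
static int model_message_numbers(unsigned int mask, uint16_t exp_flags,
				 uint32_t *pme)
{
	int nvec = 0;

	if (mask & (MODEL_SERVICE_PME | MODEL_SERVICE_HP |
		    MODEL_SERVICE_BWNOTIF)) {
		*pme = (exp_flags & MODEL_EXP_FLAGS_IRQ) >> 9;
		nvec = (int)*pme + 1;
	}

	return nvec;
}
```

Dropping MODEL_SERVICE_BWNOTIF from the mask test makes a bandwidth-notification-only port return 0 vectors, which is the failure described in the changelog.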



Re: [PATCH] binfmt_elf: Move brk out of mmap when doing direct loader exec

2019-04-22 Thread Kees Cook
On Thu, Apr 18, 2019 at 7:57 AM Guenter Roeck  wrote:
>
> On Mon, Apr 15, 2019 at 09:23:20PM -0700, Kees Cook wrote:
> > Commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"),
> > made changes in the rare case when the ELF loader was directly invoked
> > (e.g. to set a non-inheritable LD_LIBRARY_PATH, testing new versions of
> > the loader), by moving into the mmap region to avoid both ET_EXEC and PIE
> > binaries. This had the effect of also moving the brk region into mmap,
> > which could lead to the stack and brk being arbitrarily close to each
> > other. An unlucky process wouldn't get its requested stack size and stack
> > allocations could end up scribbling on the heap.
> >
>
> This patch results in crashes of my xtensa boot tests.
>
> Run /sbin/init as init process
> Kernel panic - not syncing: Attempted to kill init!  exitcode=0x000b

Thanks for finding this! I *think* the issue is that I needed to be
testing for CONFIG_ARCH_HAS_ELF_RANDOMIZE, which xtensa lacks.
I'll get this fixed up and resend it through -mm.

-- 
Kees Cook


Re: [PATCH net] net/ncsi: handle overflow when incrementing mac address

2019-04-22 Thread Tao Ren
On 4/22/19 2:54 PM, Jakub Kicinski wrote:
> On Mon, 22 Apr 2019 10:27:54 -0700, Tao Ren wrote:
>> Previously BMC's MAC address is calculated by simply adding 1 to the
>> last byte of network controller's MAC address, and it produces incorrect
>> result when network controller's MAC address ends with 0xFF.
>> The problem is fixed by detecting integer overflow when incrementing MAC
>> address and adding the carry bit (if any) to the next/left bytes of the
>> MAC address.
>>
> 
> It'd be good to have a Fixes tag, if it's worth going to the net tree.

Thank you for the quick review Jakub. Sure, I will update the patch description 
with Fixes tag accordingly.

>> Signed-off-by: Tao Ren 
>> ---
>>  net/ncsi/ncsi-rsp.c | 10 --
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
>> index dc07fcc7938e..eb42bbdb7501 100644
>> --- a/net/ncsi/ncsi-rsp.c
>> +++ b/net/ncsi/ncsi-rsp.c
>> @@ -658,7 +658,8 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct 
>> ncsi_request *nr)
>>  const struct net_device_ops *ops = ndev->netdev_ops;
>>  struct ncsi_rsp_oem_pkt *rsp;
>>  struct sockaddr saddr;
>> -int ret = 0;
>> +int ret, offset;
>> +u16 carry = 1;
>>  
>>  /* Get the response header */
>>  rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
>> @@ -667,7 +668,12 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct 
>> ncsi_request *nr)
>>  ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
>>  memcpy(saddr.sa_data, &rsp->data[BCM_MAC_ADDR_OFFSET], ETH_ALEN);
>>  /* Increase mac address by 1 for BMC's address */
>> -saddr.sa_data[ETH_ALEN - 1]++;
>> +offset = ETH_ALEN - 1;
>> +do {
>> +carry += (u8)saddr.sa_data[offset];
>> +saddr.sa_data[offset] = (char)carry;
>> +carry = carry >> 8;
>> +} while (carry != 0 && --offset >= 0);
> 
> We have eth_addr_dec(), perhaps it'd be good to add an eth_addr_inc()
> equivalent?  (I'm not sure if it'd have to be in net-next, it's a tiny
> function, and OK for net for my taste, but I had been wrong before).

Makes sense. I will split the patch into 2 then: 1) add eth_addr_inc() to 
linux/etherdevice.h 2) fix the overflow when incrementing the mac address by 
calling eth_addr_inc().

> If I'm allowed to be paranoid I'd also advise checking the resulting
> MAC is a valid ethernet unicast addr.
> 
>>  ret = ops->ndo_set_mac_address(ndev, &saddr);
>>  if (ret < 0)
>>  netdev_warn(ndev, "NCSI: 'Writing mac address to device 
>> failed\n");

Thanks for the suggestion. Will add the check.

- Tao
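
The two suggestions in this thread (an increment helper with carry, plus a validity check on the result) can be sketched in plain C as follows. The model_* names are hypothetical userspace stand-ins; this is not the eth_addr_inc() that may eventually land in linux/etherdevice.h, and the validity test is a simplified reading of "valid unicast" (not all-zero, group bit clear).

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ETH_ALEN 6

/* Add 1 to a MAC address, propagating the carry towards the first byte. */
static void model_eth_addr_inc(uint8_t addr[MODEL_ETH_ALEN])
{
	int offset = MODEL_ETH_ALEN - 1;
	uint16_t carry = 1;

	do {
		carry += addr[offset];
		addr[offset] = (uint8_t)carry;
		carry >>= 8;
	} while (carry != 0 && --offset >= 0);
}

/* Simplified check: reject the all-zero address and multicast/broadcast. */
static bool model_is_valid_unicast(const uint8_t addr[MODEL_ETH_ALEN])
{
	uint8_t any = 0;
	int i;

	for (i = 0; i < MODEL_ETH_ALEN; i++)
		any |= addr[i];

	return any != 0 && (addr[0] & 0x01) == 0;
}
```

An address ending in 0xff now rolls over into the previous byte, and the all-0xff address wraps to all zeroes, which the validity check then rejects.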


linux-next: build failure after merge of the imx-mxs tree

2019-04-22 Thread Stephen Rothwell
Hi all,

After merging the imx-mxs tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

arch/arm/boot/dts/imx7d-zii-rpu2.dts:46.12-50.4: Warning 
(io_channels_property): /iio-hwmon: Missing property '#io-channel-cells' in 
node /soc/aips-bus@3040/adc@3061 or bad phandle (referred from 
io-channels[0])

Caused by commit

  69ab5392f517 ("ARM: dts: Add support for ZII i.MX7 RPU2 board")

-- 
Cheers,
Stephen Rothwell




Re: [PATCH] x86_64: uninline TASK_SIZE

2019-04-22 Thread Linus Torvalds
On Sun, Apr 21, 2019 at 9:06 AM Alexey Dobriyan  wrote:
>
> TASK_SIZE macro is quite deceptive: it looks like a constant but in fact
> compiles to 50+ bytes.

Honestly, if you are interested in improving TASK_SIZE, I'd really
like to see you try to go even further than this.

TASK_SIZE _used_ to just be a fixed constant, which is why it has that
name and why the usage patterns are what they are.

But since that isn't true any more, I'd much rather fix the _name_,
and I'd much rather fix the nasty complex hidden behavior, rather than
just keep the name and keep the behavior, but turning it from an
inline macro to a function call.

And as Ingo points out, we should be able to just make it a field of
its own, instead of that complex dance of TIF_ADDR32 etc.

However, I think it would be better if that field would be in "struct
mm_struct" instead of Ingo's suggestion of the thread. Because while
it's currently a per-thread flag, I think it is only set by execve(),
so it always ends up being the same per-mm. No?

Also, we could/should just make the existing *users* of TASK_SIZE know
that it's no longer a simple constant, so all those functions that use
it many times could just do

unsigned long task_size = TASK_SIZE;

rather than re-compute it multiple times like they do now.

In fact, making it a function call in many ways makes things *worse*,
although maybe we could at least mark the function "pure" so that gcc
would be able to cache the end result. But that would actually be
wrong for the sequences that maybe do change the thread flags, so I
hate that idea too.

Much better to just cache it explicitly in the cases where we see that
it's currently generating bad code.

Linus
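
The "cache it explicitly" suggestion can be illustrated with a small userspace model. It is a sketch only: model_task_size() is a hypothetical stand-in for whatever TASK_SIZE expands to, the two limits are illustrative constants, and the call counter exists purely to make the single evaluation visible.

```c
#include <stdbool.h>
#include <stdint.h>

static bool model_addr32;			/* stands in for TIF_ADDR32 */
static unsigned int model_task_size_calls;	/* instrumentation only */

/* Not a constant: the result depends on per-task state. */
static uint64_t model_task_size(void)
{
	model_task_size_calls++;
	return model_addr32 ? 0xffffe000ULL : 0x7ffffffff000ULL;
}

/* Evaluate the limit once and reuse the local for every comparison. */
static bool model_range_ok(uint64_t addr, uint64_t len)
{
	const uint64_t task_size = model_task_size();

	if (len > task_size)
		return false;

	return addr <= task_size - len;
}
```

Writing the two comparisons directly against model_task_size() would evaluate it twice per call, which is the pattern the local variable avoids.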


linux-next: build failure after merge of the at91 tree

2019-04-22 Thread Stephen Rothwell
Hi all,

After merging the at91 tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

arch/arm/mach-at91/pm_suspend.S:17:10: fatal error: pm_data-offsets.h: No such 
file or directory
 #include "pm_data-offsets.h"
  ^~~

Caused by commit

  ab690fa1eb4b ("ARM: at91: move platform-specific asm-offset.h to 
arch/arm/mach-at91")

I used the version of the at91 tree from next-20190418 for today.

-- 
Cheers,
Stephen Rothwell




[PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

2019-04-22 Thread Yang Shi
The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
vma") introduced THPeligible bit for processes' smaps. But, when checking
the eligibility for shmem vma, __transparent_hugepage_enabled() is
called to override the result from shmem_huge_enabled().  It may result
in the anonymous vma's THP flag overriding shmem's.  For example, running a
simple test which creates THP for shmem, but with anonymous THP disabled,
when reading the process's smaps, it may show:

7fc92ec0-7fc92f00 rw-s  00:14 27764 /dev/shm/test
Size:   4096 kB
...
[snip]
...
ShmemPmdMapped: 4096 kB
...
[snip]
...
THPeligible:0

And, /proc/meminfo does show THP allocated and PMD mapped too:

ShmemHugePages: 4096 kB
ShmemPmdMapped: 4096 kB

This doesn't make much sense.  The anonymous THP flag should not
interfere with shmem THP.  Calling shmem_huge_enabled() with a check for
MMF_DISABLE_THP sounds good enough.  And, we could skip the stack and
dax vma checks since we have already checked that the vma is shmem.

Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
Cc: Michal Hocko 
Cc: Vlastimil Babka 
Cc: David Rientjes 
Cc: Kirill A. Shutemov 
Signed-off-by: Yang Shi 
---
 mm/huge_memory.c | 4 ++--
 mm/shmem.c   | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46..5881e82 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
if (vma_is_anonymous(vma))
return __transparent_hugepage_enabled(vma);
-   if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
-   return __transparent_hugepage_enabled(vma);
+   if (vma_is_shmem(vma))
+   return shmem_huge_enabled(vma);
 
return false;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 2275a0f..be15e9b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3873,6 +3873,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
loff_t i_size;
pgoff_t off;
 
+   if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+   return false;
if (shmem_huge == SHMEM_HUGE_FORCE)
return true;
if (shmem_huge == SHMEM_HUGE_DENY)
-- 
1.8.3.1
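
The decision flow after this patch can be modelled in userspace as below. This is a hedged sketch: the struct fields are assumptions that collapse the real checks into booleans, and the model_* names do not exist in the kernel.

```c
#include <stdbool.h>

enum model_vma_kind { MODEL_VMA_ANON, MODEL_VMA_SHMEM, MODEL_VMA_OTHER };

struct model_vma {
	enum model_vma_kind kind;
	bool mm_disable_thp;	 /* models MMF_DISABLE_THP on the owning mm */
	bool anon_thp_enabled;	 /* models __transparent_hugepage_enabled() */
	bool shmem_huge_allowed; /* models the remaining shmem_huge checks */
};

/* The MMF_DISABLE_THP test now lives in the shmem helper itself. */
static bool model_shmem_huge_enabled(const struct model_vma *vma)
{
	if (vma->mm_disable_thp)
		return false;

	return vma->shmem_huge_allowed;
}

/* shmem eligibility no longer consults the anonymous THP setting. */
static bool model_thp_enabled(const struct model_vma *vma)
{
	if (vma->kind == MODEL_VMA_ANON)
		return vma->anon_thp_enabled;
	if (vma->kind == MODEL_VMA_SHMEM)
		return model_shmem_huge_enabled(vma);

	return false;
}
```

A shmem vma with anonymous THP disabled is reported as eligible in this model, which matches the smaps case described in the changelog.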



Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer

2019-04-22 Thread Roman Gushchin
On Sat, Apr 20, 2019 at 12:58:38PM +0200, Oleg Nesterov wrote:
> On 04/19, Roman Gushchin wrote:
> >
> > > > >
> > > > > wake_up_interruptible() ?
> > > >
> > > > wake_up_interruptible() is supposed to work with a waitqueue,
> > > > but here there is nothing like this. Probably, I didn't understand your 
> > > > idea.
> > > > Can you, please, elaborate a bit more?
> > >
> > > Not sure I understand... We need to wake up the task if it sleeps in
> > > do_freezer_trap(), right? do_freezer_trap() uses TASK_INTERRUPTIBLE, so
> > > why can't wake_up_interruptible() == __wake_up(TASK_INTERRUPTIBLE) work?
> >
> > Right, but __wake_up is supposed to wake threads blocked on a waitqueue:
> 
> Ugh sorry ;) of course I meant wake_up_state(task, TASK_INTERRUPTIBLE).

Agh, then it makes total sense to me. I'll master a follow-up patch.

> 
> > > > > > +   if (unlikely(cgroup_task_frozen(current))) {
> > > > > > spin_unlock_irq(&sighand->siglock);
> > > > > > +   cgroup_leave_frozen(true);
> > > > > > goto relock;
> > > > > > }
> > > > >
> > > > > afaics cgroup_leave_frozen(false) makes more sense here.
> > > >
> > > > Why? I don't see any reasons why the task should remain in the frozen
> > > > state after this point.
> > >
> > > But cgroup_leave_frozen(false) will equally clear ->frozen if 
> > > !CGRP_FREEZE ?
> > > OTOH, if CGRP_FREEZE is set again, why do we need to clear ->frozen?
> >
> > Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> > is to have a single cgroup_leave_frozen(true) in signal.c, it's just 
> > simpler.
> > If a user changed the desired state of cgroup twice, there is no need to 
> > avoid
> > state transitions. Or maybe I don't see it yet.
> 
> Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
> does it differ from get_signal() ?

We need it because sleeping in vfork is a special state which we want to
account as frozen. And if the parent process wakes up while the cgroup is frozen
(because of the child death, for example), we want to push it into the "proper"
frozen state without changing the state of the cgroup.

> 
> If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and 
> this
> races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.
> 
> This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
> calls get_signal().
> 
> get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
> cgroup_leave_frozen(true) which clears ->frozen.
> 
> Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to 
> user
> mode?

Got it, a good catch! So if the freezer races with vfork() completion, we might
have a spurious frozen->unfrozen->frozen transition of the cgroup state.

Switching to cgroup_leave_frozen(false) seems to solve it, but I'm slightly
concerned that we're basically putting the task in a busy loop between
setting CGRP_FREEZE and setting TRAP_FREEZE. Do you think it's ok?
I wonder if there are better solutions.

Thank you!


Re: [PATCH] x86_64: uninline TASK_SIZE

2019-04-22 Thread Alexey Dobriyan
On Mon, Apr 22, 2019 at 07:30:40AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Apr 22, 2019, at 3:34 AM, Ingo Molnar  wrote:
> > 
> > 
> > * Alexey Dobriyan  wrote:
> > 
> > +++ b/arch/x86/kernel/task_size_64.c
> > @@ -0,0 +1,9 @@
> > +#include 
> > +#include 
> > +#include 
> > +
> > +unsigned long _task_size(void)
> > +{
> > +return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET :
>  TASK_SIZE_MAX;
> > +}
> > +EXPORT_SYMBOL(_task_size);
>  
>  Good idea - but instead of adding yet another compilation unit, why not
>  
>  stick _task_size() into arch/x86/kernel/process_64.c, which is the 
>  canonical place for process management related arch functions?
>  
>  Thanks,
>  
> Ingo
> >>> 
> >>> Better yet... since TIF_ADDR32 isn't something that changes randomly, 
> >>> perhaps this should be a separate variable?
> >> 
> >> Maybe. I only thought about putting every 32-bit related flag under 
> >> CONFIG_COMPAT to further eradicate bloat (and force everyone else to 
> >> keep an eye on it, ha-ha).
> > 
> > Basically TIF_ADDR32 is only set for a task if set_personality_ia32() is 
> > called, which function is called in the following circumstances:
> > 
> > - arch/x86/ia32/ia32_aout.c:load_aout_binary()
> > 
> >   This is in exec(), when a new binary is loaded for the current task, 
> >   via search_binary_handler() and exec_binprm(). Ordering is 
> >   synchronous, AFAICS there can be no race between TASK_SIZE users and 
> >   the set_personality_ia32() call which is always for the current task.
> > 
> > - in COMPAT_SET_PERSONALITY(), which through macro detours ends up being 
> >   in SET_PERSONALITY2(), which is used in fs/compat_binfmt_elf.c's 
> >   load_elf_binary(), used in a similar fashion in exec() as the AOUT 
> >   case above. One particular macro detour of note is that 
> >   fs/compat_binfmt_elf.c #includes fs/binfmt_elf.c and re-defines the 
> >   personality setting method to map to set_personality_ia32().
> > 
> > When set_personality_ia32() is called then TIF_ADDR32 is set 
> > unconditionally, without any Kconfig variations.
> > 
> > TIF_ADDR32 is cleared:
> > 
> > - In set_personality_64bit(), when a 64-bit binary is loaded via 
> >   fs/binfmt_elf.c.
> > 
> > - It also defaults to clear in the init task, which is inherited by the 
> >   initial kernel threads and any user-space task they might end up 
> >   executing.
> > 
> > So the conclusion is that IMO we can safely put TASK_SIZE into a new 
> > thread_info()->task_size field, and:
> > 
> > - change ->task_size to the 32-bit address space in 
> >   set_personality_ia32()
> > 
> > - change ->task_size to the 64-bit address space in the init task and in 
> >   set_personality_64bit().
> > 
> > This should cover it I think, unless I missed something.
> > 
> 
> Are there really enough TASK_SIZE users to justify any of this?

Saving 2KB on a defconfig is quite a lot.
If put into thread_info, ->task_size can be pulled using just RAX which
in turn allows doing

asm volatile "call %P" ...  "=a" (...)

saving even more space.

But it is late here so don't quote me.
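
A minimal userspace sketch of the idea discussed above, assuming a cached
->task_size field in thread_info. Every sketch_/SKETCH_ name is hypothetical
and the two limit values are illustrative placeholders, not the kernel's
definitions. It only models the bookkeeping: the limit is written wherever
the personality is set, so reading it needs no TIF_ADDR32 test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits only; the real ones live in the x86 headers. */
#define SKETCH_TASK_SIZE_64 ((UINT64_C(1) << 47) - UINT64_C(4096))
#define SKETCH_TASK_SIZE_32 UINT64_C(0xFFFFE000)
#define SKETCH_TIF_ADDR32   (UINT64_C(1) << 0)

/* Hypothetical stand-in for the per-task structure. */
struct sketch_thread_info {
    uint64_t flags;
    uint64_t task_size;   /* proposed cached address-space limit */
};

/* Init task default: flag clear, 64-bit address space. */
static void sketch_init_task(struct sketch_thread_info *ti)
{
    assert(ti != NULL);
    ti->flags = 0;
    ti->task_size = SKETCH_TASK_SIZE_64;
}

/* Models the 32-bit exec path: set the flag and shrink the limit. */
static void sketch_set_personality_ia32(struct sketch_thread_info *ti)
{
    assert(ti != NULL);
    ti->flags |= SKETCH_TIF_ADDR32;
    ti->task_size = SKETCH_TASK_SIZE_32;
}

/* Models the 64-bit exec path: clear the flag and restore the limit. */
static void sketch_set_personality_64bit(struct sketch_thread_info *ti)
{
    assert(ti != NULL);
    ti->flags &= ~SKETCH_TIF_ADDR32;
    ti->task_size = SKETCH_TASK_SIZE_64;
}

/* What TASK_SIZE would reduce to: a single load, no flag test. */
static uint64_t sketch_task_size(const struct sketch_thread_info *ti)
{
    return ti->task_size;
}
```

Because both setters only ever run for the current task during exec(), the
sketch needs no locking, which matches the ordering argument quoted above.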


Re: [PATCH] x86_64: uninline TASK_SIZE

2019-04-22 Thread Alexey Dobriyan
On Mon, Apr 22, 2019 at 12:34:49PM +0200, Ingo Molnar wrote:

> When set_personality_ia32() is called then TIF_ADDR32 is set 
> unconditionally, without any Kconfig variations.

Indeed.

personality(PER_LINUX32) = 0 (PER_LINUX)

I only wasted about half an evening ifdefing TIF_ flags.
Thanks for saving a lot of time!


Re: [RFC PATCH 60/62] orangefs: make use of ->free_inode()

2019-04-22 Thread Linus Torvalds
On Mon, Apr 22, 2019 at 2:14 PM Mike Marshall  wrote:
>
> I applied your "new inode method: ->free_inode()" and
> "orangefs: make use of ->free_inode()" to our pagecache
> branch (I hope to get it pulled in the next merge window).

Actually, please don't.

Exactly because this needs that common vfs patch, I'd really prefer to
get it all through Al's tree, rather than have individual filesystems
apply their own copies of the common infrastructure commit, and then
apply their changes on top of that.

I can easily handle any trivial conflicts this causes, so that's not a
reason to have each filesystem do it either.

So if this is at the top of your tree, can you just "git reset" it
away and I'll get all the filesystems (and the common infrastructure
commit) all together from Al.

 Linus


Re: [PATCH v20 15/28] x86/sgx: Add the Linux SGX Enclave Driver

2019-04-22 Thread Sean Christopherson
+Cc Jethro

On Wed, Apr 17, 2019 at 01:39:25PM +0300, Jarkko Sakkinen wrote:
> Intel Software Guard eXtensions (SGX) is a set of CPU instructions that
> can be used by applications to set aside private regions of code and
> data. The code outside the enclave is disallowed to access the memory
> inside the enclave by the CPU access control.
> 
> This commit adds the Linux SGX Enclave Driver that provides an ioctl API
> to manage enclaves. The address range for an enclave, commonly referred
> to as ELRANGE in the documentation (e.g. Intel SDM), is reserved with
> mmap() against /dev/sgx/enclave. After that a set of ioctls is used to
> build the enclave to the ELRANGE.
> 
> Signed-off-by: Jarkko Sakkinen 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Co-developed-by: Serge Ayoun 
> Signed-off-by: Serge Ayoun 
> Co-developed-by: Shay Katz-zamir 
> Signed-off-by: Shay Katz-zamir 
> Co-developed-by: Suresh Siddha 
> Signed-off-by: Suresh Siddha 
> ---

...

> +#ifdef CONFIG_ACPI
> +static struct acpi_device_id sgx_device_ids[] = {
> + {"INT0E0C", 0},
> + {"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, sgx_device_ids);
> +#endif
> +
> +static struct platform_driver sgx_drv = {
> + .probe = sgx_drv_probe,
> + .remove = sgx_drv_remove,
> + .driver = {
> + .name   = "sgx",
> + .acpi_match_table   = ACPI_PTR(sgx_device_ids),
> + },
> +};

Where do we stand on removing the ACPI and platform_driver dependencies?
Can we get rid of them sooner rather than later?

Now that the core SGX code is approaching stability, I'd like to start
sending RFCs for the EPC virtualization and KVM bits to hash out that side
of things.  The ACPI crud is the last chunk of code that would require
non-trivial changes to the core SGX code for the proposed virtualization
implementation.  I'd strongly prefer to get it out of the way before
sending the KVM RFCs.

> +static int __init sgx_drv_subsys_init(void)
> +{
> + int ret;
> +
> + ret = bus_register(&sgx_bus_type);
> + if (ret)
> + return ret;
> +
> + ret = alloc_chrdev_region(&sgx_devt, 0, SGX_DRV_NR_DEVICES, "sgx");
> + if (ret < 0) {
> + bus_unregister(&sgx_bus_type);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static void sgx_drv_subsys_exit(void)
> +{
> + bus_unregister(&sgx_bus_type);
> + unregister_chrdev_region(sgx_devt, SGX_DRV_NR_DEVICES);
> +}
> +
> +static int __init sgx_drv_init(void)
> +{
> + int ret;
> +
> + ret = sgx_drv_subsys_init();
> + if (ret)
> + return ret;
> +
> + ret = platform_driver_register(&sgx_drv);
> + if (ret)
> + sgx_drv_subsys_exit();
> +
> + return ret;
> +}
> +module_init(sgx_drv_init);
> +
> +static void __exit sgx_drv_exit(void)
> +{
> + platform_driver_unregister(&sgx_drv);
> + sgx_drv_subsys_exit();
> +}
> +module_exit(sgx_drv_exit);


Re: [PATCH 3/6] y2038: linux: Provide __clock_settime64 implementation

2019-04-22 Thread Arnd Bergmann
On Mon, Apr 22, 2019 at 11:07 AM Stepan Golosunov
 wrote:
> On 20.04.2019 at 13:21:12 +0200, Lukasz Majewski wrote:
> Is it? The kernel (5.1-rc6) code looks to me like
>
> /* Zero out the padding for 32 bit systems or in compat mode */
> if (false && false)
> kts.tv_nsec &= 0x00000000FFFFFFFFUL;
>
> in 32-bit kernels. And like
>
> if (false && true)
> kts.tv_nsec &= 0x00000000FFFFFFFFUL;
>
> for COMPAT syscalls in 64-bit kernels.
>
> It should probably be changed into
>
> if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall())
> kts.tv_nsec &= 0x00000000FFFFFFFFUL;
>
> (Or into something like
>
> if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall() && 
> !COMPAT_USE_64BIT_TIME)
> kts.tv_nsec &= 0x00000000FFFFFFFFUL;
>
> if x32 should retain 64-bit tv_nsec.)

I think the problem is that at some point CONFIG_64BIT_TIME was
meant to be enabled on both 32-bit and 64-bit kernels, but the
definition got changed along the way.

We probably just want

if (in_compat_syscall() )
   kts.tv_nsec &= 0x00000000FFFFFFFFUL;

here, which would then truncate the nanoseconds for all compat
mode including x32. For native mode, we don't need to truncate
it, since timespec64 has a 32-bit 'tv_nsec' field in the kernel.
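
A rough userspace sketch of that truncation, assuming a 64-bit tv_nsec slot
whose upper half may carry uninitialised padding from a 32-bit caller. The
struct and function names are hypothetical stand-ins, and the compat flag is
passed in explicitly instead of calling in_compat_syscall().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's 64-bit timespec layout. */
struct sketch_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/*
 * Drop the upper 32 bits of tv_nsec for compat callers, since those bits
 * are padding from the caller's point of view. Native callers are left
 * untouched.
 */
static void sketch_fixup_nsec(struct sketch_timespec64 *kts, bool compat)
{
    assert(kts != NULL);
    if (compat)
        kts->tv_nsec = (int64_t)((uint64_t)kts->tv_nsec &
                                 UINT64_C(0x00000000FFFFFFFF));
}
```

With this shape, userspace does not have to zero the padding itself, which
is the behaviour the thread is asking the kernel to guarantee.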

> > However, I would prefer not to pass random data
> > to the kernel, and hence I do clear it up explicitly in glibc.
>
> If the kernel does not ignore padding on its own, then zeroing it out
> is required everywhere timespec is passed to kernel, including via
> code not known to glibc. (Does anyone promise that there won't be any
> ioctls that accept timespec, for example?) That seems to be
> error-prone (and might require copying larger structs).
>
> On the other hand, if kernel 5.1+ ignores padding as intended there is
> no need to create additional copy of structs in glibc code that calls
> into clock_settime64 (or into timer_settime64 that accepts larger
> struct, for example).

The intention is that the kernel ignores the padding. If you find
another place in the kernel that forgets that, we should fix it.

> > > And, hmm, is CONFIG_64BIT_TIME enabled anywhere?
>
> I guess that the remaining CONFIG_64BIT_TIME in kernel should be
> replaced with CONFIG_COMPAT_32BIT_TIME or removed.

We should remove CONFIG_64BIT_TIME. CONFIG_COMPAT_32BIT_TIME
is still needed to identify architectures that don't have it, in
particular riscv32.

   Arnd


[PATCH] drivers: hv: Add a module description line

2019-04-22 Thread Joseph Salisbury
Signed-off-by: Joseph Salisbury 
---
 drivers/hv/vmbus_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index aa25f3bcbdea..1cb9408b0d40 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2174,6 +2174,7 @@ static void __exit vmbus_exit(void)
 
 
 MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Microsoft Hyper-V VMBus Driver");
 
 subsys_initcall(hv_acpi_init);
 module_exit(vmbus_exit);
-- 
2.17.1



[PATCH] drivers: input: serio: Add a module description

2019-04-22 Thread Joseph Salisbury
Signed-off-by: Joseph Salisbury 
---
 drivers/input/serio/hyperv-keyboard.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
index a8b9be3e28db..7935e52b5435 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -440,5 +440,7 @@ static void __exit hv_kbd_exit(void)
 }
 
 MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic Keyboard Driver");
+
 module_init(hv_kbd_init);
 module_exit(hv_kbd_exit);
-- 
2.17.1



Re: [PATCH v20 09/28] x86/sgx: Add ENCLS architectural error codes

2019-04-22 Thread Sean Christopherson
On Wed, Apr 17, 2019 at 01:39:19PM +0300, Jarkko Sakkinen wrote:
> The SGX architecture defines an extensive set of error codes that are
> used by ENCL{S,U,V} instructions to provide software with (somewhat)
> precise error information.  Though they are architectural, define the
> known error codes in a separate file from sgx_arch.h so that they can
> be exposed to userspace.  For some ENCLS leafs, e.g. EINIT, returning
> the exact error code on failure can enable userspace to make informed
> decisions when an operation fails.
> 
> Signed-off-by: Jarkko Sakkinen 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> ---

Your SOB needs to be last.  Several other patches have the same issue,
e.g. 10-13, 15 and 17.

See commit 24a2bb90741b ("docs: Clarify the usage and sign-off requirements
for Co-developed-by") in branch docs-next of git://git.lwn.net/linux.git.


linux-next: Fixes tag needs some work in the device-mapper tree

2019-04-22 Thread Stephen Rothwell
Hi all,

In commit

  e28adc3bf34e ("dm cache metadata: Fix loading discard bitset")

Fixes tag

  Fixes: ae4a46a1f6 ("dm cache metadata: use bitset cursor api to load discard bitset")

has these problem(s):

  - SHA1 should be at least 12 digits long
Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
or later) just making sure it is not set (or set to "auto").
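
For illustration only, a small C sketch of the length check described above.
The function names are hypothetical and this is not the script that produced
this report; it just counts the hex digits that follow "Fixes: " and compares
the count against the 12-digit minimum.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the number of hex digits in the commit id of a
 * "Fixes: <sha> (...)" line, or 0 if the line is not a Fixes tag.
 */
static size_t sketch_fixes_sha_len(const char *line)
{
    static const char prefix[] = "Fixes: ";
    size_t n = 0;

    assert(line != NULL);
    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return 0;
    line += sizeof(prefix) - 1;
    while (isxdigit((unsigned char)line[n]))
        n++;
    return n;
}

/* 12 is the minimum abbreviation length quoted in the report. */
static int sketch_fixes_sha_ok(const char *line)
{
    return sketch_fixes_sha_len(line) >= 12;
}
```

The tag flagged here has a 10-digit id, so it fails this check.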

-- 
Cheers,
Stephen Rothwell




[PATCH] drivers: hid: Add a module description line

2019-04-22 Thread Joseph Salisbury
Signed-off-by: Joseph Salisbury 
---
 drivers/hid/hid-hyperv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 704049e62d58..d3311d714d35 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -614,5 +614,7 @@ static void __exit mousevsc_exit(void)
 }
 
 MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic HID Driver");
+
 module_init(mousevsc_init);
 module_exit(mousevsc_exit);
-- 
2.17.1



linux-next: Fixes tag needs some work in the crypto tree

2019-04-22 Thread Stephen Rothwell
Hi all,

In commit

  f5a2aeb8b254 ("crypto: ccp - Do not free psp_master when PLATFORM_INIT fails")

Fixes tag

  Fixes: 200664d5237f ("crypto: ccp: Add SEV support")

has these problem(s):

  - Subject does not match target commit subject
Just use
git log -1 --format='Fixes: %h ("%s")'

-- 
Cheers,
Stephen Rothwell




linux-next: Fixes tag needs some work in the v4l-dvb tree

2019-04-22 Thread Stephen Rothwell
Hi Mauro,

In commit

  dad7e270ba71 ("media: vivid: use vfree() instead of kfree() for dev->bitmap_cap")

Fixes tag

  Fixes: ef834f7836ec0 ("[media] vivid: add the video capture and output

has these problem(s):

  - Subject has leading but no trailing parentheses
  - Subject has leading but no trailing quotes

Please do not split Fixes tags across more than one line.  Also, keep
all the tags in one group.
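
As a hedged illustration of the two problems listed above, the following C
sketch tests whether a single tag line opens the subject with (" and also
closes it with ") on the same line. The function name is hypothetical and the
check is deliberately simple.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the line contains an opening (" and ends with a closing ")
 * after it (ignoring trailing whitespace), 0 otherwise.
 */
static int sketch_fixes_subject_closed(const char *line)
{
    const char *open;
    size_t len;

    assert(line != NULL);
    open = strstr(line, "(\"");
    if (open == NULL)
        return 0;

    len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    /* The closing pair must come after the opening pair. */
    if (len < (size_t)(open - line) + 4)
        return 0;
    return line[len - 2] == '"' && line[len - 1] == ')';
}
```

A tag that was wrapped onto a second line fails this test on its first line,
which is the symptom reported for the commit above.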

-- 
Cheers,
Stephen Rothwell




Re: [RFC PATCH 60/62] orangefs: make use of ->free_inode()

2019-04-22 Thread Mike Marshall
Hi Al...

I applied your "new inode method: ->free_inode()" and
"orangefs: make use of ->free_inode()" to our pagecache
branch (I hope to get it pulled in the next merge window).

I had to modify your "orangefs: make use of ->free_inode()" a
little, since Martin Brandenburg had already modified orangefs_i_callback
for the pagecache branch. I don't know for sure that my modifications
aren't nonsense :-) but I do know for sure that everything runs
with no xfstests regressions. I'll see what Martin thinks about
my changes...

orangefs_destroy_inode is pretty much a no-op now, so I guess
we'll get rid of it...

Acked-by: Mike Marshall 

Thanks...

-Mike


[root@vm1 linux]# git diff HEAD^ fs/orangefs/super.c
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 8fa30c13b7ed..f82ac9373443 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -125,20 +125,18 @@ static struct inode *orangefs_alloc_inode(struct super_block *sb)
return &orangefs_inode->vfs_inode;
 }

-static void orangefs_i_callback(struct rcu_head *head)
+static void orangefs_free_inode(struct inode *inode)
 {
-   struct inode *inode = container_of(head, struct inode, i_rcu);
-   struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
struct orangefs_cached_xattr *cx;
struct hlist_node *tmp;
int i;

-   hash_for_each_safe(orangefs_inode->xattr_cache, i, tmp, cx, node) {
+   hash_for_each_safe(ORANGEFS_I(inode)->xattr_cache, i, tmp, cx, node) {
hlist_del(&cx->node);
kfree(cx);
}

-   kmem_cache_free(orangefs_inode_cache, orangefs_inode);
+   kmem_cache_free(orangefs_inode_cache, ORANGEFS_I(inode));
 }

 static void orangefs_destroy_inode(struct inode *inode)
@@ -148,8 +146,6 @@ static void orangefs_destroy_inode(struct inode *inode)
gossip_debug(GOSSIP_SUPER_DEBUG,
"%s: deallocated %p destroying inode %pU\n",
__func__, orangefs_inode, get_khandle_from_ino(inode));
-
-   call_rcu(&inode->i_rcu, orangefs_i_callback);
 }

 static int orangefs_write_inode(struct inode *inode,
@@ -316,6 +312,7 @@ void fsid_key_table_finalize(void)

 static const struct super_operations orangefs_s_ops = {
.alloc_inode = orangefs_alloc_inode,
+   .free_inode = orangefs_free_inode,
.destroy_inode = orangefs_destroy_inode,
.write_inode = orangefs_write_inode,
.drop_inode = generic_delete_inode,
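
For readers unfamiliar with the pattern, here is a minimal userspace sketch
of what the diff above does, under stated assumptions: every sketch_ name is
hypothetical, plain calloc()/free() stand in for the slab cache, and the RCU
grace period that the common code is expected to wait for is not modelled.
The filesystem only supplies a callback that frees its containing object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the generic and per-filesystem inode. */
struct sketch_inode {
    unsigned long i_ino;
};

struct sketch_fs_inode {
    int *xattr_cache;                 /* per-filesystem data */
    struct sketch_inode vfs_inode;    /* embedded generic inode */
};

struct sketch_super_ops {
    struct sketch_inode *(*alloc_inode)(void);
    void (*free_inode)(struct sketch_inode *inode);
};

static int sketch_live_inodes;

/* container_of-style helper: generic inode -> containing object. */
static struct sketch_fs_inode *SKETCH_I(struct sketch_inode *inode)
{
    return (struct sketch_fs_inode *)
        ((char *)inode - offsetof(struct sketch_fs_inode, vfs_inode));
}

static struct sketch_inode *sketch_alloc_inode(void)
{
    struct sketch_fs_inode *fi = calloc(1, sizeof(*fi));

    if (fi == NULL)
        return NULL;
    sketch_live_inodes++;
    return &fi->vfs_inode;
}

/* The only teardown the filesystem provides: free what it allocated. */
static void sketch_free_inode(struct sketch_inode *inode)
{
    struct sketch_fs_inode *fi;

    assert(inode != NULL);
    fi = SKETCH_I(inode);
    free(fi->xattr_cache);
    free(fi);
    sketch_live_inodes--;
}

static const struct sketch_super_ops sketch_ops = {
    .alloc_inode = sketch_alloc_inode,
    .free_inode  = sketch_free_inode,
};

/* Stand-in for the common code that invokes ->free_inode(). */
static void sketch_core_destroy(const struct sketch_super_ops *ops,
                                struct sketch_inode *inode)
{
    if (ops->free_inode)
        ops->free_inode(inode);
}
```

The point of the shape is that the deferral logic lives in one place, so the
per-filesystem callback no longer needs its own call_rcu() wrapper.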

On Tue, Apr 16, 2019 at 1:55 PM Al Viro  wrote:
>
> From: Al Viro 
>
> Signed-off-by: Al Viro 
> ---
>  fs/orangefs/super.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
> index dfaee90d30bd..3784f7e8b603 100644
> --- a/fs/orangefs/super.c
> +++ b/fs/orangefs/super.c
> @@ -124,11 +124,9 @@ static struct inode *orangefs_alloc_inode(struct super_block *sb)
> return &orangefs_inode->vfs_inode;
>  }
>
> -static void orangefs_i_callback(struct rcu_head *head)
> +static void orangefs_free_inode(struct inode *inode)
>  {
> -   struct inode *inode = container_of(head, struct inode, i_rcu);
> -   struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
> -   kmem_cache_free(orangefs_inode_cache, orangefs_inode);
> +   kmem_cache_free(orangefs_inode_cache, ORANGEFS_I(inode));
>  }
>
>  static void orangefs_destroy_inode(struct inode *inode)
> @@ -138,8 +136,6 @@ static void orangefs_destroy_inode(struct inode *inode)
> gossip_debug(GOSSIP_SUPER_DEBUG,
> "%s: deallocated %p destroying inode %pU\n",
> __func__, orangefs_inode, 
> get_khandle_from_ino(inode));
> -
> -   call_rcu(&inode->i_rcu, orangefs_i_callback);
>  }
>
>  /*
> @@ -299,6 +295,7 @@ void fsid_key_table_finalize(void)
>
>  static const struct super_operations orangefs_s_ops = {
> .alloc_inode = orangefs_alloc_inode,
> +   .free_inode = orangefs_free_inode,
> .destroy_inode = orangefs_destroy_inode,
> .drop_inode = generic_delete_inode,
> .statfs = orangefs_statfs,
> --
> 2.11.0
>


Re: [PATCH v2] PCI/LINK: bw_notification: Do not leave interrupt handler NULL

2019-04-22 Thread Alex Williamson
On Fri, 19 Apr 2019 15:08:27 -0600
Alex Williamson  wrote:

> On Mon, 25 Mar 2019 17:25:02 -0500
> Bjorn Helgaas  wrote:
> 
> > On Fri, Mar 22, 2019 at 07:36:51PM -0500, Alexandru Gagniuc wrote:  
> > > A threaded IRQ with a NULL handler does not work with level-triggered
> > > interrupts. request_threaded_irq() will return an error:
> > > 
> > >   genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 16
> > >   pcie_bw_notification: probe of :00:1b.0:pcie010 failed with error 
> > > -22
> > > 
> > > For level interrupts we need to silence the interrupt before exiting
> > > the IRQ handler, so just clear the PCI_EXP_LNKSTA_LBMS bit there.
> > > 
> > > > Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> > > Reported-by: Linus Torvalds 
> > > Signed-off-by: Alexandru Gagniuc 
> > 
> > Applied with the following subject line to for-linus for v5.1, thanks!
> > 
> >   PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked  
> 
> That made it a little tricky to track down this thread.  I get a
> regression bisected back to this when trying to do vfio device
> assignment.  I haven't dug further than the bisection, but I assume bus
> resets are triggering this link bandwidth notifier code and nobody
> thinks it's their interrupt:

I'm not sure what to do with this, I think it bisects back to commit
3e82a7f9031f simply because the interrupt was failing to register prior
to that, so the bandwidth notifier code was never activated (how was
this tested?).  When I assign a GPU to a VM, the VM is manipulating the
device to change the link speed, I would have thought this would
trigger the autonomous bandwidth notification, but I can clearly see
BWMgmt+ ABWMgmt- in lspci.  The root port shows:

  Interrupt: pin A routed to IRQ 25

And the BW notifier interrupt is registered here:

25: 0 ... 0 IR-IO-APIC8-fasteoi   PCIe BW notif

There's no interrupt count for any CPU on this vector.  For all I know,
this IRQ routing has never been exercised and could be broken in the
BIOS, resulting in a random spurious IRQ victim.  There seems to be
no good way to disable this driver other than manually unbinding root
ports via sysfs.  That's not a great solution.  The system is an Intel
X79 based workstation.  Suggestions for further debugging? Thanks,

Alex
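
For context on the original failure quoted in this thread, a simplified
userspace model of the registration rule behind the "handler=NULL and
!ONESHOT" message follows. The names are hypothetical, and the real genirq
code has further cases (for example irqchips that are marked as safe without
ONESHOT) that this sketch leaves out. The -22 in the probe error is -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_IRQF_ONESHOT 0x1u   /* hypothetical flag value */

typedef int (*sketch_irq_fn)(int irq, void *dev);

/*
 * Model of the rule: a threaded IRQ without a primary handler is only
 * accepted when the line stays masked until the thread has run (ONESHOT).
 * Otherwise a level-triggered line would fire again immediately.
 */
static int sketch_request_threaded_irq(sketch_irq_fn handler,
                                       sketch_irq_fn thread_fn,
                                       unsigned int flags)
{
    if (handler == NULL && thread_fn == NULL)
        return -EINVAL;
    if (handler == NULL && !(flags & SKETCH_IRQF_ONESHOT))
        return -EINVAL;
    return 0;
}

/* Example primary handler: would silence the source, then defer. */
static int sketch_primary(int irq, void *dev)
{
    (void)irq;
    (void)dev;
    return 0;
}
```

Supplying a primary handler, as the applied patch does, satisfies the rule
without ONESHOT. It does not by itself explain the spurious IRQ 16 reports
described above.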

> [  119.910738] irq 16: nobody cared (try booting with the "irqpoll" option)
> [  119.917455] CPU: 18 PID: 0 Comm: swapper/18 Not tainted 5.1.0-rc1+ #29
> [  119.923998] Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.69 03/25/2014
> [  119.932715] Call Trace:
> [  119.935169]  <IRQ>
> [  119.937200]  dump_stack+0x46/0x60
> [  119.940534]  __report_bad_irq+0x37/0xae
> [  119.944380]  note_interrupt.cold.9+0xa/0x69
> [  119.948580]  handle_irq_event_percpu+0x6a/0x80
> [  119.953037]  handle_irq_event+0x3d/0x5a
> [  119.956887]  handle_fasteoi_irq+0x8b/0x140
> [  119.961003]  handle_irq+0xbf/0x100
> [  119.964420]  do_IRQ+0x49/0xd0
> [  119.967398]  common_interrupt+0xf/0xf
> [  119.971074]  </IRQ>
> [  119.973190] RIP: 0010:cpuidle_enter_state+0xb4/0x460
> [  119.978167] Code: 24 0f 1f 44 00 00 31 ff e8 69 bf a3 ff 80 7c 24 13 00 74 
> 12 9c 58 f6 c4 02 0f 85 7d 03 00 00 31 ff e8 60 cf a9 ff fb 45 85 e4 <0f> 88 
> ae 02 00 00 49 63 cc 4c 8b 3c 24 4c 2b 7c 24 08 48 8d 04 49
> [  119.996967] RSP: 0018:b6740330fe98 EFLAGS: 0202 ORIG_RAX: 
> ffda
> [  120.004549] RAX: 9dbfc19a1d80 RBX: 82d2c940 RCX: 
> 001f
> [  120.011700] RDX: 001beb3c9b05 RSI: 315975dc RDI: 
> 
> [  120.018845] RBP: 9dbfc19acc00 R08: 0002 R09: 
> 00021640
> [  120.025990] R10: 027ae2689456 R11: 9dbfc19a0e64 R12: 
> 0004
> [  120.033146] R13: 82d2cad8 R14: 0004 R15: 
> 
> [  120.040303]  ? cpuidle_enter_state+0x97/0x460
> [  120.044679]  do_idle+0x1f1/0x230
> [  120.047918]  cpu_startup_entry+0x19/0x20
> [  120.051856]  start_secondary+0x172/0x1c0
> [  120.055796]  secondary_startup_64+0xb6/0xc0
> [  120.059993] handlers:
> [  120.062283] [<54c59383>] usb_hcd_irq
> [  120.066563] Disabling IRQ #16
> [  122.885627] irq 16: nobody cared (try booting with the "irqpoll" option)
> [  122.892326] CPU: 18 PID: 0 Comm: swapper/18 Not tainted 5.1.0-rc1+ #29
> [  122.898847] Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.69 03/25/2014
> [  122.907532] Call Trace:
> [  122.909985]  
> [  122.912009]  dump_stack+0x46/0x60
> [  122.915325]  __report_bad_irq+0x37/0xae
> [  122.919159]  note_interrupt.cold.9+0xa/0x69
> [  122.923338]  handle_irq_event_percpu+0x6a/0x80
> [  122.927781]  handle_irq_event+0x3d/0x5a
> [  122.931630]  handle_fasteoi_irq+0x8b/0x140
> [  122.935730]  handle_irq+0xbf/0x100
> [  122.939137]  do_IRQ+0x49/0xd0
> [  122.942108]  common_interrupt+0xf/0xf
> [  122.945772]  </IRQ>
> [  122.947881] RIP: 0010:cpuidle_enter_state+0xb4/0x460
