date:20180604

Re: [PATCH 4.16 00/47] 4.16.14-stable review

2018-06-04 Thread Guenter Roeck


On 06/03/2018 11:58 PM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.16.14 release.
There are 47 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jun  6 06:55:34 UTC 2018.
Anything received after that time might be too late.



Build results:
total: 136 pass: 136 fail: 0
Qemu test results:
total: 141 pass: 141 fail: 0

Details are available at http://kerneltests.org/builders/.

Guenter

Re: [PATCH 4.14 00/52] 4.14.48-stable review

2018-06-04 Thread Guenter Roeck


On 06/03/2018 11:57 PM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.14.48 release.
There are 52 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jun  6 06:55:52 UTC 2018.
Anything received after that time might be too late.




Build results:
total: 139 pass: 139 fail: 0
Qemu test results:
total: 143 pass: 143 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

[PATCH v5 4/4] clk: bd71837: Add driver for BD71837 PMIC clock

2018-06-04 Thread Matti Vaittinen

Support BD71837 gateable 32768 Hz clock.

Signed-off-by: Matti Vaittinen 
---
 drivers/clk/Kconfig   |   7 +++
 drivers/clk/Makefile  |   1 +
 drivers/clk/clk-bd71837.c | 146 ++
 3 files changed, 154 insertions(+)
 create mode 100644 drivers/clk/clk-bd71837.c

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 41492e980ef4..e693496f202a 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -279,6 +279,13 @@ config COMMON_CLK_STM32H7
---help---
  Support for stm32h7 SoC family clocks
 
+config COMMON_CLK_BD71837
+   tristate "Clock driver for ROHM BD71837 PMIC MFD"
+   depends on MFD_BD71837
+   help
+ This driver supports ROHM BD71837 PMIC clock.
+
+
 source "drivers/clk/bcm/Kconfig"
 source "drivers/clk/hisilicon/Kconfig"
 source "drivers/clk/imgtec/Kconfig"
diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index de6d06ac790b..8393c4af7d5a 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -21,6 +21,7 @@ endif
 obj-$(CONFIG_MACH_ASM9260) += clk-asm9260.o
 obj-$(CONFIG_COMMON_CLK_AXI_CLKGEN)+= clk-axi-clkgen.o
 obj-$(CONFIG_ARCH_AXXIA)   += clk-axm5516.o
+obj-$(CONFIG_COMMON_CLK_BD71837)   += clk-bd71837.o
 obj-$(CONFIG_COMMON_CLK_CDCE706)   += clk-cdce706.o
 obj-$(CONFIG_COMMON_CLK_CDCE925)   += clk-cdce925.o
 obj-$(CONFIG_ARCH_CLPS711X)+= clk-clps711x.o
diff --git a/drivers/clk/clk-bd71837.c b/drivers/clk/clk-bd71837.c
new file mode 100644
index ..5ba6c05c5a98
--- /dev/null
+++ b/drivers/clk/clk-bd71837.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 ROHM Semiconductors
+// bd71837.c  -- ROHM BD71837MWV clock driver
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct bd71837_clk {
+   struct clk_hw hw;
+   uint8_t reg;
+   uint8_t mask;
+   unsigned long rate;
+   struct platform_device *pdev;
+   struct bd71837 *mfd;
+};
+
+static int bd71837_clk_set(struct clk_hw *hw, int status)
+{
+   struct bd71837_clk *c = container_of(hw, struct bd71837_clk, hw);
+
+   return bd71837_update_bits(c->mfd, c->reg, c->mask, status);
+}
+
+static void bd71837_clk_disable(struct clk_hw *hw)
+{
+   int rv;
+   struct bd71837_clk *c = container_of(hw, struct bd71837_clk, hw);
+
+   rv = bd71837_clk_set(hw, 0);
+   if (rv)
+   dev_dbg(&c->pdev->dev, "Failed to disable 32K clk (%d)\n", rv);
+}
+
+static int bd71837_clk_enable(struct clk_hw *hw)
+{
+   return bd71837_clk_set(hw, 1);
+}
+
+static int bd71837_clk_is_enabled(struct clk_hw *hw)
+{
+   struct bd71837_clk *c = container_of(hw, struct bd71837_clk, hw);
+
+   return c->mask & bd71837_reg_read(c->mfd, c->reg);
+}
+
+static unsigned long bd71837_clk_recalc_rate(struct clk_hw *hw,
+unsigned long parent_rate)
+{
+   struct bd71837_clk *c = container_of(hw, struct bd71837_clk, hw);
+
+   return c->rate;
+}
+
+static const struct clk_ops bd71837_clk_ops = {
+   .recalc_rate = &bd71837_clk_recalc_rate,
+   .prepare = &bd71837_clk_enable,
+   .unprepare = &bd71837_clk_disable,
+   .is_prepared = &bd71837_clk_is_enabled,
+};
+
+static int bd71837_clk_probe(struct platform_device *pdev)
+{
+   struct bd71837_clk *c;
+   int rval = -ENOMEM;
+   struct bd71837 *mfd = dev_get_drvdata(pdev->dev.parent);
+   struct clk_init_data init = {
+   .name = "bd71837-32k-out",
+   .ops = &bd71837_clk_ops,
+   };
+
+   c = devm_kzalloc(&pdev->dev, sizeof(*c), GFP_KERNEL);
+   if (!c)
+   goto err_out;
+
+   c->reg = BD71837_REG_OUT32K;
+   c->mask = BD71837_OUT32K_EN;
+   c->rate = BD71837_CLK_RATE;
+   c->mfd = mfd;
+   c->pdev = pdev;
+
+   of_property_read_string_index(pdev->dev.parent->of_node,
+ "clock-output-names", 0,
+ &init.name);
+
+   c->hw.init = &init;
+
+   rval = devm_clk_hw_register(&pdev->dev, &c->hw);
+   if (rval) {
+   dev_err(&pdev->dev, "failed to register 32K clk");
+   goto err_out;
+   }
+
+   if (pdev->dev.parent->of_node) {
+   rval = of_clk_add_hw_provider(pdev->dev.parent->of_node,
+of_clk_hw_simple_get,
+&c->hw);
+   if (rval) {
+   dev_err(&pdev->dev, "adding clk provider failed\n");
+   goto err_out;
+   }
+   }
+
+   rval = clk_hw_register_clkdev(&c->hw, init.name, NULL);
+   if (rval) {
+   dev_err(&pdev->dev, "failed to register clkdev for bd71837");
+   goto err_clean_provider;
+   }
+
+   platform_set_drvdata(pdev, c);
+
+   return 0;
+
+err_clean_provider:
+
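
The prepare/unprepare callbacks in the patch above come down to a masked read-modify-write of a single PMIC register. Here is a minimal userspace sketch of that pattern, assuming an in-memory array in place of the I2C-backed register map. The names `fake_regs`, `fake_update_bits`, `fake_clk_set`, `fake_clk_is_enabled`, `FAKE_REG_OUT32K` and `FAKE_OUT32K_EN` are hypothetical, and the register index and bit value are made up for illustration; none of this is the driver's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register index and enable bit; not taken from the datasheet. */
#define FAKE_REG_OUT32K 0x2e
#define FAKE_OUT32K_EN  0x01

static uint8_t fake_regs[256]; /* stands in for the PMIC register file */

/* Change only the bits selected by mask and leave the others untouched. */
static int fake_update_bits(uint8_t reg, uint8_t mask, uint8_t val)
{
	uint8_t old = fake_regs[reg];

	fake_regs[reg] = (uint8_t)((old & ~mask) | (val & mask));
	return 0;
}

/* Gate or ungate the clock output by writing the enable bit. */
static int fake_clk_set(int on)
{
	return fake_update_bits(FAKE_REG_OUT32K, FAKE_OUT32K_EN,
				on ? FAKE_OUT32K_EN : 0);
}

/* Normalize the masked read to 0 or 1. */
static int fake_clk_is_enabled(void)
{
	return !!(fake_regs[FAKE_REG_OUT32K] & FAKE_OUT32K_EN);
}
```

The sketch passes the mask itself as the "on" value rather than a literal 1, so it stays correct even if the enable bit is not bit 0. Whether that matters for the real register layout is not visible from this excerpt.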

[PATCH v5 3/4] clk: bd71837: Devicetree bindings for ROHM BD71837 PMIC

2018-06-04 Thread Matti Vaittinen

Document devicetree bindings for ROHM BD71837 PMIC clock output.

Signed-off-by: Matti Vaittinen 
---
 .../bindings/clock/rohm,bd71837-clock.txt  | 38 ++
 1 file changed, 38 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt

diff --git a/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt b/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
new file mode 100644
index ..771acfe34114
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
@@ -0,0 +1,38 @@
+ROHM BD71837 Power Management Integrated Circuit clock bindings
+
+This is a part of device tree bindings of ROHM BD71837 multi-function
+device. See generic BD71837 MFD bindings at:
+   Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
+
+BD71837 contains one 32.768 kHz clock output which can be enabled and
+disabled via i2c.
+
+Following properties should be present in main device node of the MFD chip.
+
+Required properties:
+- clock-frequency  : Should be 32768
+- #clock-cells : Should be 0
+
+Optional properties:
+- clock-output-names   : Should contain name for output clock.
+
+Example:
+
+/* MFD node */
+
+pmic: pmic@4b {
+   compatible = "rohm,bd71837";
+   /* ... */
+   #clock-cells = <0>;
+   clock-frequency  = <32768>;
+   /* ... */
+};
+
+/* Clock consumer node */
+
+foo@0 {
+   compatible = "bar,foo";
+   /* ... */
+   clock-names = "my-clock";
+   clocks = <&pmic>;
+};
-- 
2.14.3

[PATCH v5 2/4] mfd: bd71837: Devicetree bindings for ROHM BD71837 PMIC

2018-06-04 Thread Matti Vaittinen

Document devicetree bindings for ROHM BD71837 PMIC MFD.

Signed-off-by: Matti Vaittinen 
---
 .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt  | 76 ++
 1 file changed, 76 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt

diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
new file mode 100644
index ..ac2b66181f17
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
@@ -0,0 +1,76 @@
+* ROHM BD71837 Power Management Integrated Circuit bindings
+
+BD71837MWV is a programmable Power Management IC for powering single-core,
+dual-core, and quad-core SoC’s such as NXP-i.MX 8M. It is optimized for
+low BOM cost and compact solution footprint. It integrates 8 Buck
+regulators and 7 LDO’s to provide all the power rails required by the SoC and
+the commonly used peripherals.
+
+Required properties:
+ - compatible  : Should be "rohm,bd71837".
+ - reg : I2C slave address.
+ - interrupt-parent: Phandle to the parent interrupt controller.
+ - interrupts  : The interrupt line the device is connected to.
+ - regulators: : List of child nodes that specify the regulators
+ Please see ../regulator/rohm,bd71837-regulator.txt
+ - clock:  : Please see ../clock/rohm,bd71837-clock.txt
+
+Optional properties:
+ - interrupt-controller: Marks the device node as an interrupt controller.
+ BD71837MWV can report different power state change
+ events to other drivers. Different events can be seen
+ as separate BD71837 domain interrupts.
+ The BD71837 driver only provides the infrastructure
+ for the IRQs. The users should write own driver to
+ convert the IRQ into the event they wish. The IRQ can
+ be used with the standard
+ request_irq/enable_irq/disable_irq API inside the
+ kernel.
+ - #interrupt-cells: The number of cells to describe an IRQ should be 1.
+   The value in cell is the IRQ number.
+   Meaningful numbers are:
+ 0 => PMIC_STBY_REQ level change
+ 1 => PMIC_ON_REQ level change
+ 2 => WDOG_B level change
+ 3 => Power Button level change
+ 4 => Power Button Long Push
+ 5 => Power Button Short Push
+ 6 => SWRESET register is written 1
+
+Example:
+
+   pmic: pmic@4b {
+   compatible = "rohm,bd71837";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x4b>;
+   interrupt-parent = <>;
+   interrupts = <29 GPIO_ACTIVE_LOW>;
+   interrupt-names = "irq";
+   #interrupt-cells = <1>;
+   interrupt-controller;
+   #clock-cells = <0>;
+   clock-frequency = <32768>;
+
+   regulators {
+   buck1: BUCK1 {
+   regulator-name = "buck1";
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1300000>;
+   regulator-boot-on;
+   regulator-ramp-delay = <1250>;
+   };
+   /* ... */
+   };
+   };
+
+   /* driver consuming PMIC interrupts */
+
+   my-power-button: power-button {
+   compatible = "foo";
+   interrupt-parent = <&pmic>;
+   interrupts = <3>, <4>, <5>;
+   interrupt-names = "pwrb", "pwrb-l", "pwrb-s";
+   /* ... */
+   };
+
-- 
2.14.3
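
The one-cell interrupt specifier described in the binding above is a plain index into the list of PMIC events. A small userspace sketch of that mapping, assuming only the numbering given in the binding text; `pmic_irq_desc`, `PMIC_NUM_IRQS` and `pmic_irq_describe` are hypothetical names, not identifiers from the driver or its header:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index = value of the single interrupt cell, as listed in the binding. */
static const char *const pmic_irq_desc[] = {
	"PMIC_STBY_REQ level change",
	"PMIC_ON_REQ level change",
	"WDOG_B level change",
	"Power Button level change",
	"Power Button Long Push",
	"Power Button Short Push",
	"SWRESET register is written 1",
};

#define PMIC_NUM_IRQS (sizeof(pmic_irq_desc) / sizeof(pmic_irq_desc[0]))

/* Return the description for a specifier cell, or NULL if out of range. */
static const char *pmic_irq_describe(unsigned int cell)
{
	return cell < PMIC_NUM_IRQS ? pmic_irq_desc[cell] : NULL;
}
```

With this table, the `interrupts = <3>, <4>, <5>;` line in the consumer example resolves to the three power-button events.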

Re: [PATCH 1/2] HID: multitouch: report MT_TOOL_PALM for non-confident touches

2018-06-04 Thread Benjamin Tissoires

On Fri, Jun 1, 2018 at 8:43 PM, Dmitry Torokhov
 wrote:
> On Fri, Jun 01, 2018 at 04:16:09PM +0200, Benjamin Tissoires wrote:
>> On Fri, Aug 11, 2017 at 2:44 AM, Dmitry Torokhov
>>  wrote:
>> > According to Microsoft specification [1] for Precision Touchpads (and
>> > Touchscreens) the devices use "confidence" reports to signal accidental
>> > touches, or contacts that are "too large to be a finger". Instead of
>> > simply marking contact inactive in this case (which causes issues if
>> > contact was originally proper and we lost confidence in it later, as
>> > this results in accidental clicks, drags, etc), let's report such
>> > contacts as MT_TOOL_PALM and let userspace decide what to do.
>> > Additionally, let's report contact size for such touches as maximum
>> > allowed for major/minor, which should help userspace that is not yet
>> > aware of MT_TOOL_PALM to still perform palm rejection.
>> >
>> > An additional complication, is that some firmwares do not report
>> > non-confident touches as active. To cope with this we delay release of
>> > such contact (i.e. if contact was active we first report it as still
>> > active MT_TOOL_PALM and then synthesize the release event in a separate
>> > frame).
>>
>> I am not sure I agree with this part. The spec says that "Once a
>> device has determined that a contact is unintentional, it should clear
>> the confidence bit for that contact report and all subsequent
>> reports."
>> So in theory the spec says that if a touch has been detected as a
>> palm, the flow of events should not stop (tested on the PTP of the
>> Dell XPS 9360).
>>
>> However, I interpret a firmware that sends (confidence 1, tip switch 1)
>> and then (confidence 0, tip switch 0) as a simple release, and the
>> confidence bit should not be relayed.
>
> This unfortunately leads to false clicks: you start with finger, so
> confidence is 1, then you transition the same touch to palm (use your
> thumb and "roll" your hand until heel of it comes into contact with the
> screen). The firmware reports "no-confidence" and "release" in the same
> report and userspace seeing release does not pay attention to confidence
> (i.e. it does exactly "simple release" logic) and this results in UI
> interpreting this as a click. With splitting no-confidence
> (MT_TOOL_PALM) and release event into separate frames we help userspace
> to recognize that the contact should be discarded.

After further thought, I would consider this to be a firmware bug,
and not how the firmware is supposed to be reporting palm.
For the precision touchpads, the spec says that the device "should
clear the confidence bit for that contact report and all subsequent
reports.". And it is how the Dell device I have here reports palms.
The firmware is not supposed to cut the event stream.

There is a test for that:
https://docs.microsoft.com/en-us/previous-versions/windows/hardware/hck/dn456905%28v%3dvs.85%29
which tells me that I am right here for PTP.

The touchscreen spec is blurrier however.

>
>>
>> Do you have any precise example of reports where you need that feature?
>
> It was observed on Pixelbooks which use Wacom digitizers IIRC.

Pixelbooks + Wacom means that it was likely a touchscreen. Am I right
in guessing that the device did not go through the Microsoft
certification process?

I am in favor of splitting the patch in 2. One for the generic
processing of confidence bit, and one for this spurious release. For
the spurious release, I'm more in favor of explicitly quirking the
devices in need of such quirk.

If you agree, I'll rebase your patch on top of my series as rebasing
my series on top of yours will take more effort.

I am trying to be cautious in the generic path because I want to merge
the cleanest multitouch implementation in hid-core/hid-input, and
leave all the quirks in hid-multitouch for the devices in need.

Cheers,
Benjamin

>
> Thanks.
>
> --
> Dmitry
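
The frame splitting Dmitry describes in this thread can be sketched as a small decision function. This is a userspace illustration only, assuming a single contact and boolean confidence and tip-switch inputs. The names `enum tool`, `struct frame` and `split_report` are hypothetical; this is not the hid-multitouch code and not necessarily how either patch implements it.

```c
#include <assert.h>
#include <stdbool.h>

enum tool { TOOL_FINGER, TOOL_PALM };

struct frame {
	bool active;     /* contact is down in this frame */
	enum tool tool;  /* tool type reported to userspace */
};

/*
 * Translate one firmware report into output frames.
 * Returns the number of frames written to out[] (0, 1 or 2).
 */
static int split_report(bool was_active, bool confidence, bool tip,
			struct frame out[2])
{
	if (tip) {
		/* Contact is down: only the tool type depends on confidence. */
		out[0].active = true;
		out[0].tool = confidence ? TOOL_FINGER : TOOL_PALM;
		return 1;
	}
	if (!was_active)
		return 0; /* nothing to release */
	if (!confidence) {
		/* Frame 1: still active, but flagged as a palm. */
		out[0] = (struct frame){ .active = true, .tool = TOOL_PALM };
		/* Frame 2: the synthesized release. */
		out[1] = (struct frame){ .active = false, .tool = TOOL_PALM };
		return 2;
	}
	/* Ordinary release of a confident contact. */
	out[0] = (struct frame){ .active = false, .tool = TOOL_FINGER };
	return 1;
}
```

Only the case where confidence and tip switch both drop in one report produces two frames. That is the case Benjamin proposes to handle with a per-device quirk rather than in the generic path.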

[PATCH v5 1/4] mfd: bd71837: mfd driver for ROHM BD71837 PMIC

2018-06-04 Thread Matti Vaittinen

ROHM BD71837 PMIC MFD driver providing interrupts and support
for two subsystems:
- clk
- Regulators

Signed-off-by: Matti Vaittinen 
---
 drivers/mfd/Kconfig |  13 ++
 drivers/mfd/Makefile|   1 +
 drivers/mfd/bd71837.c   | 223 ++
 include/linux/mfd/bd71837.h | 288 
 4 files changed, 525 insertions(+)
 create mode 100644 drivers/mfd/bd71837.c
 create mode 100644 include/linux/mfd/bd71837.h

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index b860eb5aa194..7aa05fc9ed8e 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1787,6 +1787,19 @@ config MFD_STW481X
  in various ST Microelectronics and ST-Ericsson embedded
  Nomadik series.
 
+config MFD_BD71837
+   bool "BD71837 Power Management chip"
+   depends on I2C=y
+   depends on OF
+   select REGMAP_I2C
+   select REGMAP_IRQ
+   select MFD_CORE
+   help
+ Select this option to get support for the ROHM BD71837
+ Power Management chips. BD71837 is designed to power processors like
+ NXP i.MX8. It contains 8 BUCK outputs and 7 LDOs, voltage monitoring
+ and emergency shut down as well as 32.768 kHz clock output.
+
 config MFD_STM32_LPTIMER
tristate "Support for STM32 Low-Power Timer"
depends on (ARCH_STM32 && OF) || COMPILE_TEST
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index e9fd20dba18d..09dc9eb3782c 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -227,4 +227,5 @@ obj-$(CONFIG_MFD_STM32_TIMERS)  += stm32-timers.o
 obj-$(CONFIG_MFD_MXS_LRADC) += mxs-lradc.o
 obj-$(CONFIG_MFD_SC27XX_PMIC)  += sprd-sc27xx-spi.o
 obj-$(CONFIG_RAVE_SP_CORE) += rave-sp.o
+obj-$(CONFIG_MFD_BD71837)  += bd71837.o
 
diff --git a/drivers/mfd/bd71837.c b/drivers/mfd/bd71837.c
new file mode 100644
index ..93930f1f2893
--- /dev/null
+++ b/drivers/mfd/bd71837.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 ROHM Semiconductors
+// bd71837.c -- ROHM BD71837MWV mfd driver
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* bd71837 multi function cells */
+static struct mfd_cell bd71837_mfd_cells[] = {
+   {
+   .name = "bd71837-clk",
+   .of_compatible = "rohm,bd71837-clk",
+   }, {
+   .name = "bd71837-pmic",
+   },
+};
+
+static const struct regmap_irq bd71837_irqs[] = {
+   REGMAP_IRQ_REG(BD71837_INT_SWRST, 0, BD71837_INT_SWRST_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_PWRBTN_S, 0, BD71837_INT_PWRBTN_S_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_PWRBTN_L, 0, BD71837_INT_PWRBTN_L_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_PWRBTN, 0, BD71837_INT_PWRBTN_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_WDOG, 0, BD71837_INT_WDOG_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_ON_REQ, 0, BD71837_INT_ON_REQ_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_STBY_REQ, 0, BD71837_INT_STBY_REQ_MASK),
+};
+
+static struct regmap_irq_chip bd71837_irq_chip = {
+   .name = "bd71837-irq",
+   .irqs = bd71837_irqs,
+   .num_irqs = ARRAY_SIZE(bd71837_irqs),
+   .num_regs = 1,
+   .irq_reg_stride = 1,
+   .status_base = BD71837_REG_IRQ,
+   .mask_base = BD71837_REG_MIRQ,
+   .init_ack_masked = true,
+   .mask_invert = false,
+};
+
+static int bd71837_irq_exit(struct bd71837 *bd71837)
+{
+   if (bd71837->chip_irq > 0)
+   regmap_del_irq_chip(bd71837->chip_irq, bd71837->irq_data);
+   return 0;
+}
+
+static const struct regmap_range pmic_status_range = {
+   .range_min = BD71837_REG_IRQ,
+   .range_max = BD71837_REG_POW_STATE,
+};
+
+static const struct regmap_access_table volatile_regs = {
+   .yes_ranges = &pmic_status_range,
+   .n_yes_ranges = 1,
+};
+
+static const struct regmap_config bd71837_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+   .volatile_table = &volatile_regs,
+   .max_register = BD71837_MAX_REGISTER - 1,
+   .cache_type = REGCACHE_RBTREE,
+};
+
+#ifdef CONFIG_OF
+static const struct of_device_id bd71837_of_match[] = {
+   { .compatible = "rohm,bd71837", .data = (void *)0},
+   { },
+};
+MODULE_DEVICE_TABLE(of, bd71837_of_match);
+
+static int bd71837_parse_dt(struct i2c_client *client, struct bd71837_board **b)
+{
+   struct device_node *np = client->dev.of_node;
+   struct bd71837_board *board_info;
+   unsigned int prop;
+   int r;
+   int rv = -ENOMEM;
+
+   board_info = devm_kzalloc(&client->dev, sizeof(*board_info),
+   GFP_KERNEL);
+   if (!board_info)
+   goto err_out;
+
+   if (client->irq) {
+   dev_dbg(&client->dev, "Got irq %d\n", client->irq);
+   board_info->gpio_intr = client->irq;
+   } else {
+   dev_err(&client->dev, "no pmic intr pin available\n");
+   rv

[PATCH v5 0/4] mfd/regulator/clk: bd71837: ROHM BD71837 PMIC driver

2018-06-04 Thread Matti Vaittinen

Patch series adding support for ROHM BD71837 PMIC.

BD71837 is a programmable Power Management IC for powering single-core,
dual-core, and quad-core SoC’s such as NXP-i.MX 8M. It is optimized for
low BOM cost and compact solution footprint. It integrates 8 buck
regulators and 7 LDO’s to provide all the power rails required by the
SoC and the commonly used peripherals.

The driver aims not to limit how the PMIC is used, so the buck and LDO
naming is generic and not tied to any specific purpose. However, the
following limitations make it mostly suitable for use cases where the
processor running the PMIC driver is itself powered by the PMIC:

- The PMIC is not re-initialized if it resets. The PMIC may reset as a
  result of voltage monitoring (over/under voltage) or due to a reset
  request. The driver initializes the PMIC only at probe. This is not a
  problem as long as the processor controlling the PMIC is powered by it.

- The PMIC's internal state machine is ignored by the driver. The driver
  assumes the PMIC is wired so that it is always in the "run" state when
  controlled by the driver.

Changelog v5
- dropped regulator patches which are already applied to Mark's tree
Based on feedback from Rob Herring and Stephen Boyd
- mfd bindings: explain why this can be interrupt-controller
- mfd bindings: describe interrupts better
- mfd bindings: require one cell interrupt specifier
- mfd bindings: use generic node names in example
- mfd driver:   ack masked interrupt once at init
- clk bindings: use generic node names in example
- clk driver:   use devm
- clk driver:   use of_clk_add_hw_provider
- clk driver:   change severity of print and how prints are emitted at
probe error path.
- clk driver:   dropped forward declared functions
- clk configs:  drop unnecessary dependencies
- clk driver:   other styling issues
- mfd/clk DT:   drop clk node.

Changelog v4
- remove mutex from regulator state check as core prevents simultaneous
  accesses
- allow voltage change for bucks 1 to 4 when regulator is enabled
- fix indentation problems
- properly correct SPDX comments

Changelog v3
- kill unused variable
- kill unused definitions
- use REGMAP_IRQ_REG

Changelog v2
Based on feedback from Mark Brown
- Squashed code and buildfile changes to same patch
- Fixed some styling issues
- Changed SPDX comments to CPP style
- Error out if voltage is changed when regulator is enabled instead of
  Disabling the regulator for duration of change
- Use devm_regulator_register
- Remove compatible usage from regulators - use parent dev for config
- Add a note about using regulator-boot-on for BUCK6 and 7
- fixed warnings from kbuild test robot

patch 1: 
MFD driver and definitions bringing interrupt support and
enabling clk and regulator subsystems.
Patches 2 and 3

This patch series is based on for-mfd-next

---

Matti Vaittinen (4):
  mfd: bd71837: mfd driver for ROHM BD71837 PMIC
  mfd: bd71837: Devicetree bindings for ROHM BD71837 PMIC
  clk: bd71837: Devicetree bindings for ROHM BD71837 PMIC
  clk: bd71837: Add driver for BD71837 PMIC clock

 .../bindings/clock/rohm,bd71837-clock.txt          |  38 +++
 .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt  |  76 ++
 drivers/clk/Kconfig                                |   7 +
 drivers/clk/Makefile                               |   1 +
 drivers/clk/clk-bd71837.c                          | 146 +++
 drivers/mfd/Kconfig                                |  13 +
 drivers/mfd/Makefile                               |   1 +
 drivers/mfd/bd71837.c                              | 223 
 include/linux/mfd/bd71837.h                        | 288 +
 9 files changed, 793 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
 create mode 100644 Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
 create mode 100644 drivers/clk/clk-bd71837.c
 create mode 100644 drivers/mfd/bd71837.c
 create mode 100644 include/linux/mfd/bd71837.h

-- 
2.14.3

Re: [PATCH 1/2] HID: multitouch: report MT_TOOL_PALM for non-confident touches

2018-06-04 Thread Benjamin Tissoires

On Fri, Jun 1, 2018 at 8:43 PM, Dmitry Torokhov wrote:
> On Fri, Jun 01, 2018 at 04:16:09PM +0200, Benjamin Tissoires wrote:
>> On Fri, Aug 11, 2017 at 2:44 AM, Dmitry Torokhov wrote:
>> > According to Microsoft specification [1] for Precision Touchpads (and
>> > Touchscreens) the devices use "confidence" reports to signal accidental
>> > touches, or contacts that are "too large to be a finger". Instead of
>> > simply marking contact inactive in this case (which causes issues if
>> > contact was originally proper and we lost confidence in it later, as
>> > this results in accidental clicks, drags, etc), let's report such
>> > contacts as MT_TOOL_PALM and let userspace decide what to do.
>> > Additionally, let's report contact size for such touches as maximum
>> > allowed for major/minor, which should help userspace that is not yet
>> > aware of MT_TOOL_PALM to still perform palm rejection.
>> >
>> > An additional complication is that some firmware does not report
>> > non-confident touches as active. To cope with this we delay release of
>> > such contact (i.e. if contact was active we first report it as still
>> > active MT_TOOL_PALM and then synthesize the release event in a separate
>> > frame).
>>
>> I am not sure I agree with this part. The spec says that "Once a
>> device has determined that a contact is unintentional, it should clear
>> the confidence bit for that contact report and all subsequent
>> reports."
>> So in theory the spec says that if a touch has been detected as a
>> palm, the flow of events should not stop (tested on the PTP of the
>> Dell XPS 9360).
>>
>> However, I interpret a firmware that sends (confidence 1, tip switch 1)
>> and then (confidence 0, tip switch 0) as a simple release, and the
>> confidence bit should not be relayed.
>
> This unfortunately leads to false clicks: you start with finger, so
> confidence is 1, then you transition the same touch to palm (use your
> thumb and "roll" your hand until heel of it comes into contact with the
> screen). The firmware reports "no-confidence" and "release" in the same
> report and userspace seeing release does not pay attention to confidence
> (i.e. it does exactly "simple release" logic) and this results in UI
> interpreting this as a click. With splitting no-confidence
> (MT_TOOL_PALM) and release event into separate frames we help userspace
> to recognize that the contact should be discarded.

After further thought, I would consider this to be a firmware bug,
and not how the firmware is supposed to report a palm.
For the precision touchpads, the spec says that the device "should
clear the confidence bit for that contact report and all subsequent
reports." And that is how the Dell device I have here reports palms.
The firmware is not supposed to cut the event stream.

There is a test for that:
https://docs.microsoft.com/en-us/previous-versions/windows/hardware/hck/dn456905%28v%3dvs.85%29
which tells me that I am right here for PTP.

The touchscreen spec is blurrier however.

>
>>
>> Do you have any precise example of reports where you need that feature?
>
> It was observed on Pixelbooks which use Wacom digitizers IIRC.

Pixelbooks + Wacom means that it was likely a touchscreen. Am I right
in guessing the device did not go through the Microsoft certification
process?

I am in favor of splitting the patch in 2. One for the generic
processing of confidence bit, and one for this spurious release. For
the spurious release, I'm more in favor of explicitly quirking the
devices in need of such quirk.

If you agree, I'll rebase your patch on top of my series as rebasing
my series on top of yours will take more effort.

I am trying to be cautious in the generic path because I want to merge
the cleanest multitouch implementation in hid-core/hid-input, and
leave all the quirks in hid-multitouch for the devices in need.

Cheers,
Benjamin

>
> Thanks.
>
> --
> Dmitry
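
The frame splitting Dmitry describes above can be sketched in plain C.
This is only an illustration under stated assumptions, not the
hid-multitouch code: struct contact, struct frame_event and
report_contact() are hypothetical names, and only the MT_TOOL_FINGER and
MT_TOOL_PALM values mirror linux/input.h. When a previously active
contact arrives with both tip switch and confidence cleared, the sketch
emits a palm frame first and the release in a second frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MT_TOOL_FINGER 0x00 /* same values as linux/input.h */
#define MT_TOOL_PALM   0x02

/* Hypothetical per-slot state kept between hardware reports. */
struct contact {
	bool was_active; /* contact was reported active in the previous frame */
};

/* Hypothetical event handed to userspace for one frame. */
struct frame_event {
	bool active;
	int tool;
};

/*
 * Process one hardware report (tip switch + confidence) for a slot.
 * Returns the number of frames written to out[] (1 or 2).
 */
static size_t report_contact(struct contact *c, bool tip, bool confidence,
			     struct frame_event out[2])
{
	size_t n = 0;

	assert(c != NULL && out != NULL);

	if (!tip && !confidence && c->was_active) {
		/* Confidence lost and released in one report:
		 * keep the contact alive as a palm first ... */
		out[n++] = (struct frame_event){ .active = true, .tool = MT_TOOL_PALM };
		/* ... then release it in a separate frame. */
		out[n++] = (struct frame_event){ .active = false, .tool = MT_TOOL_PALM };
		c->was_active = false;
		return n;
	}

	/* Ordinary case: relay the state, mapping lost confidence to palm. */
	out[n++] = (struct frame_event){
		.active = tip,
		.tool = confidence ? MT_TOOL_FINGER : MT_TOOL_PALM,
	};
	c->was_active = tip;
	return n;
}
```

Whether this logic belongs in the generic path or behind a per-device
quirk is exactly the open question in the thread; the sketch takes no
side on that.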

[PATCH v5 1/4] mfd: bd71837: mfd driver for ROHM BD71837 PMIC

2018-06-04 Thread Matti Vaittinen

ROHM BD71837 PMIC MFD driver providing interrupts and support
for two subsystems:
- clk
- Regulators

Signed-off-by: Matti Vaittinen 
---
 drivers/mfd/Kconfig         |  13 ++
 drivers/mfd/Makefile        |   1 +
 drivers/mfd/bd71837.c       | 223 ++
 include/linux/mfd/bd71837.h | 288 
 4 files changed, 525 insertions(+)
 create mode 100644 drivers/mfd/bd71837.c
 create mode 100644 include/linux/mfd/bd71837.h

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index b860eb5aa194..7aa05fc9ed8e 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1787,6 +1787,19 @@ config MFD_STW481X
  in various ST Microelectronics and ST-Ericsson embedded
  Nomadik series.
 
+config MFD_BD71837
+   bool "BD71837 Power Management chip"
+   depends on I2C=y
+   depends on OF
+   select REGMAP_I2C
+   select REGMAP_IRQ
+   select MFD_CORE
+   help
+ Select this option to get support for the ROHM BD71837
+ Power Management chips. BD71837 is designed to power processors like
+ NXP i.MX8. It contains 8 BUCK outputs and 7 LDOs, voltage monitoring
+ and emergency shut down as well as 32,768KHz clock output.
+
 config MFD_STM32_LPTIMER
tristate "Support for STM32 Low-Power Timer"
depends on (ARCH_STM32 && OF) || COMPILE_TEST
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index e9fd20dba18d..09dc9eb3782c 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -227,4 +227,5 @@ obj-$(CONFIG_MFD_STM32_TIMERS)  += stm32-timers.o
 obj-$(CONFIG_MFD_MXS_LRADC) += mxs-lradc.o
 obj-$(CONFIG_MFD_SC27XX_PMIC)  += sprd-sc27xx-spi.o
 obj-$(CONFIG_RAVE_SP_CORE) += rave-sp.o
+obj-$(CONFIG_MFD_BD71837)  += bd71837.o
 
diff --git a/drivers/mfd/bd71837.c b/drivers/mfd/bd71837.c
new file mode 100644
index 000000000000..93930f1f2893
--- /dev/null
+++ b/drivers/mfd/bd71837.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 ROHM Semiconductors
+// bd71837.c -- ROHM BD71837MWV mfd driver
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* bd71837 multi function cells */
+static struct mfd_cell bd71837_mfd_cells[] = {
+   {
+   .name = "bd71837-clk",
+   .of_compatible = "rohm,bd71837-clk",
+   }, {
+   .name = "bd71837-pmic",
+   },
+};
+
+static const struct regmap_irq bd71837_irqs[] = {
+   REGMAP_IRQ_REG(BD71837_INT_SWRST, 0, BD71837_INT_SWRST_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_PWRBTN_S, 0, BD71837_INT_PWRBTN_S_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_PWRBTN_L, 0, BD71837_INT_PWRBTN_L_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_PWRBTN, 0, BD71837_INT_PWRBTN_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_WDOG, 0, BD71837_INT_WDOG_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_ON_REQ, 0, BD71837_INT_ON_REQ_MASK),
+   REGMAP_IRQ_REG(BD71837_INT_STBY_REQ, 0, BD71837_INT_STBY_REQ_MASK),
+};
+
+static struct regmap_irq_chip bd71837_irq_chip = {
+   .name = "bd71837-irq",
+   .irqs = bd71837_irqs,
+   .num_irqs = ARRAY_SIZE(bd71837_irqs),
+   .num_regs = 1,
+   .irq_reg_stride = 1,
+   .status_base = BD71837_REG_IRQ,
+   .mask_base = BD71837_REG_MIRQ,
+   .init_ack_masked = true,
+   .mask_invert = false,
+};
+
+static int bd71837_irq_exit(struct bd71837 *bd71837)
+{
+   if (bd71837->chip_irq > 0)
+   regmap_del_irq_chip(bd71837->chip_irq, bd71837->irq_data);
+   return 0;
+}
+
+static const struct regmap_range pmic_status_range = {
+   .range_min = BD71837_REG_IRQ,
+   .range_max = BD71837_REG_POW_STATE,
+};
+
+static const struct regmap_access_table volatile_regs = {
+   .yes_ranges = &pmic_status_range,
+   .n_yes_ranges = 1,
+};
+
+static const struct regmap_config bd71837_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+   .volatile_table = &volatile_regs,
+   .max_register = BD71837_MAX_REGISTER - 1,
+   .cache_type = REGCACHE_RBTREE,
+};
+
+#ifdef CONFIG_OF
+static const struct of_device_id bd71837_of_match[] = {
+   { .compatible = "rohm,bd71837", .data = (void *)0},
+   { },
+};
+MODULE_DEVICE_TABLE(of, bd71837_of_match);
+
+static int bd71837_parse_dt(struct i2c_client *client, struct bd71837_board **b)
+{
+   struct device_node *np = client->dev.of_node;
+   struct bd71837_board *board_info;
+   unsigned int prop;
+   int r;
+   int rv = -ENOMEM;
+
+   board_info = devm_kzalloc(&client->dev, sizeof(*board_info),
+   GFP_KERNEL);
+   if (!board_info)
+   goto err_out;
+
+   if (client->irq) {
+   dev_dbg(&client->dev, "Got irq %d\n", client->irq);
+   board_info->gpio_intr = client->irq;
+   } else {
+   dev_err(&client->dev, "no pmic intr pin available\n");
+   rv

Re: [PATCH] module: exclude SHN_UNDEF symbols from kallsyms api

2018-06-04 Thread Josh Poimboeuf

On Mon, Jun 04, 2018 at 03:01:31PM +0200, Jessica Yu wrote:
> +++ Jessica Yu [04/06/18 11:54 +0200]:
> > +++ Jessica Yu [04/06/18 10:05 +0200]:
> > > +++ Josh Poimboeuf [02/06/18 12:32 -0500]:
> > > > Hi Jessica,
> > > > 
> > > > I found a bug:
> > > > 
> > > > [root@f25 ~]# modprobe livepatch-sample
> > > > [root@f25 ~]# grep ' u ' /proc/kallsyms
> > > > 81161080 u klp_enable_patch [livepatch_sample]
> > > > 81a01800 u __fentry__   [livepatch_sample]
> > > > 81161250 u klp_unregister_patch [livepatch_sample]
> > > > 81161870 u klp_register_patch   [livepatch_sample]
> > > > 8131f0b0 u seq_printf   [livepatch_sample]
> > > > 
> > > > Notice that livepatch modules' undefined symbols are showing up in
> > > > /proc/kallsyms.  This can confuse klp_find_object_symbol() which can
> > > > cause subtle bugs in livepatch.
> > > > 
> > > > I stared at the module kallsyms code for a bit, but I don't see the bug.
> > > > Maybe it has something to do with how we save the symbol table in
> > > > copy_module_elf().  Any ideas?
> > > 
> > > Hi Josh!
> > > 
> > > This is because we preserve the entire symbol table for livepatch
> > > modules, including the SHN_UNDEF symbols. IIRC, this is so that we can
> > > still apply relocations properly with apply_relocate_add() after a
> > > to-be-patched object is loaded. Normally we don't save these SHN_UNDEF
> > > symbols for modules so they do not appear in /proc/kallsyms.
> > 
> > Hm, if having the full symtab in kallsyms is causing trouble, one
> > possibility would be to just have the module kallsyms code simply
> > skip/ignore undef symbols. That's what we technically do for normal
> > modules anyway (we normally cut undef syms out of the symtab). Haven't
> > tested this idea but does that sound like it'd help?
> 
> See if the following patch (untested) helps. It does not fix the
> /proc/kallsyms lookup, that requires a separate patch. But it should
> exclude the undef symbols from module_kallsyms_on_each_symbol() and
> thus also from klp_find_object_symbol().

That seems like it would work.  But wouldn't it be more robust if we
don't store the SHN_UNDEF symbols to start with?  Really it's only the
SHN_LIVEPATCH symbols that we need to keep, right?

What do you think about the following (untested)?

diff --git a/kernel/module.c b/kernel/module.c
index c9bea7f2b43e..78ec9de856e3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2586,6 +2586,9 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
 {
 	const Elf_Shdr *sec;
 
+	if (src->st_shndx == SHN_LIVEPATCH)
+		return true;
+
 	if (src->st_shndx == SHN_UNDEF
 	    || src->st_shndx >= shnum
 	    || !src->st_name)
@@ -2632,9 +2635,9 @@ static void layout_symtab(struct module *mod, struct load_info *info)
 
 	/* Compute total space required for the core symbols' strtab. */
 	for (ndst = i = 0; i < nsrc; i++) {
-		if (i == 0 || is_livepatch_module(mod) ||
-		    is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum,
-				   info->index.pcpu)) {
+		if (i == 0 || is_core_symbol(src+i, info->sechdrs,
+					     info->hdr->e_shnum,
+					     info->index.pcpu)) {
 			strtab_size += strlen(&info->strtab[src[i].st_name])+1;
 			ndst++;
 		}
@@ -2691,9 +2694,9 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
 	mod->core_kallsyms.strtab = s = mod->core_layout.base + info->stroffs;
 	src = mod->kallsyms->symtab;
 	for (ndst = i = 0; i < mod->kallsyms->num_symtab; i++) {
-		if (i == 0 || is_livepatch_module(mod) ||
-		    is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum,
-				   info->index.pcpu)) {
+		if (i == 0 || is_core_symbol(src+i, info->sechdrs,
+					     info->hdr->e_shnum,
+					     info->index.pcpu)) {
 			dst[ndst] = src[i];
 			dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
 			s += strlcpy(s, &mod->kallsyms->strtab[src[i].st_name],
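
The filtering rule proposed in the diff above can be tried outside the
kernel with a small stand-alone sketch. It is an illustration only:
struct sym is a reduced, hypothetical stand-in for Elf_Sym, keep_symbol()
and count_kept() are made-up names, and the section-flag checks that the
real is_core_symbol() goes on to perform are left out. SHN_LIVEPATCH
uses the value from the kernel's uapi elf.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHN_UNDEF     0
#define SHN_LIVEPATCH 0xff20 /* value from the kernel's uapi elf.h */

/* Reduced stand-in for Elf_Sym: only the fields the filter looks at. */
struct sym {
	uint32_t st_name;  /* offset into the string table, 0 = unnamed */
	uint16_t st_shndx; /* section index */
};

/* Keep livepatch symbols; drop undefined, out-of-range and unnamed ones. */
static bool keep_symbol(const struct sym *s, unsigned int shnum)
{
	assert(s != NULL);

	if (s->st_shndx == SHN_LIVEPATCH)
		return true;
	if (s->st_shndx == SHN_UNDEF || s->st_shndx >= shnum || !s->st_name)
		return false;
	return true;
}

/* Count the symbols that would be copied; index 0 is always kept. */
static size_t count_kept(const struct sym *syms, size_t n, unsigned int shnum)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (i == 0 || keep_symbol(&syms[i], shnum))
			kept++;
	return kept;
}
```

With this rule a livepatch module keeps its SHN_LIVEPATCH entries while
its SHN_UNDEF imports are dropped like those of any other module.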

Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks

2018-06-04 Thread Michal Hocko

On Thu 31-05-18 11:10:22, Michal Hocko wrote:
> On Thu 31-05-18 10:55:32, Michal Hocko wrote:
> > On Thu 31-05-18 04:35:31, Eric Dumazet wrote:
> [...]
> > > I merely copied/pasted from alloc_skb_with_frags() :/
> > 
> > I will have a look at it. Thanks!
> 
> OK, so this is an example of an incremental development ;).
> 
> __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
> high order allocations") to avoid invoking the OOM killer. Yet this was
> not enough because fb05e7a89f50 ("net: don't wait for order-3 page
> allocation") didn't want excessive reclaim for non-costly orders,
> so it made the allocation completely NOWAIT while leaving __GFP_NORETRY
> in place, which is now redundant. Should I send a patch?

Just in case you are interested
---
From 5010543ed6f73e4c00367801486dca8d5c63b2ce Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Mon, 4 Jun 2018 15:07:37 +0200
Subject: [PATCH] net: cleanup gfp mask in alloc_skb_with_frags

alloc_skb_with_frags uses __GFP_NORETRY for non-sleeping allocations
which is just a no-op and a little bit confusing.

__GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
high order allocations") to avoid invoking the OOM killer. Yet this was
not enough because fb05e7a89f50 ("net: don't wait for order-3 page
allocation") didn't want excessive reclaim for non-costly orders,
so it made the allocation completely NOWAIT while leaving __GFP_NORETRY
in place, which is now redundant.

Drop the pointless __GFP_NORETRY because this function is used as a
copy source for other places.

Signed-off-by: Michal Hocko 
---
 net/core/skbuff.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 857e4e6f751a..c1f22adc30de 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5239,8 +5239,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
 			if (npages >= 1 << order) {
 				page = alloc_pages((gfp_mask & ~__GFP_DIRECT_RECLAIM) |
 						   __GFP_COMP |
-						   __GFP_NOWARN |
-						   __GFP_NORETRY,
+						   __GFP_NOWARN,
 						   order);
 				if (page)
 					goto fill_page;
-- 
2.17.0

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 05/19] sched/numa: Use task faults only if numa_group is not yet setup

2018-06-04 Thread Srikar Dronamraju

> > Testcase       Time:       Min       Max       Avg    StdDev   %Change
> > numa01.sh      Real:    478.45    565.90    515.11     30.87    16.29%
> > numa01.sh       Sys:    207.79    271.04    232.94     21.33    -15.8%
> > numa01.sh      User:  39763.93  47303.12  43210.73   2644.86    14.04%
> > numa02.sh      Real:     60.00     61.46     60.78      0.49    0.871%
> > numa02.sh       Sys:     15.71     25.31     20.69      3.42    17.35%
> > numa02.sh      User:   5175.92   5265.86   5235.97     32.82    0.464%
> > numa03.sh      Real:    776.42    834.85    806.01     23.22    -7.47%
> > numa03.sh       Sys:    114.43    128.75    121.65      5.49    -19.5%
> > numa03.sh      User:  60773.93  64855.25  62616.91   1576.39    -5.36%
> > numa04.sh      Real:    456.93    511.95    482.91     20.88    2.930%
> > numa04.sh       Sys:    178.09    460.89    356.86     94.58    -11.3%
> > numa04.sh      User:  36312.09  42553.24  39623.21   2247.96    0.246%
> > numa05.sh      Real:    393.98    493.48    436.61     35.59    0.677%
> > numa05.sh       Sys:    164.49    329.15    265.87     61.78    38.92%
> > numa05.sh      User:  33182.65  36654.53  35074.51   1187.71    3.368%
> > 
> > Ideally this change shouldn't have affected performance.
> 
> Ideally you go on here to explain why it does in fact affect
> performance.. :-)

I know it looks bad, but I have been unable to figure out why this patch
affects performance. I repeated the experiment multiple times to check
that it was not a one-off problem. While there is variance between
runs, we do see a change in numbers before and after this patch, at least
on my machine.

Re: [PATCH 32/32] [RFC] fsinfo: Add a system call to allow querying of filesystem information [ver #8]

2018-06-04 Thread Arnd Bergmann

On Fri, May 25, 2018 at 2:08 AM, David Howells  wrote:

> +
> +static int fsinfo_generic_timestamp_info(struct dentry *dentry,
> +struct fsinfo_timestamp_info *ts)
> +{
> +   struct super_block *sb = dentry->d_sb;
> +
> +   /* If unset, assume 1s granularity */
> +   u16 mantissa = 1;
> +   s8 exponent = 0;
> +
> +   ts->minimum_timestamp = S64_MIN;
> +   ts->maximum_timestamp = S64_MAX;
> +	if (sb->s_time_gran < 1000000000) {
> +		if (sb->s_time_gran < 1000)
> +			exponent = -9;
> +		else if (sb->s_time_gran < 1000000)
> +			exponent = -6;
> +		else
> +			exponent = -3;
> +	}

ntfs has sb->s_time_gran=100, and vfat should really have
sb->s_time_gran=2000000000 but that doesn't seem to be set right
at the moment.
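
One way to cover granularities such as 100 ns or 2 s is to derive both
mantissa and exponent from the nanosecond value instead of picking one of
three fixed exponents. The sketch below is an assumption of how that
could look, not what the patch does: struct gran and gran_from_ns() are
hypothetical names, and the input is taken to be a nanosecond count that
fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: granularity in seconds = mantissa * 10^exponent. */
struct gran {
	uint16_t mantissa;
	int8_t exponent;
};

/* Convert a granularity in nanoseconds to mantissa/exponent form. */
static struct gran gran_from_ns(uint32_t ns)
{
	struct gran g;
	int exponent = -9;

	assert(ns != 0); /* zero means "unset"; callers handle that separately */

	/* Move trailing decimal zeros from the mantissa into the exponent. */
	while (ns % 10 == 0) {
		ns /= 10;
		exponent++;
	}
	/* Round up until the mantissa fits in 16 bits. */
	while (ns > UINT16_MAX) {
		ns = ns / 10 + (ns % 10 != 0);
		exponent++;
	}

	g.mantissa = (uint16_t)ns;
	g.exponent = (int8_t)exponent;
	return g;
}
```

Under these assumptions 100 ns comes out as 1 * 10^-7 and a 2 second
granularity as 2 * 10^0, neither of which the three-bucket version can
express.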

> +/*
> + * Optional fsinfo() parameter structure.
> + *
> + * If this is not given, it is assumed that fsinfo_attr_statfs instance 0 is
> + * desired.
> + */
> +struct fsinfo_params {
> +	enum fsinfo_attribute	request;	/* What is being asked for */
> +	__u32			Nth;		/* Instance of it (some may have multiple) */
> +	__u32			at_flags;	/* AT_SYMLINK_NOFOLLOW and similar flags */
> +	__u32			__spare[6];	/* Spare params; all must be 0 */
> +};

I fear the 'enum' in the uapi structure may have a different size depending
on the architecture. Maybe turn that into a __u32 as well?

> +struct fsinfo_capabilities {
> +	__u64	supported_stx_attributes;	/* What statx::stx_attributes are supported */
> +	__u32	supported_stx_mask;		/* What statx::stx_mask bits are supported */
> +	__u32	supported_ioc_flags;		/* What FS_IOC_* flags are supported */
> +	__u8	capabilities[(fsinfo_cap__nr + 7) / 8];
> +};

This looks a bit odd: with the 44 capabilities, you end up having a
six-byte array followed by two bytes of implicit padding. If the number
of capabilities grows beyond 64, you have a nine byte array with more
padding to the next alignof(__u64). Is that intentional?

How about making it a fixed size with either 64 or 128 capability bits?
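
The padding can be checked with a user-space sketch that uses stdint
types in place of the kernel's __u64/__u32/__u8. The struct names, the
NR_CAPS constant and the TAIL_PADDING() macro are assumptions made for
this illustration; the 44 comes from the count mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CAPS 44 /* capability count quoted in the review */

/* Layout as posted: bitmap sized from the capability count. */
struct caps_sized {
	uint64_t supported_stx_attributes;
	uint32_t supported_stx_mask;
	uint32_t supported_ioc_flags;
	uint8_t  capabilities[(NR_CAPS + 7) / 8];
};

/* Alternative: fixed 64-bit bitmap. */
struct caps_fixed {
	uint64_t supported_stx_attributes;
	uint32_t supported_stx_mask;
	uint32_t supported_ioc_flags;
	uint8_t  capabilities[8];
};

/* Bytes the compiler adds after the last member of a struct. */
#define TAIL_PADDING(type, last) \
	(sizeof(type) - (offsetof(type, last) + sizeof(((type *)0)->last)))

/* 8 + 4 + 4 + 8 bytes: nothing is left for the compiler to pad. */
static_assert(TAIL_PADDING(struct caps_fixed, capabilities) == 0,
	      "fixed-size bitmap should leave no tail padding");
```

On common 32-bit and 64-bit ABIs, TAIL_PADDING(struct caps_sized,
capabilities) evaluates to 2, matching the two implicit bytes noted
above, while the fixed variant has the same total size with every byte
named.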

> +/*
> + * Information struct for fsinfo(fsinfo_attr_timestamp_info).
> + */
> +struct fsinfo_timestamp_info {
> +	__s64	minimum_timestamp;	/* Minimum timestamp value in seconds */
> +	__s64	maximum_timestamp;	/* Maximum timestamp value in seconds */
> +	__u16	atime_gran_mantissa;	/* Granularity(secs) = mant * 10^exp */
> +	__u16	btime_gran_mantissa;
> +	__u16	ctime_gran_mantissa;
> +	__u16	mtime_gran_mantissa;
> +	__s8	atime_gran_exponent;
> +	__s8	btime_gran_exponent;
> +	__s8	ctime_gran_exponent;
> +	__s8	mtime_gran_exponent;
> +};

This structure has a slightly inconsistent amount of padding at the end:
on x86-32 it has no padding, everywhere else it has 32 bits of padding
to make it 64-bit aligned. Maybe add a __u32 reserved field?
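
As a rough illustration of the suggestion, here is the same layout in
user-space C with an explicit trailing word. The struct name and the
field name "reserved" are hypothetical, and stdint types stand in for
the kernel's __s64/__u16/__s8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same members as the quoted struct, plus an explicit reserved word. */
struct ts_info_padded {
	int64_t  minimum_timestamp;
	int64_t  maximum_timestamp;
	uint16_t atime_gran_mantissa;
	uint16_t btime_gran_mantissa;
	uint16_t ctime_gran_mantissa;
	uint16_t mtime_gran_mantissa;
	int8_t   atime_gran_exponent;
	int8_t   btime_gran_exponent;
	int8_t   ctime_gran_exponent;
	int8_t   mtime_gran_exponent;
	uint32_t reserved; /* hypothetical name; would have to be zero */
};

/* 16 + 8 + 4 + 4 bytes, already a multiple of 8, so no ABI needs to pad. */
static_assert(sizeof(struct ts_info_padded) == 32,
	      "layout should be 32 bytes on every ABI");
static_assert(offsetof(struct ts_info_padded, reserved) == 28,
	      "reserved word should occupy the former padding");
```

Without the extra member the size depends on the alignment of the 64-bit
fields, which is the inconsistency described above.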

> +
> +#define __NR_fsinfo 326

Hardcoding the syscall number in the example makes it architecture specific.
Could you include  to get the real number?

  Arnd

Re: [PATCH v6 6/9] dt-bindings: counter: Document stm32 quadrature encoder

2018-06-04 Thread Benjamin Gaignard

2018-05-18 18:28 GMT+02:00 Rob Herring :
> On Thu, May 17, 2018 at 08:59:40PM +0200, Benjamin Gaignard wrote:
>> 2018-05-17 18:23 GMT+02:00 Rob Herring :
>> > On Wed, May 16, 2018 at 12:51 PM, William Breathitt Gray
>> >  wrote:
>> >> From: Benjamin Gaignard 
>> >
>> > v6? Where's v1-v5?
>> >
>> >> Add bindings for STM32 Timer quadrature encoder.
>> >> It is a sub-node of STM32 Timer which implement the
>> >> counter part of the hardware.
>> >>
>> >> Cc: Rob Herring 
>> >> Cc: Mark Rutland 
>> >> Signed-off-by: Benjamin Gaignard 
>> >> Signed-off-by: William Breathitt Gray 
>> >> ---
>> >>  .../bindings/counter/stm32-timer-cnt.txt  | 26 +++
>> >>  .../devicetree/bindings/mfd/stm32-timers.txt  |  7 +
>> >>  2 files changed, 33 insertions(+)
>> >>  create mode 100644 
>> >> Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt
>> >>
>> >> diff --git 
>> >> a/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt 
>> >> b/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt
>> >> new file mode 100644
>> >> index ..377728128bef
>> >> --- /dev/null
>> >> +++ b/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt
>> >> @@ -0,0 +1,26 @@
>> >> +STMicroelectronics STM32 Timer quadrature encoder
>> >> +
>> >> +STM32 Timer provides quadrature encoder counter mode to detect
>> >
>> > 'mode' does not sound like a sub-block of the timers block.
>>
>> quadrature encoding is one of the counting modes of this hardware
>> block which is enable to count on other signals/triggers
>
> You don't need a child node and compatible to set a mode.

"mode" isn't the right word here, because the quadrature encoder enables
a sub-block of this hardware.
The timer's internal counter input can be internal or external clocks,
some IIO triggers, or the output of the quadrature encoder sub-block.
It is a child node like the pwm or IIO trigger.

>
>> >> +angular position and direction of rotary elements,
>> >> +from IN1 and IN2 input signals.
>> >> +
>> >> +Must be a sub-node of an STM32 Timer device tree node.
>> >> +See ../mfd/stm32-timers.txt for details about the parent node.
>> >> +
>> >> +Required properties:
>> >> +- compatible:  Must be "st,stm32-timer-counter".
>> >> +- pinctrl-names:   Set to "default".
>> >> +- pinctrl-0:   List of phandles pointing to pin configuration 
>> >> nodes,
>> >> +   to set IN1/IN2 pins in mode of operation for 
>> >> Low-Power
>> >> +   Timer input on external pin.
>> >> +
>> >> +Example:
>> >> +   timers@4001  {
>> >> +   compatible = "st,stm32-timers";
>> >> +   ...
>> >> +   counter {
>> >> +   compatible = "st,stm32-timer-counter";
>> >
>> > Is there only 1? How is the counter addressed?
>>
>> Yes there is only one counter per hardware block.
>> Counter is addressed like the two others sub-nodes and the details
>> about parent mode are describe in stm32-timers.txt
>> Should I add them here too ? so example will be like that:
>
> No, you should drop the child node and add pinctrl to the parent.
>
> Any other functions this block has that you plan on adding? Please make
> bindings as complete as possible, not what you currently have drivers
> for.

The counter framework didn't exist when I pushed the timer node, but
thanks to William's effort it will allow us to use this kind of hardware.

Benjamin

>
>> timers@4001  {
>>   #address-cells = <1>;
>>   #size-cells = <0>;
>>   compatible = "st,stm32-timers";
>>   reg = <0x4001 0x400>;
>>   clocks = < 0 160>;
>>   clock-names = "int";
>>   counter {
>> compatible = "st,stm32-timer-counter";
>> pinctrl-names = "default";
>> pinctrl-0 = <_in_pins>;
>> };
>>  };
>>
>> Benjamin
>> >
>> > ___
>> > linux-arm-kernel mailing list
>> > linux-arm-ker...@lists.infradead.org
>> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



-- 
Benjamin Gaignard

Graphic Study Group

Linaro.org │ Open source software for ARM SoCs


Re: [RFC][PATCH 1/2] memcg: Ensure every task that uses an mm is in the same memory cgroup

2018-06-04 Thread Michal Hocko

[dropping Kirill Tkhai from the CC because I get rejection from the mail
 server]

On Fri 01-06-18 12:16:52, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jun 01, 2018 at 01:11:59PM -0500, Eric W. Biederman wrote:
> > Widening the definition of a process sounds good.  The memory control
> > group code would still need a way to forbid these in cgroup v1 mode,
> > when someone uses the task file.
> 
> Yeap, you're right.  We'll need memcg's can_attach rejecting for v1.

Do we really need that? I mean, do we know of any existing use case that
would need this weird threading concept and depend on memory migration,
which doesn't really work?
-- 
Michal Hocko
SUSE Labs

[PATCH] module: exclude SHN_UNDEF symbols from kallsyms api

2018-06-04 Thread Jessica Yu


+++ Jessica Yu [04/06/18 11:54 +0200]:

+++ Jessica Yu [04/06/18 10:05 +0200]:

+++ Josh Poimboeuf [02/06/18 12:32 -0500]:

Hi Jessica,

I found a bug:

[root@f25 ~]# modprobe livepatch-sample
[root@f25 ~]# grep ' u ' /proc/kallsyms
81161080 u klp_enable_patch [livepatch_sample]
81a01800 u __fentry__   [livepatch_sample]
81161250 u klp_unregister_patch [livepatch_sample]
81161870 u klp_register_patch   [livepatch_sample]
8131f0b0 u seq_printf   [livepatch_sample]

Notice that livepatch modules' undefined symbols are showing up in
/proc/kallsyms.  This can confuse klp_find_object_symbol() which can
cause subtle bugs in livepatch.

I stared at the module kallsyms code for a bit, but I don't see the bug.
Maybe it has something to do with how we save the symbol table in
copy_module_elf().  Any ideas?


Hi Josh!

This is because we preserve the entire symbol table for livepatch
modules, including the SHN_UNDEF symbols. IIRC, this is so that we can
still apply relocations properly with apply_relocate_add() after a
to-be-patched object is loaded. Normally we don't save these SHN_UNDEF
symbols for modules so they do not appear in /proc/kallsyms.


Hm, if having the full symtab in kallsyms is causing trouble, one
possibility would be to just have the module kallsyms code simply
skip/ignore undef symbols. That's what we technically do for normal
modules anyway (we normally cut undef syms out of the symtab). Haven't
tested this idea but does that sound like it'd help?


See if the following patch (untested) helps. It does not fix the
/proc/kallsyms lookup; that requires a separate patch. But it should
exclude the undef symbols from module_kallsyms_on_each_symbol() and
thus also from klp_find_object_symbol().


From 9cfd14675206adf55a85e5f5322b36ea89a523e4 Mon Sep 17 00:00:00 2001

From: Jessica Yu 
Date: Mon, 4 Jun 2018 14:35:56 +0200
Subject: [PATCH] module: exclude SHN_UNDEF symbols from kallsyms api

---
kernel/module.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index c9bea7f2b43e..dfa61490b37d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4070,7 +4070,7 @@ static unsigned long mod_find_symname(struct module *mod, const char *name)

    for (i = 0; i < kallsyms->num_symtab; i++)
        if (strcmp(name, symname(kallsyms, i)) == 0 &&
-           kallsyms->symtab[i].st_info != 'U')
+           kallsyms->symtab[i].st_shndx != SHN_UNDEF)
            return kallsyms->symtab[i].st_value;
    return 0;
 }
@@ -4116,6 +4116,10 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
        if (mod->state == MODULE_STATE_UNFORMED)
            continue;
        for (i = 0; i < kallsyms->num_symtab; i++) {
+
+           if (kallsyms->symtab[i].st_shndx == SHN_UNDEF)
+               continue;
+
            ret = fn(data, symname(kallsyms, i),
                     mod, kallsyms->symtab[i].st_value);
            if (ret != 0)
--
2.12.3
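
The filtering in the patch above can be sketched in userspace with the ELF definitions from <elf.h> (assuming a Linux/glibc toolchain). The helper name and the three-entry table are made up for illustration. Since st_info packs the symbol binding and type, an ordinary undefined import does not compare equal to 'U', whereas st_shndx identifies it directly:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Counts the symbols a walker that skips SHN_UNDEF entries would visit. */
static size_t count_defined_symbols(const Elf64_Sym *symtab, size_t num)
{
    size_t i, visited = 0;

    for (i = 0; i < num; i++) {
        if (symtab[i].st_shndx == SHN_UNDEF)
            continue;   /* undefined import, resolved elsewhere */
        visited++;
    }
    return visited;
}

/* Hypothetical table: one undefined import followed by two defined symbols. */
static const Elf64_Sym demo_symtab[] = {
    { .st_info = ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), .st_shndx = SHN_UNDEF },
    { .st_info = ELF64_ST_INFO(STB_GLOBAL, STT_FUNC), .st_shndx = 1,
      .st_value = 0x1000 },
    { .st_info = ELF64_ST_INFO(STB_LOCAL, STT_OBJECT), .st_shndx = 2,
      .st_value = 0x2000 },
};
```

Calling count_defined_symbols(demo_symtab, 3) visits only the two defined entries.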

Re: [PATCH 04/19] sched/numa: Set preferred_node based on best_cpu

2018-06-04 Thread Srikar Dronamraju

* Peter Zijlstra  [2018-06-04 14:23:36]:

> OK, the above matches the description, but I'm puzzled by the remainder:
>
> >
> > -   if (ng->active_nodes > 1 && numa_is_active_node(env.dst_nid, 
> > ng))
> > -   sched_setnuma(p, env.dst_nid);
> > +   if (nid != p->numa_preferred_nid)
> > +   sched_setnuma(p, nid);
> > }
>
> That seems to entirely loose the active_node thing, or are you saying
> best_cpu already includes that? (Changelog could use a little help there
> I suppose)

I think checking for active_nodes before calling sched_setnuma was a
mistake.

Before this change, we may retain the source node as numa_preferred_nid
even though we select another node with better numa affinity to run on.
That forces the thread to run on a node which is not its preferred_node.
In the course of regular load balancing, the task might then be moved
back to the node recorded as preferred_node, which is actually not the
node it prefers.

Re: [PATCH v2 3/5] venus: add check to make scm calls

2018-06-04 Thread Tomasz Figa

Hi Vikash,

On Sat, Jun 2, 2018 at 5:27 AM Vikash Garodia  wrote:
[snip]
> +int venus_boot(struct venus_core *core)
> +{
> +   phys_addr_t mem_phys;
> +   size_t mem_size;
> +   int ret;
> +   struct device *dev;
> +
> +   if (!IS_ENABLED(CONFIG_QCOM_MDT_LOADER))
> +   return -EPROBE_DEFER;

Why are we deferring probe here? The option will not magically become
enabled after probe is retried.
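
A userspace sketch of that point: a check that is fixed at build time gives the same answer on every retry, so deferring can never succeed. DEMO_LOADER_ENABLED and the demo_* functions are hypothetical stand-ins, and EPROBE_DEFER is redefined locally with the kernel-internal value 517:

```c
#include <assert.h>

#define EPROBE_DEFER 517        /* kernel-internal "retry probe later" code */
#define DEMO_LOADER_ENABLED 0   /* stand-in for a compile-time IS_ENABLED() */

/* Mimics a probe path that defers on a build-time condition. */
static int demo_probe(void)
{
    if (!DEMO_LOADER_ENABLED)
        return -EPROBE_DEFER;
    return 0;
}

/* Retries the probe while it keeps deferring; returns the last result. */
static int demo_retry(int max_tries)
{
    int ret = -EPROBE_DEFER;

    while (max_tries-- > 0 && ret == -EPROBE_DEFER)
        ret = demo_probe();
    return ret;
}
```

However many retries are allowed, demo_retry() still ends with -EPROBE_DEFER, which is why a non-deferring error code fits a disabled option better.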

Best regards,
Tomasz

[PATCH 1/2] perf tests kmod-path: Add tests for vdso32 and vdsox32

2018-06-04 Thread Adrian Hunter

Add tests for vdso32 and vdsox32. This will cause the overall test to fail
because __kmod_path__parse() does not handle vdso32 or vdsox32.

Fixes: 1f121b03d058 ("perf tools: Deal with kernel module names in '[]' correctly")
Signed-off-by: Adrian Hunter 
---
 tools/perf/tests/kmod-path.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/tests/kmod-path.c b/tools/perf/tests/kmod-path.c
index 8e57d46109de..148dd31cc201 100644
--- a/tools/perf/tests/kmod-path.c
+++ b/tools/perf/tests/kmod-path.c
@@ -127,6 +127,22 @@ int test__kmod_path__parse(struct test *t __maybe_unused, int subtest __maybe_un
    M("[vdso]", PERF_RECORD_MISC_KERNEL, false);
    M("[vdso]", PERF_RECORD_MISC_USER, false);
 
+   T("[vdso32]", true  , true , false, false, "[vdso32]", NULL);
+   T("[vdso32]", false , true , false, false, NULL, NULL);
+   T("[vdso32]", true  , false, false, false, "[vdso32]", NULL);
+   T("[vdso32]", false , false, false, false, NULL, NULL);
+   M("[vdso32]", PERF_RECORD_MISC_CPUMODE_UNKNOWN, false);
+   M("[vdso32]", PERF_RECORD_MISC_KERNEL, false);
+   M("[vdso32]", PERF_RECORD_MISC_USER, false);
+
+   T("[vdsox32]", true  , true , false, false, "[vdsox32]", NULL);
+   T("[vdsox32]", false , true , false, false, NULL, NULL);
+   T("[vdsox32]", true  , false, false, false, "[vdsox32]", NULL);
+   T("[vdsox32]", false , false, false, false, NULL, NULL);
+   M("[vdsox32]", PERF_RECORD_MISC_CPUMODE_UNKNOWN, false);
+   M("[vdsox32]", PERF_RECORD_MISC_KERNEL, false);
+   M("[vdsox32]", PERF_RECORD_MISC_USER, false);
+
    /* path alloc_name  alloc_ext  kmod   comp   name  ext */
    T("[vsyscall]", true  , true , false, false, "[vsyscall]", NULL);
    T("[vsyscall]", false , true , false, false, NULL, NULL);
-- 
1.9.1

[PATCH 0/2] perf tools: Fix symbol and object code resolution for vdso32 and vdsox32

2018-06-04 Thread Adrian Hunter

Hi

Here are a couple of small fixes for tracing 32-bit binaries on a 64-bit
kernel.


Adrian Hunter (2):
  perf tests kmod-path: Add tests for vdso32 and vdsox32
  perf tools: Fix symbol and object code resolution for vdso32 and vdsox32

 tools/perf/tests/kmod-path.c | 16 
 tools/perf/util/dso.c|  2 ++
 2 files changed, 18 insertions(+)


Regards
Adrian

[PATCH 2/2] perf tools: Fix symbol and object code resolution for vdso32 and vdsox32

2018-06-04 Thread Adrian Hunter

Fix __kmod_path__parse() so that perf tools does not treat vdso32 and
vdsox32 as kernel modules and fail to find the object.

Fixes: 1f121b03d058 ("perf tools: Deal with kernel module names in '[]' correctly")
Cc: sta...@vger.kernel.org
Signed-off-by: Adrian Hunter 
---
 tools/perf/util/dso.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index cdfc2e5f55f5..51cf82cf1882 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -354,6 +354,8 @@ int __kmod_path__parse(struct kmod_path *m, const char *path,
    if ((strncmp(name, "[kernel.kallsyms]", 17) == 0) ||
        (strncmp(name, "[guest.kernel.kallsyms", 22) == 0) ||
        (strncmp(name, "[vdso]", 6) == 0) ||
+       (strncmp(name, "[vdso32]", 8) == 0) ||
+       (strncmp(name, "[vdsox32]", 9) == 0) ||
        (strncmp(name, "[vsyscall]", 10) == 0)) {
        m->kmod = false;
 
-- 
1.9.1
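
The prefix checks in the hunk above can be exercised in isolation. This is a standalone sketch, with is_non_module_name() as a hypothetical helper rather than a function in perf. It shows why the extra entries are needed: "[vdso32]" differs from "[vdso]" at the sixth character, so the existing "[vdso]" comparison does not cover it:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True for bracketed names that are not kernel modules. */
static bool is_non_module_name(const char *name)
{
    return strncmp(name, "[kernel.kallsyms]", 17) == 0 ||
           strncmp(name, "[guest.kernel.kallsyms", 22) == 0 ||
           strncmp(name, "[vdso]", 6) == 0 ||
           strncmp(name, "[vdso32]", 8) == 0 ||
           strncmp(name, "[vdsox32]", 9) == 0 ||
           strncmp(name, "[vsyscall]", 10) == 0;
}
```

A name such as "[snd_hda_intel]" still falls through and is treated as a module.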

Re: [PATCH 1/2] HID: multitouch: report MT_TOOL_PALM for non-confident touches

2018-06-04 Thread Benjamin Tissoires

On Fri, Jun 1, 2018 at 9:03 PM, Henrik Rydberg  wrote:
>
>>> However, I interpret a firmware that send (confidence 1, tip switch 1)
>>> and then (confidence 0, tip switch 0) a simple release, and the
>>> confidence bit should not be relayed.
>>
>> This unfortunately leads to false clicks: you start with finger, so
>> confidence is 1, then you transition the same touch to palm (use your
>> thumb and "roll" your hand until heel of it comes into contact with the
>> screen). The firmware reports "no-confidence" and "release" in the same
>> report and userspace seeing release does not pay attention to confidence
>> (i.e. it does exactly "simple release" logic) and this results in UI
>> interpreting this as a click. With splitting no-confidence
>> (MT_TOOL_PALM) and release event into separate frames we help userspace
>> to recognize that the contact should be discarded.
>
> This is in part why I objected to this patch on August 11th, 2017.
> Logically, the confidence state is a property of a contact, not a new type
> of contact. Trying to use it in any other way is bound to lead to confusion.

The problem is that MT_TOOL_PALM has been in the kernel since
v4.0 (late 2015 by a736775db683 "Input: add MT_TOOL_PALM").
It has been used in the Synaptics RMI4 driver since then, and by hid-asus
since late 2016.
I can't find any other users in the current upstream tree, but those
two already set a precedent, and it is a little late to change the
semantics :/

Cheers,
Benjamin

>
> Henrik
>

Re: [PATCH 04/19] sched/numa: Set preferred_node based on best_cpu

2018-06-04 Thread Srikar Dronamraju

* Peter Zijlstra  [2018-06-04 14:18:00]:

> On Mon, Jun 04, 2018 at 03:30:13PM +0530, Srikar Dronamraju wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ea32a66..94091e6 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1725,8 +1725,9 @@ static int task_numa_migrate(struct task_struct *p)
> >  * Tasks that are "trapped" in such domains cannot be migrated
> >  * elsewhere, so there is no point in (re)trying.
> >  */
> > -   if (unlikely(!sd)) {
> > -   p->numa_preferred_nid = task_node(p);
> > +   if (unlikely(!sd) && p->numa_preferred_nid != task_node(p)) {
> > +   /* Set the new preferred node */
> > +   sched_setnuma(p, task_node(p));
> > return -EINVAL;
> > }
> >  
> 
> That looks dodgy.. this would allow things to continue with !sd.

Okay so are we suggesting something like the below?

	if (unlikely(!sd)) {
		/* Set the new preferred node */
		sched_setnuma(p, task_node(p));
		return -EINVAL;
	}

The reason for using sched_setnuma was to make sure we account numa
tasks correctly.
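
The difference between the two checks can be sketched in userspace. The
model below rests on assumptions: task_model, check_from_diff,
check_from_reply and CONTINUED_WITHOUT_SD are hypothetical names, and a
plain assignment stands in for sched_setnuma(). It only shows which paths
return and which fall through.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CONTINUED_WITHOUT_SD 1	/* marker: code after the check ran with no domain */

struct task_model {
	int preferred_nid;	/* stands in for p->numa_preferred_nid */
	int cur_nid;		/* stands in for task_node(p) */
};

/* Check as written in the quoted diff. */
static int check_from_diff(struct task_model *p, bool have_sd)
{
	assert(p != NULL);
	if (!have_sd && p->preferred_nid != p->cur_nid) {
		p->preferred_nid = p->cur_nid;	/* models sched_setnuma() */
		return -EINVAL;
	}
	/* Reached with have_sd == false when the node already matches. */
	return have_sd ? 0 : CONTINUED_WITHOUT_SD;
}

/* Check as suggested in the reply: always bail out when there is no domain. */
static int check_from_reply(struct task_model *p, bool have_sd)
{
	assert(p != NULL);
	if (!have_sd) {
		p->preferred_nid = p->cur_nid;	/* models sched_setnuma() */
		return -EINVAL;
	}
	return 0;
}
```

For a task whose preferred node already equals its current node and which
has no domain, check_from_diff() falls through while check_from_reply()
returns -EINVAL.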

Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-06-04 Thread Johannes Thumshirn

On Mon, Jun 04, 2018 at 02:46:47PM +0300, Sagi Grimberg wrote:
> I agree with Christoph that changing personality on the fly is going to
> be painful. This opt-in will need to be one-host at connect time. For
> that, we will probably need to also expose an argument in nvme-cli too.
> Changing the mpath personality will need to involve disconnecting the
> controller and connecting again with the argument toggled. I think this
> is the only sane way to do this.

If we still want to make it dynamic, yes. I raised this concern
while working on the patch as well.

> Another path we can make progress in is user visibility. We have
> topology in place and you mentioned primary path (which we could
> probably add). What else do you need for multipath-tools to support
> nvme?

I think the first priority is getting a notion of nvme into multipath-tools,
like I said elsewhere, and then we'll see. Martin Wilck was already working
on patches for this.

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v7 00/17] Improve shrink_slab() scalability (old complexity was O(n^2), new is O(n))

2018-06-04 Thread Kirill Tkhai

Hi, Andrew!

This patchset has been reviewed by Vladimir Davydov. I see there is a
minor change in the current linux-next.git which makes the second
patch apply not completely cleanly.

Could you tell me what I should do about this? Is this OK, or should
I rebase it on top of linux-next or do something else?

Thanks,
Kirill

On 22.05.2018 13:07, Kirill Tkhai wrote:
> Hi,
> 
> this patchset solves the problem of slow shrink_slab() occurring
> on machines having many shrinkers and memory cgroups (i.e.,
> with many containers). The problem is that the complexity of
> shrink_slab() is O(n^2), and it grows too fast as the number of
> containers grows.
> 
> Suppose we have 200 containers, and every container has 10 mounts
> and 10 cgroups. All container tasks are isolated, and they don't
> touch foreign containers' mounts.
> 
> In case of global reclaim, a task has to iterate over all the memcgs
> and call all the memcg-aware shrinkers for each of them. This means
> the task has to visit 200 * 10 = 2000 shrinkers for every memcg,
> and since there are 2000 memcgs, the total number of do_shrink_slab()
> calls is 2000 * 2000 = 4000000.
> 
> 4 million calls is not a number of operations that each take 1 cpu cycle.
> E.g., super_cache_count() accesses at least two lists and does arithmetic
> calculations. Even if there are no charged objects, we do these calculations
> and pollute the cpu caches with the memory we read. I observed nodes spending
> almost 100% of their time in the kernel under intensive writing and global
> reclaim. The writer consumes pages fast, but shrink_slab() has to run before
> the reclaimer reaches the shrink pages function (and frees SWAP_CLUSTER_MAX
> pages). Even if there is no writing, the iterations just waste time and slow
> reclaim down.
> 
> Let's see the small test below:
> 
> $echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
> $mkdir /sys/fs/cgroup/memory/ct
> $echo 4000M > /sys/fs/cgroup/memory/ct/memory.kmem.limit_in_bytes
> $for i in `seq 0 4000`;
>   do mkdir /sys/fs/cgroup/memory/ct/$i;
>   echo $$ > /sys/fs/cgroup/memory/ct/$i/cgroup.procs;
>   mkdir -p s/$i; mount -t tmpfs $i s/$i; touch s/$i/file;
> done
> 
> Then, let's see drop caches time (5 sequential calls):
> $time echo 3 > /proc/sys/vm/drop_caches
> 
> 0.00user 13.78system 0:13.78elapsed 99%CPU
> 0.00user 5.59system 0:05.60elapsed 99%CPU
> 0.00user 5.48system 0:05.48elapsed 99%CPU
> 0.00user 8.35system 0:08.35elapsed 99%CPU
> 0.00user 8.34system 0:08.35elapsed 99%CPU
> 
> 
> The last four calls don't actually shrink anything. So, the iterations
> over slab shrinkers alone take at least 5.48 seconds. Not so good for
> scalability.
> 
> The patchset solves the problem by making shrink_slab() O(n) in
> complexity. It consists of the following functional changes:
> 
> 1) Assign an id to every registered memcg-aware shrinker.
> 2) Maintain a per-memcg bitmap of memcg-aware shrinkers,
>    and set a shrinker-related bit after the first element
>    is added to the lru list (also when elements of a removed
>    child memcg are reparented).
> 3) Split memcg-aware shrinkers and !memcg-aware shrinkers,
>    and call a shrinker if its bit is set in the memcg's shrinker
>    bitmap.
>    (Also, there is functionality to clear the bit after the
>    last element is shrunk.)
> 
> This gives a significant performance increase. The result after the patchset is applied:
> 
> $time echo 3 > /proc/sys/vm/drop_caches
> 
> 0.00user 1.10system 0:01.10elapsed 99%CPU
> 0.00user 0.00system 0:00.01elapsed 64%CPU
> 0.00user 0.01system 0:00.01elapsed 82%CPU
> 0.00user 0.00system 0:00.01elapsed 64%CPU
> 0.00user 0.01system 0:00.01elapsed 82%CPU
> 
> The results show that performance increases at least 548-fold
> (5.48 seconds versus 0.01 seconds for the calls that shrink nothing).
> 
> So, the patchset reduces the complexity of shrink_slab() and improves
> performance for the type of load described above. It will also benefit
> the !global reclaim case, since there will be fewer do_shrink_slab()
> calls there as well.
> 
> This patchset is made against the linux-next.git tree.
> 
> v7: Refactorings and readability improvements.
> 
> v6: Added missing rcu_dereference() to memcg_set_shrinker_bit().
> Use different functions for allocating and expanding the map.
> Use new memcg_shrinker_map_size variable in memcontrol.c.
> Refactorings.
> 
> v5: Make the optimizing logic under CONFIG_MEMCG_SHRINKER instead of MEMCG && !SLOB
> 
> v4: Do not use memcg mem_cgroup_idr for iteration over mem cgroups
> 
> v3: Many changes requested in comments on v2:
> 
> 1) rebase on prealloc_shrinker() code base
> 2) root_mem_cgroup is made out of memcg maps
> 3) rwsem replaced with shrinkers_nr_max_mutex
> 4) changes around assignment of shrinker id to list lru
> 5) everything renamed
> 
> v2: Many changes requested in comments on v1:
> 
> 1) the code mostly moved to mm/memcontrol.c;
> 2) using IDR instead of array of shrinkers;
> 3) added a possibility to assign list_lru shrinker id
>    at the time of shrinker registering;
> 4) reorganized locking and renamed functions and variables.
> 
> ---
> 
> Kirill
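
The call counts and the bitmap idea from the cover letter can be sketched
as a small userspace model. It rests on assumptions throughout:
memcg_model, MAX_SHRINKERS and the helper names are hypothetical and are
not taken from the patchset. It only counts how many shrinker calls each
scheme would make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SHRINKERS 2048		/* hypothetical upper bound on shrinker ids */
#define MAP_WORDS (MAX_SHRINKERS / 64)

/* Per-memcg bitmap: bit i is set if shrinker i may have objects for this memcg. */
struct memcg_model {
	uint64_t map[MAP_WORDS];
};

/* Old scheme: every memcg-aware shrinker is called for every memcg. */
static uint64_t calls_without_bitmap(uint64_t nr_memcgs, uint64_t nr_shrinkers)
{
	return nr_memcgs * nr_shrinkers;
}

/* Set when the first element is added to the shrinker's lru list. */
static void set_shrinker_bit(struct memcg_model *m, unsigned int id)
{
	assert(m != NULL && id < MAX_SHRINKERS);
	m->map[id / 64] |= UINT64_C(1) << (id % 64);
}

/* Cleared after the last element has been shrunk. */
static void clear_shrinker_bit(struct memcg_model *m, unsigned int id)
{
	assert(m != NULL && id < MAX_SHRINKERS);
	m->map[id / 64] &= ~(UINT64_C(1) << (id % 64));
}

/* New scheme: a shrinker is called only if its bit is set in the memcg's map. */
static uint64_t calls_with_bitmap(const struct memcg_model *memcgs, size_t nr_memcgs)
{
	uint64_t calls = 0;

	for (size_t i = 0; i < nr_memcgs; i++) {
		for (size_t w = 0; w < MAP_WORDS; w++) {
			uint64_t word = memcgs[i].map[w];

			while (word) {		/* count set bits one at a time */
				word &= word - 1;
				calls++;
			}
		}
	}
	return calls;
}
```

For the numbers in the cover letter, calls_without_bitmap(2000, 2000)
gives the 4 million calls of the old scheme. If each memcg only has objects
on its own container's 10 mounts (an assumption about the example, not a
measured result), the bitmap scheme would make at most 2000 * 10 = 20000
calls.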

[PATCH] ath10k: use dma_zalloc_coherent instead of allocator/memset

2018-06-04 Thread YueHaibing

Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.

Signed-off-by: YueHaibing 
---
 drivers/net/wireless/ath/ath10k/wmi.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index f97ab79..72db3bd 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -5018,13 +5018,11 @@ static int ath10k_wmi_alloc_chunk(struct ath10k *ar, u32 req_id,
void *vaddr;
 
pool_size = num_units * round_up(unit_len, 4);
-   vaddr = dma_alloc_coherent(ar->dev, pool_size, &paddr, GFP_KERNEL);
+   vaddr = dma_zalloc_coherent(ar->dev, pool_size, &paddr, GFP_KERNEL);
 
if (!vaddr)
return -ENOMEM;
 
-   memset(vaddr, 0, pool_size);
-
ar->wmi.mem_chunks[idx].vaddr = vaddr;
ar->wmi.mem_chunks[idx].paddr = paddr;
ar->wmi.mem_chunks[idx].len = pool_size;
-- 
2.7.0
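
The pattern the patch replaces has a direct userspace analogue, sketched
below. This is an analogy only: malloc(), memset() and calloc() stand in
for the DMA allocators, no coherent memory is involved, and the helper
names are hypothetical. pool_size_model() mirrors the pool_size expression
from the diff.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round len up to a multiple of 4, as round_up(unit_len, 4) does in the diff. */
static size_t round_up4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* pool_size = num_units * round_up(unit_len, 4) */
static size_t pool_size_model(size_t num_units, size_t unit_len)
{
	return num_units * round_up4(unit_len);
}

/* Before: allocate, check, then clear in a separate step. */
static void *alloc_then_memset(size_t size)
{
	void *p = malloc(size);

	if (!p)
		return NULL;
	memset(p, 0, size);
	return p;
}

/* After: ask the allocator for zeroed memory in one call. */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}

/* Helper for checking the result: returns 1 if every byte is zero. */
static int is_all_zero(const void *buf, size_t size)
{
	const unsigned char *p = buf;

	assert(buf != NULL);
	for (size_t i = 0; i < size; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

Both helpers hand back a zero-filled buffer; the second does it without a
separate clearing step, which is the simplification the patch makes.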

Re: [PATCH v5 05/10] cpufreq/schedutil: get max utilization

2018-06-04 Thread Vincent Guittot

On 4 June 2018 at 12:12, Juri Lelli  wrote:
> On 04/06/18 09:14, Vincent Guittot wrote:
>> On 4 June 2018 at 09:04, Juri Lelli  wrote:
>> > Hi Vincent,
>> >
>> > On 04/06/18 08:41, Vincent Guittot wrote:
>> >> On 1 June 2018 at 19:45, Joel Fernandes  wrote:
>> >> > On Fri, Jun 01, 2018 at 03:53:07PM +0200, Vincent Guittot wrote:
>> >
>> > [...]
>> >
>> >> > IMO I feel it's overkill to account dl_avg when we already have DL's running
>> >> > bandwidth we can use. I understand it may be too instantaneous, but perhaps we
>> >>
>> >> We keep using dl bandwidth which is quite correct for dl needs but
>> >> doesn't reflect how it has disturbed other classes
>> >>
>> >> > can fix CFS's problems within CFS itself and not have to do this kind of
>> >> > extra external accounting ?
>> >
>> > I would also keep accounting for waiting time due to higher prio classes
>> > all inside CFS. My impression, when discussing it with you on IRC, was
>> > that we should be able to do that by not decaying cfs.util_avg when CFS
>> > is preempted (creating a new signal for it). Is not this enough?
>>
>> We don't just want to not decay a signal but increase the signal to
>> reflect the amount of preemption
>
> OK.
>
>> Then, we can't do that in a current signal. So you would like to add
>> another metrics in cfs_rq ?
>
> Since it's CFS related, I'd say it should fit in CFS.

It's both dl and cfs, as the goal is to track cfs preempted by dl.
This means creating a new struct, whereas some fields are unused in the
avg_dl struct, and duplicating some calls to ___update_load_sum, as we
track avg_dl in order to remove sched_rt_avg_update.
Also, update_dl/rt_rq_load_avg are already called in fair.c for updating
blocked load.

>
>> The place doesn't really matter to be honest in cfs_rq or in dl_rq but
>> you will not prevent to add call in dl class to start/stop the
>> accounting of the preemption
>>
>> >
>> > I feel we should try to keep cross-class accounting/interaction at a
>> > minimum.
>>
>> accounting for cross class preemption can't be done without
>> cross-class accounting
>
> Mmm, can't we distinguish in, say, pick_next_task_fair() if prev was of
> higher prio class and act accordingly?

We will not be able to tell rt, dl and stop preemption apart
by using only pick_next_task_fair.

Thanks

>
> Thanks,
>
> - Juri

[PATCH 2/2] platform/x86: asus-wmi: Add keyboard backlight toggle support

2018-06-04 Thread Chris Chiu

Some ASUS laptops, like the UX550GE, have a hotkey (Fn+F7) for keyboard
backlight toggle which emits the scan code 0xc7 on each keypress.
On the UX550GE, the max keyboard brightness level is 3, so the
toggle does not simply turn the LED on/off but needs to be cyclic.
Per the ASUS spec, it should increment the brightness on each keypress,
then turn the LED off once it has already reached the max level.

Signed-off-by: Chris Chiu 
---
 drivers/platform/x86/asus-wmi.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index b4915b7718c1..100e13e0817e 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -67,6 +67,7 @@ MODULE_LICENSE("GPL");
 #define NOTIFY_BRNDOWN_MAX 0x2e
 #define NOTIFY_KBD_BRTUP   0xc4
 #define NOTIFY_KBD_BRTDWN  0xc5
+#define NOTIFY_KBD_BRTTOGGLE   0xc7
 
 /* WMI Methods */
 #define ASUS_WMI_METHODID_SPEC 0x43455053 /* BIOS SPECification */
@@ -1704,7 +1705,9 @@ static int is_display_toggle(int code)
 
 static int is_kbd_led_event(int code)
 {
-   if (code == NOTIFY_KBD_BRTUP || code == NOTIFY_KBD_BRTDWN)
+   if (code == NOTIFY_KBD_BRTUP ||
+   code ==  NOTIFY_KBD_BRTDWN ||
+   code ==  NOTIFY_KBD_BRTTOGGLE)
return 1;
return 0;
 }
@@ -1755,7 +1758,10 @@ static void asus_wmi_notify(u32 value, void *context)
}
 
if (is_kbd_led_event(code)) {
-   if (code == NOTIFY_KBD_BRTDWN)
+   if (code == NOTIFY_KBD_BRTTOGGLE &&
+   asus->kbd_led_wk == asus->kbd_led.max_brightness)
+   kbd_led_set(&asus->kbd_led, 0);
+   else if (code == NOTIFY_KBD_BRTDWN)
kbd_led_set(&asus->kbd_led, asus->kbd_led_wk - 1);
else
kbd_led_set(&asus->kbd_led, asus->kbd_led_wk + 1);
-- 
2.11.0
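
The cyclic toggle described in the commit message can be sketched as a
pure function. This is a userspace model under assumptions: next_kbd_level
and clamp_level are hypothetical names, and the clamp stands in for the
bounds check the driver applies when setting the brightness. The notify
codes are the ones defined in the diff.

```c
#include <assert.h>

/* Notify codes taken from the diff above. */
#define NOTIFY_KBD_BRTUP	0xc4
#define NOTIFY_KBD_BRTDWN	0xc5
#define NOTIFY_KBD_BRTTOGGLE	0xc7

/* Clamp to [0, max]; models the driver's bounds check on the new level. */
static int clamp_level(int value, int max)
{
	if (value > max)
		return max;
	if (value < 0)
		return 0;
	return value;
}

/* Next brightness level for a given notify code, following the patch logic. */
static int next_kbd_level(int code, int cur, int max)
{
	assert(max >= 0 && cur >= 0 && cur <= max);
	if (code == NOTIFY_KBD_BRTTOGGLE && cur == max)
		return 0;			/* wrap around: LED off */
	if (code == NOTIFY_KBD_BRTDWN)
		return clamp_level(cur - 1, max);
	return clamp_level(cur + 1, max);	/* BRTUP, or toggle below max */
}
```

Starting from level 0 with max 3, repeated toggle presses step through
1, 2, 3 and then back to 0.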

[PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Chris Chiu

Make asus-wmi notify on hotkey kbd brightness changes, listen for
brightness events and update the brightness directly in the driver.
For this purpose, the bounds check on brightness in kbd_led_set must be
done on the same data type to prevent an illegal value from being set.

Report the brightness change via led_classdev_notify_brightness_hw_changed.
This will allow userspace to monitor (poll) for brightness changes
on the LED without reporting via input keymapping.

Signed-off-by: Chris Chiu 
---
 drivers/platform/x86/asus-nb-wmi.c |  2 --
 drivers/platform/x86/asus-wmi.c    | 21 +++++++++++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/asus-nb-wmi.c b/drivers/platform/x86/asus-nb-wmi.c
index 136ff2b4cce5..7ce80e4bb5a3 100644
--- a/drivers/platform/x86/asus-nb-wmi.c
+++ b/drivers/platform/x86/asus-nb-wmi.c
@@ -493,8 +493,6 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
{ KE_KEY, 0xA6, { KEY_SWITCHVIDEOMODE } }, /* SDSP CRT + TV + HDMI */
{ KE_KEY, 0xA7, { KEY_SWITCHVIDEOMODE } }, /* SDSP LCD + CRT + TV + HDMI */
{ KE_KEY, 0xB5, { KEY_CALC } },
-   { KE_KEY, 0xC4, { KEY_KBDILLUMUP } },
-   { KE_KEY, 0xC5, { KEY_KBDILLUMDOWN } },
{ KE_IGNORE, 0xC6, },  /* Ambient Light Sensor notification */
{ KE_END, 0},
 };
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 1f6e68f0b646..b4915b7718c1 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -460,6 +460,7 @@ static void kbd_led_update(struct work_struct *work)
ctrl_param = 0x80 | (asus->kbd_led_wk & 0x7F);
 
asus_wmi_set_devstate(ASUS_WMI_DEVID_KBD_BACKLIGHT, ctrl_param, NULL);
+   led_classdev_notify_brightness_hw_changed(&asus->kbd_led, asus->kbd_led_wk);
 }
 
 static int kbd_led_read(struct asus_wmi *asus, int *level, int *env)
@@ -497,9 +498,9 @@ static void kbd_led_set(struct led_classdev *led_cdev,
 
asus = container_of(led_cdev, struct asus_wmi, kbd_led);
 
-   if (value > asus->kbd_led.max_brightness)
+   if ((int)value > (int)asus->kbd_led.max_brightness)
value = asus->kbd_led.max_brightness;
-   else if (value < 0)
+   else if ((int)value < 0)
value = 0;
 
asus->kbd_led_wk = value;
@@ -656,6 +657,7 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
 
asus->kbd_led_wk = led_val;
asus->kbd_led.name = "asus::kbd_backlight";
+   asus->kbd_led.flags = LED_BRIGHT_HW_CHANGED;
asus->kbd_led.brightness_set = kbd_led_set;
asus->kbd_led.brightness_get = kbd_led_get;
asus->kbd_led.max_brightness = 3;
@@ -1700,6 +1702,13 @@ static int is_display_toggle(int code)
return 0;
 }
 
+static int is_kbd_led_event(int code)
+{
+   if (code == NOTIFY_KBD_BRTUP || code == NOTIFY_KBD_BRTDWN)
+   return 1;
+   return 0;
+}
+
 static void asus_wmi_notify(u32 value, void *context)
 {
struct asus_wmi *asus = context;
@@ -1745,6 +1754,14 @@ static void asus_wmi_notify(u32 value, void *context)
}
}
 
+   if (is_kbd_led_event(code)) {
+   if (code == NOTIFY_KBD_BRTDWN)
+   kbd_led_set(&asus->kbd_led, asus->kbd_led_wk - 1);
+   else
+   kbd_led_set(&asus->kbd_led, asus->kbd_led_wk + 1);
+   goto exit;
+   }
+
if (is_display_toggle(code) &&
asus->driver->quirks->no_display_toggle)
goto exit;
-- 
2.11.0
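
Why the bounds check needs the casts can be shown with a small userspace
sketch. It assumes the brightness value is held in an unsigned type, which
the patch description does not state outright; the model uses unsigned int
for it. The function names are hypothetical. Passing "level - 1" with
level 0 then arrives as a very large unsigned number.

```c
#include <assert.h>

/* Check on an unsigned value: "value < 0" can never be true. */
static unsigned int clamp_unsigned(unsigned int value, unsigned int max)
{
	if (value > max)
		value = max;
	return value;
}

/*
 * Check with both sides cast to int, as in the diff.
 * Converting an out-of-range unsigned value to int is implementation-defined
 * in C; common compilers wrap, so 0u - 1u becomes -1.
 */
static unsigned int clamp_signed(unsigned int value, unsigned int max)
{
	assert(max <= 0x7fffffffu);
	if ((int)value > (int)max)
		value = max;
	else if ((int)value < 0)
		value = 0;
	return value;
}
```

With max 3, a "brightness down" from level 0 is clamped to 3 by the
unsigned check but to 0 by the cast version, which matches the intent of
the patch.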

Re: [PATCH] rtc: sunxi: fix possible race condition

2018-06-04 Thread Maxime Ripard

On Mon, Jun 04, 2018 at 02:05:40PM +0200, Alexandre Belloni wrote:
> The IRQ is requested before the struct rtc is allocated and registered, but
> this struct is used in the IRQ handler. This may lead to a NULL pointer
> dereference.
> 
> Switch to devm_rtc_allocate_device/rtc_register_device to allocate the rtc
> before requesting the IRQ.
> 
> Signed-off-by: Alexandre Belloni 

Acked-by: Maxime Ripard 

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com
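
The ordering problem described in the quoted commit message can be
modelled in userspace. This sketch is an illustration under assumptions,
not the rtc-sunxi code: chip_model, rtc_model and the probe_* helpers are
hypothetical, and an interrupt that fires as soon as the IRQ is requested
is simulated with a flag. The model counts unsafe accesses instead of
crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rtc_model {
	int irqs_handled;
};

struct chip_model {
	struct rtc_model *rtc;	/* NULL until the rtc has been allocated */
	int unsafe_irqs;	/* interrupts that arrived while rtc was NULL */
};

/* The handler uses chip->rtc; the model counts instead of crashing. */
static void irq_handler_model(struct chip_model *chip)
{
	assert(chip != NULL);
	if (!chip->rtc) {
		chip->unsafe_irqs++;	/* a real handler would dereference NULL here */
		return;
	}
	chip->rtc->irqs_handled++;
}

/* Old order: request the IRQ first, allocate the rtc afterwards. */
static void probe_irq_first(struct chip_model *chip, struct rtc_model *rtc,
			    bool irq_pending)
{
	if (irq_pending)		/* the IRQ line is live from here on */
		irq_handler_model(chip);
	chip->rtc = rtc;
}

/* New order: allocate the rtc first, then request the IRQ. */
static void probe_alloc_first(struct chip_model *chip, struct rtc_model *rtc,
			      bool irq_pending)
{
	chip->rtc = rtc;
	if (irq_pending)
		irq_handler_model(chip);
}
```

With an interrupt pending at probe time, the first ordering records one
unsafe access while the second handles the interrupt normally.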



Re: [PATCH 2/2] mm: don't skip memory guarantee calculations

2018-06-04 Thread Michal Hocko

On Tue 22-05-18 14:25:28, Roman Gushchin wrote:
> There are two cases when effective memory guarantee calculation
> is mistakenly skipped:
> 
> 1) If memcg is a child of the root cgroup, and the root
> cgroup is not root_mem_cgroup (in other words, if the reclaim
> is targeted). Top-level memory cgroups are handled specially
> in mem_cgroup_protected(), because the root memory cgroup doesn't
> have a memory guarantee and can't limit its children's guarantees.
> So, all effective guarantee calculation is skipped.
> But in the case of targeted reclaim things are different:
> cgroups whose parent exceeded its memory limit aren't special.
> 
> 2) If memcg has no charged memory (memory usage is 0). In this
> case mem_cgroup_protected() always returns MEMCG_PROT_NONE, which
> is correct and prevents generating fake memory low events for
> empty cgroups. But skipping the memory emin/elow calculation is wrong:
> if there is no global memory pressure there might be no good
> chance again, so we can end up with effective guarantees set to 0
> without any reason.

Roman, so these two patches are on top of the min limit patches, right?
The fact that they come after just makes me feel this whole thing is not
completely thought through, and I would like to see all 4 patches in one
series describing the whole design. We are getting really close to the
merge window and last-minute updates make me really nervous. Can you
please repost the whole thing after the merge window?

As I've said earlier, I am not even sure we really want to have a hard
guarantee once we have decided to go with the low limit. So very good
reasoning should be added for the whole thing.

Thanks!
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 4/8] serial: 8250: Handle case port doesn't have TEMT interrupt using em485.

2018-06-04 Thread Andy Shevchenko

On Mon, 2018-06-04 at 13:50 +0200, Giulio Benetti wrote:
> Hi,
> 
> Il 04/06/2018 13:38, Andy Shevchenko ha scritto:
> > On Mon, 2018-06-04 at 12:50 +0200, Giulio Benetti wrote:
> > > Hi,
> > > 
> > > Il 04/06/2018 12:17, Andy Shevchenko ha scritto:
> > > > On Fri, 2018-06-01 at 14:40 +0200, Giulio Benetti wrote:
> > > > > Only some 8250 ports have a TEMT interrupt, so the current
> > > > > implementation can't work for ports without it. The only chance
> > > > > to make it work is to loop-read the LSR register.
> > > > > 
> > > > > With NO TEMT interrupt, check if both TEMT and THRE are set by
> > > > > looping on the LSR register.
> > > > > --- a/drivers/tty/serial/8250/8250_dw.c
> > > > > +++ b/drivers/tty/serial/8250/8250_dw.c
> > > > > - int ret = serial8250_em485_init(up);
> > > > > + int ret = serial8250_em485_init(up, false);
> > > > 
> > > > Is this true for all possible DW configured types, or is it your
> > > > particular case?
> > > > 
> > > 
> > > I've checked the Synopsys DesignWare 8250 datasheet and it's not
> > > supported. Here is the datasheet I went through:
> > > https://linux-sunxi.org/images/d/d2/Dw_apb_uart_db.pdf
> > > 
> > > There seems to be no TEMT interrupt. I use it on a sunxi SoC, and
> > > their datasheets (A20 for example) don't report that interrupt
> > > either. So it seems to be valid for all DW configured types, though
> > > I don't know how many IP revisions of that peripheral there could be.
> > 
> > This is an excerpt from the document you referred to:
> > 
> > --- 8< --- 8< ---
> > 
> > 6 TEMT R Transmitter Empty bit. If in FIFO mode (FIFO_MODE != NONE)
> > and FIFOs enabled (FCR[0] set to one), this bit is set whenever the
> > Transmitter Shift Register and the FIFO are both empty. If in
> > non-FIFO mode or FIFOs are disabled, this bit is set whenever the
> > Transmitter Holding Register and the Transmitter Shift Register are
> > both empty.
> > 
> > Reset Value: 0x1
> > 
> > --- 8< --- 8< ---
> > 
> > 
> > If I'm reading this correctly the support is there. Otherwise, care
> > to point to the exact paragraph that needs to be read and checked?
> 
> In the beginning I thought the same as you, but unfortunately LSR is
> only a status register and IER doesn't have a corresponding TEMT bit
> to enable an interrupt on TEMT triggering. On OMAP, instead, there is
> a specific interrupt bound to the TEMT LSR flag. And the THRE interrupt
> is not enough because the shift register won't be empty when it
> triggers, so you would lose some bits of the last byte to be
> transmitted.

Hmm... Okay, that's something you and Matwey had better discuss.

P.S. The latest version of the document I have does describe a
hardware-supported RS485 mode. I don't know if it was added recently to
the IP itself or was just missing from the documentation. That's what
you need to clarify with Synopsys.

-- 
Andy Shevchenko 
Intel Finland Oy
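
For illustration, the approach described in the quoted patch (no TEMT interrupt, so poll LSR until both TEMT and THRE are set) could look roughly like the sketch below. The bit values match the standard 8250 LSR layout (THRE is 0x20, TEMT is 0x40). The accessor type, the retry bound, and the fake UART are hypothetical and only exist to make the sketch self-contained; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LSR_THRE 0x20u /* transmit holding register empty */
#define LSR_TEMT 0x40u /* transmitter (shift register) empty */
#define LSR_BOTH_EMPTY (LSR_TEMT | LSR_THRE)

/* Hypothetical register accessor; a real driver would read the UART. */
typedef uint8_t (*read_lsr_fn)(void *ctx);

/*
 * Poll LSR until both THRE and TEMT are set, or give up after max_polls
 * reads. Returns true once the transmitter is fully drained.
 */
static bool wait_tx_drained(read_lsr_fn read_lsr, void *ctx,
			    unsigned int max_polls)
{
	assert(read_lsr != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		if ((read_lsr(ctx) & LSR_BOTH_EMPTY) == LSR_BOTH_EMPTY)
			return true;
	}
	return false;
}

/* Fake UART: reports THRE only until the countdown in ctx reaches zero. */
static uint8_t fake_lsr(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return LSR_THRE; /* holding register empty, shifter busy */
	}
	return LSR_BOTH_EMPTY;
}
```

Checking THRE alone would return on the first read of the fake UART, which is the situation the thread describes: the holding register is empty while the last byte is still being shifted out.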

Re: [PATCH 1/2] mm: propagate memory effective protection on setting memory.min/low

2018-06-04 Thread Michal Hocko

On Tue 22-05-18 14:25:27, Roman Gushchin wrote:
> Explicitly propagate effective memory min/low values down the tree.
> 
> If there is global memory pressure, it's not really necessary.
> Effective memory guarantees will be propagated automatically
> as we traverse memory cgroup tree in the reclaim path.
> 
> But if there is no global memory pressure, effective memory protection
> still matters for local (memcg-scoped) memory pressure.
> So, we have to update effective limits in the subtree,
> if a user changes memory.min and memory.low values.

Please be explicit about the exact problem. Ideally with a memcg tree example.

> Signed-off-by: Roman Gushchin 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Vladimir Davydov 
> Cc: Greg Thelen 
> Cc: Tejun Heo 
> Cc: Andrew Morton 
> ---
>  mm/memcontrol.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ab5673dbfc4e..b9cd0bb63759 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5374,7 +5374,7 @@ static int memory_min_show(struct seq_file *m, void *v)
>  static ssize_t memory_min_write(struct kernfs_open_file *of,
>   char *buf, size_t nbytes, loff_t off)
>  {
> - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> + struct mem_cgroup *iter, *memcg = mem_cgroup_from_css(of_css(of));
>   unsigned long min;
>   int err;
>  
> @@ -5385,6 +5385,11 @@ static ssize_t memory_min_write(struct kernfs_open_file *of,
>  
>   page_counter_set_min(&memcg->memory, min);
>  
> + rcu_read_lock();
> + for_each_mem_cgroup_tree(iter, memcg)
> + mem_cgroup_protected(NULL, iter);
> + rcu_read_unlock();
> +
>   return nbytes;
>  }
>  
> @@ -5404,7 +5409,7 @@ static int memory_low_show(struct seq_file *m, void *v)
>  static ssize_t memory_low_write(struct kernfs_open_file *of,
>   char *buf, size_t nbytes, loff_t off)
>  {
> - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> + struct mem_cgroup *iter, *memcg = mem_cgroup_from_css(of_css(of));
>   unsigned long low;
>   int err;
>  
> @@ -5415,6 +5420,11 @@ static ssize_t memory_low_write(struct kernfs_open_file *of,
>  
>   page_counter_set_low(&memcg->memory, low);
>  
> + rcu_read_lock();
> + for_each_mem_cgroup_tree(iter, memcg)
> + mem_cgroup_protected(NULL, iter);
> + rcu_read_unlock();
> +
>   return nbytes;
>  }
>  
> -- 
> 2.14.3

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 05/19] sched/numa: Use task faults only if numa_group is not yet setup

2018-06-04 Thread Peter Zijlstra

On Mon, Jun 04, 2018 at 03:30:14PM +0530, Srikar Dronamraju wrote:
> When numa_group faults are available, task_numa_placement only uses
> numa_group faults to evaluate preferred node. However it still accounts
> task faults and even evaluates the preferred node just based on task
> faults just to discard it in favour of preferred node chosen on the
> basis of numa_group.
> 
> Instead use task faults only if numa_group is not set.
> 
> Testcase   Time:       Min       Max       Avg    StdDev
> numa01.sh  Real:    506.35    794.46    599.06    104.26
> numa01.sh   Sys:    150.37    223.56    195.99     24.94
> numa01.sh  User:  43450.69  61752.04  49281.50   6635.33
> numa02.sh  Real:     60.33     62.40     61.31      0.90
> numa02.sh   Sys:     18.12     31.66     24.28      5.89
> numa02.sh  User:   5203.91   5325.32   5260.29     49.98
> numa03.sh  Real:    696.47    853.62    745.80     57.28
> numa03.sh   Sys:     85.68    123.71     97.89     13.48
> numa03.sh  User:  55978.45  66418.63  59254.94   3737.97
> numa04.sh  Real:    444.05    514.83    497.06     26.85
> numa04.sh   Sys:    230.39    375.79    316.23     48.58
> numa04.sh  User:  35403.12  41004.10  39720.80   2163.08
> numa05.sh  Real:    423.09    460.41    439.57     13.92
> numa05.sh   Sys:    287.38    480.15    369.37     68.52
> numa05.sh  User:  34732.12  38016.80  36255.85   1070.51
> 
> Testcase   Time:       Min       Max       Avg    StdDev   %Change
> numa01.sh  Real:    478.45    565.90    515.11     30.87    16.29%
> numa01.sh   Sys:    207.79    271.04    232.94     21.33    -15.8%
> numa01.sh  User:  39763.93  47303.12  43210.73   2644.86    14.04%
> numa02.sh  Real:     60.00     61.46     60.78      0.49    0.871%
> numa02.sh   Sys:     15.71     25.31     20.69      3.42    17.35%
> numa02.sh  User:   5175.92   5265.86   5235.97     32.82    0.464%
> numa03.sh  Real:    776.42    834.85    806.01     23.22    -7.47%
> numa03.sh   Sys:    114.43    128.75    121.65      5.49    -19.5%
> numa03.sh  User:  60773.93  64855.25  62616.91   1576.39    -5.36%
> numa04.sh  Real:    456.93    511.95    482.91     20.88    2.930%
> numa04.sh   Sys:    178.09    460.89    356.86     94.58    -11.3%
> numa04.sh  User:  36312.09  42553.24  39623.21   2247.96    0.246%
> numa05.sh  Real:    393.98    493.48    436.61     35.59    0.677%
> numa05.sh   Sys:    164.49    329.15    265.87     61.78    38.92%
> numa05.sh  User:  33182.65  36654.53  35074.51   1187.71    3.368%
> 
> Ideally this change shouldn't have affected performance.

Ideally you go on here to explain why it does in fact affect
performance.. :-)
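
For illustration, the quoted mail does not say how the %Change column is derived. The posted averages are consistent with the relative difference between the first table's Avg and the second table's Avg, divided by the latter. The sketch below is inferred from the numbers only, and the function name is hypothetical.

```c
#include <assert.h>

/*
 * Relative change of the "before" average with respect to the "after"
 * average, in percent. For a time measurement, a positive result means
 * the second run was faster. The formula is inferred from the posted
 * numbers; it is not stated in the mail.
 */
static double pct_change(double before_avg, double after_avg)
{
	assert(after_avg != 0.0);
	return (before_avg - after_avg) / after_avg * 100.0;
}
```

For numa01.sh Real this gives (599.06 - 515.11) / 515.11, or roughly 16.3%, in line with the 16.29% in the table; for numa01.sh Sys it gives roughly -15.9%, in line with -15.8%.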

Re: [PATCH] sched/fair: Fix util_avg of new tasks for asymmetric systems

2018-06-04 Thread Vincent Guittot

On 4 June 2018 at 13:58, Quentin Perret  wrote:
> When a new task wakes up for the first time, its initial utilization
> is set to half of the spare capacity of its CPU. The current
> implementation of post_init_entity_util_avg() uses SCHED_CAPACITY_SCALE
> directly as a capacity reference. As a result, on a big.LITTLE system, a
> new task waking up on an idle little CPU will be given ~512 of util_avg,
> even if the CPU's capacity is significantly less than that.
>
> Fix this by computing the spare capacity with arch_scale_cpu_capacity().
>
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Signed-off-by: Quentin Perret 

Acked-by: Vincent Guittot 

> ---
>  kernel/sched/fair.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e497c05aab7f..f19432c17017 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -735,11 +735,12 @@ static void attach_entity_cfs_rq(struct sched_entity *se);
>   * To solve this problem, we also cap the util_avg of successive tasks to
>   * only 1/2 of the left utilization budget:
>   *
> - *   util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n
> + *   util_avg_cap = (cpu_scale - cfs_rq->avg.util_avg) / 2^n
>   *
> - * where n denotes the nth task.
> + * where n denotes the nth task and cpu_scale the CPU capacity.
>   *
> - * For example, a simplest series from the beginning would be like:
> + * For example, for a CPU with 1024 of capacity, a simplest series from
> + * the beginning would be like:
>   *
>   *  task  util_avg: 512, 256, 128,  64,  32,   16,    8, ...
>   * cfs_rq util_avg: 512, 768, 896, 960, 992, 1008, 1016, ...
> @@ -751,7 +752,8 @@ void post_init_entity_util_avg(struct sched_entity *se)
>  {
> struct cfs_rq *cfs_rq = cfs_rq_of(se);
> struct sched_avg *sa = &se->avg;
> -   long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;
> +   long cpu_scale = arch_scale_cpu_capacity(NULL, cpu_of(rq_of(cfs_rq)));
> +   long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;
>
> if (cap > 0) {
> if (cfs_rq->avg.util_avg != 0) {
> --
> 2.17.0
>
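
For illustration, the series in the patched comment can be reproduced with a small model of the cap. This is a simplified sketch: it assumes every new task receives exactly the cap, and it leaves out the extra handling the real post_init_entity_util_avg() applies when the runqueue already has utilization. The function names and the capacity value of 400 for a little CPU are hypothetical.

```c
#include <assert.h>

/*
 * Simplified model of the cap: a new task gets half of the capacity that
 * is not yet used on the runqueue, and nothing if no capacity is left.
 */
static long new_task_util(long cpu_scale, long cfs_rq_util)
{
	long cap = (cpu_scale - cfs_rq_util) / 2;

	return cap > 0 ? cap : 0;
}

/*
 * Add n new tasks to an idle CPU and record each task's initial util_avg
 * in task_util[]. Returns the resulting runqueue utilization.
 */
static long fill_cpu(long cpu_scale, int n, long *task_util)
{
	long cfs_rq_util = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		task_util[i] = new_task_util(cpu_scale, cfs_rq_util);
		cfs_rq_util += task_util[i];
	}
	return cfs_rq_util;
}
```

With cpu_scale set to 1024, seven tasks receive 512, 256, 128, 64, 32, 16 and 8, and the runqueue ends at 1016, matching the comment. With a capacity of 400, the first task receives 200 rather than the ~512 that a fixed 1024 reference would hand out.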

Re: [PATCH 04/19] sched/numa: Set preferred_node based on best_cpu

2018-06-04 Thread Peter Zijlstra

On Mon, Jun 04, 2018 at 03:30:13PM +0530, Srikar Dronamraju wrote:
> @@ -1785,15 +1786,13 @@ static int task_numa_migrate(struct task_struct *p)
>* trying for a better one later. Do not set the preferred node here.
>*/
>   if (p->numa_group) {
> - struct numa_group *ng = p->numa_group;
> -
>   if (env.best_cpu == -1)
>   nid = env.src_nid;
>   else
> - nid = env.dst_nid;
> + nid = cpu_to_node(env.best_cpu);

OK, the above matches the description, but I'm puzzled by the remainder:

>  
> - if (ng->active_nodes > 1 && numa_is_active_node(env.dst_nid, ng))
> - sched_setnuma(p, env.dst_nid);
> + if (nid != p->numa_preferred_nid)
> + sched_setnuma(p, nid);
>   }

That seems to entirely lose the active_node thing, or are you saying
best_cpu already includes that? (Changelog could use a little help there
I suppose)

Re: [PATCH 4.9 00/29] 4.9.106-stable review

2018-06-04 Thread Greg Kroah-Hartman

On Mon, Jun 04, 2018 at 01:27:06PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 04, 2018 at 03:15:23AM -0700, Guenter Roeck wrote:
> > On 06/03/2018 11:57 PM, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.9.106 release.
> > > There are 29 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Wed Jun  6 06:57:52 UTC 2018.
> > > Anything received after that time might be too late.
> > > 
> > 
> > Are you testing us ?
> > 
> > Building x86_64:tools/perf ... failed
> > --
> > Error log:
> > make[4]: execvp: ./check-headers.sh: Permission denied
> > make[4]: *** [sub-make] Error 127
> 
> Heh, no, I wasn't, but thanks for checking.
> 
> > Something went wrong with the patch creating the file. In the original
> > commit it is created as 755, in the backport it is created as 644.
> 
> Ugh, quilt does not keep the permissions of files :(
> 
> I thought I had figured that out in the past, let me try to go remember
> what I did before...
> 
> There's also a .sh file in the objtool directory that does not have the
> proper permissions as well...

Ok, hand-editing patch files is always so much fun...  Anyway, this is
fixed up now in the -rc1 git tree I pushed out again.  Let me know if
that fails.

And thanks for testing the build of perf now :)

thanks,

greg k-h

[GIT PULL] x86/asm changes for v4.18

2018-06-04 Thread Ingo Molnar

Linus,

Please pull the latest x86-asm-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asm-for-linus

   # HEAD: 6469a0ee0a06b2ea1f5afbb1d5a3feed017d4c7a x86/io: Define readq()/writeq() to use 64-bit type

Two smaller changes:

 - better support (non-atomic) 64-bit readq()/writeq() variants (Andy Shevchenko)

 - __clear_user() micro-optimization (Alexey Dobriyan)

 Thanks,

Ingo

-->
Alexey Dobriyan (1):
  x86/asm/64: Micro-optimize __clear_user() - Use immediate constants

Andy Shevchenko (1):
  x86/io: Define readq()/writeq() to use 64-bit type


 arch/x86/include/asm/io.h  | 8 ++++----
 arch/x86/lib/usercopy_64.c | 9 ++++-----
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index f6e5b9375d8c..6de64840dd22 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -94,10 +94,10 @@ build_mmio_write(__writel, "l", unsigned int, "r", )
 
 #ifdef CONFIG_X86_64
 
-build_mmio_read(readq, "q", unsigned long, "=r", :"memory")
-build_mmio_read(__readq, "q", unsigned long, "=r", )
-build_mmio_write(writeq, "q", unsigned long, "r", :"memory")
-build_mmio_write(__writeq, "q", unsigned long, "r", )
+build_mmio_read(readq, "q", u64, "=r", :"memory")
+build_mmio_read(__readq, "q", u64, "=r", )
+build_mmio_write(writeq, "q", u64, "r", :"memory")
+build_mmio_write(__writeq, "q", u64, "r", )
 
 #define readq_relaxed(a)   __readq(a)
 #define writeq_relaxed(v, a)   __writeq(v, a)
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 75d3776123cc..a624dcc4de10 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -23,13 +23,13 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
asm volatile(
"   testq  %[size8],%[size8]\n"
"   jz 4f\n"
-   "0: movq %[zero],(%[dst])\n"
-   "   addq   %[eight],%[dst]\n"
+   "0: movq $0,(%[dst])\n"
+   "   addq   $8,%[dst]\n"
"   decl %%ecx ; jnz   0b\n"
"4: movq  %[size1],%%rcx\n"
"   testl %%ecx,%%ecx\n"
"   jz 2f\n"
-   "1: movb   %b[zero],(%[dst])\n"
+   "1: movb   $0,(%[dst])\n"
"   incq   %[dst]\n"
"   decl %%ecx ; jnz  1b\n"
"2:\n"
@@ -40,8 +40,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
_ASM_EXTABLE(0b,3b)
_ASM_EXTABLE(1b,2b)
: [size8] "=&c"(size), [dst] "=&D" (__d0)
-   : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr),
- [zero] "r" (0UL), [eight] "r" (8UL));
+   : [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr));
clac();
return size;
 }
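
For readers less used to inline assembly, the loop structure in the hunk above can be sketched in plain C. This is only a userspace model under stated assumptions: `clear_buf_model` is a hypothetical name, there is no STAC/CLAC and no exception table, so the fault path that makes the real function return a non-zero "bytes not cleared" count is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model only (hypothetical helper, not kernel code): a store can
 * never fault here, so the "bytes not cleared" return value is always 0.
 */
size_t clear_buf_model(void *addr, size_t size)
{
	unsigned char *dst = addr;
	size_t size8 = size / 8;	/* count of 8-byte stores, like [size8] */
	size_t size1 = size & 7;	/* count of trailing bytes, like [size1] */
	const uint64_t zero = 0;

	assert(addr != NULL || size == 0);

	while (size8--) {
		memcpy(dst, &zero, sizeof(zero));	/* movq $0,(%[dst]) */
		dst += 8;				/* addq $8,%[dst] */
	}
	while (size1--)
		*dst++ = 0;				/* movb $0,(%[dst]) ; incq %[dst] */

	return 0;
}
```

With the zero and the stride written as immediates, the asm no longer needs the `[zero]` and `[eight]` register inputs, which is the whole of the micro-optimization.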

[GIT PULL] x86/boot changes for v4.18

2018-06-04 Thread Ingo Molnar

Linus,

Please pull the latest x86-boot-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-boot-for-linus

   # HEAD: e4e961e36f063484c48bed919013c106d178995d x86/mm: Mark __pgtable_l5_enabled __initdata

The main changes in this cycle were:

 - Centaur CPU updates (David Wang)

 - AMD and other CPU topology enumeration improvements and fixes
   (Borislav Petkov, Thomas Gleixner, Suravee Suthikulpanit)

 - Continued 5-level paging work (Kirill A. Shutemov)

 Thanks,

Ingo

-->

Borislav Petkov (2):
  x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
  x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c

David Wang (4):
  x86/Centaur: Initialize supported CPU features properly
  x86/CPU: Make intel_num_cpu_cores() generic
  x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo()
  x86/Centaur: Report correct CPU/cache topology

Kirill A. Shutemov (6):
  x86/boot/compressed/64: Fix trampoline page table address calculation
  x86/mm: Unify pgtable_l5_enabled usage in early boot code
  x86/mm: Stop pretending pgtable_l5_enabled is a variable
  x86/mm: Introduce the 'no5lvl' kernel parameter
  x86/mm: Mark p4d_offset() __always_inline
  x86/mm: Mark __pgtable_l5_enabled __initdata

Suravee Suthikulpanit (4):
  perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
  x86/CPU/AMD: Calculate last level cache ID from number of sharing threads
  x86/CPU: Modify detect_extended_topology() to return result
  x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available

Thomas Gleixner (2):
  x86/CPU: Move cpu local function declarations to local header
  x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()

 Documentation/admin-guide/kernel-parameters.txt        |  3 +++
 arch/x86/boot/compressed/cmdline.c                     |  2 +-
 arch/x86/boot/compressed/head_64.S                     |  1 +
 arch/x86/boot/compressed/kaslr.c                       |  4 ++--
 arch/x86/boot/compressed/misc.h                        |  6 ++
 arch/x86/boot/compressed/pgtable_64.c                  | 14 +++---
 arch/x86/events/amd/uncore.c                           | 21 ++---
 arch/x86/include/asm/cacheinfo.h                       |  7 +++
 arch/x86/include/asm/page_64_types.h                   |  2 +-
 arch/x86/include/asm/paravirt.h                        |  4 ++--
 arch/x86/include/asm/pgalloc.h                         |  4 ++--
 arch/x86/include/asm/pgtable.h                         | 12 ++--
 arch/x86/include/asm/pgtable_32_types.h                |  2 +-
 arch/x86/include/asm/pgtable_64.h                      |  2 +-
 arch/x86/include/asm/pgtable_64_types.h                | 25 ++---
 arch/x86/include/asm/processor.h                       |  9 -
 arch/x86/include/asm/smp.h                             |  1 -
 arch/x86/include/asm/sparsemem.h                       |  4 ++--
 arch/x86/kernel/cpu/Makefile                           |  2 +-
 arch/x86/kernel/cpu/amd.c                              | 36
 arch/x86/kernel/cpu/{intel_cacheinfo.c => cacheinfo.c} | 46 --
 arch/x86/kernel/cpu/centaur.c                          | 53 +
 arch/x86/kernel/cpu/common.c                           | 35 +++
 arch/x86/kernel/cpu/cpu.h                              | 10 ++
 arch/x86/kernel/cpu/intel.c                            | 34 +-
 arch/x86/kernel/cpu/topology.c                         |  8
 arch/x86/kernel/head64.c                               | 25 -
 arch/x86/kernel/machine_kexec_64.c                     |  3 ++-
 arch/x86/kernel/smpboot.c                              |  7 ---
 arch/x86/mm/dump_pagetables.c                          |  6 +++---
 arch/x86/mm/fault.c                                    |  4 ++--
 arch/x86/mm/ident_map.c                                |  2 +-
 arch/x86/mm/init_64.c                                  |  8
 arch/x86/mm/kasan_init_64.c                            | 14 ++
 arch/x86/mm/kaslr.c                                    |  8
 arch/x86/mm/tlb.c                                      |  2 +-
 arch/x86/platform/efi/efi_64.c                         |  2 +-
 arch/x86/power/hibernate_64.c                          |  2 +-
 38 files changed, 263 insertions(+), 167 deletions(-)
 create mode 100644 arch/x86/include/asm/cacheinfo.h
 rename arch/x86/kernel/cpu/{intel_cacheinfo.c => cacheinfo.c} (95%)

Re: [PATCH 04/19] sched/numa: Set preferred_node based on best_cpu

2018-06-04 Thread Peter Zijlstra

On Mon, Jun 04, 2018 at 03:30:13PM +0530, Srikar Dronamraju wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ea32a66..94091e6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1725,8 +1725,9 @@ static int task_numa_migrate(struct task_struct *p)
>* Tasks that are "trapped" in such domains cannot be migrated
>* elsewhere, so there is no point in (re)trying.
>*/
> - if (unlikely(!sd)) {
> - p->numa_preferred_nid = task_node(p);
> + if (unlikely(!sd) && p->numa_preferred_nid != task_node(p)) {
> + /* Set the new preferred node */
> + sched_setnuma(p, task_node(p));
>   return -EINVAL;
>   }
>  

That looks dodgy.. this would allow things to continue with !sd.
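
To make the concern concrete, the two conditions can be compared in a small userspace sketch. Everything here is a hypothetical stand-in rather than kernel code: `sd` models the sched domain pointer, `preferred_nid` models `p->numa_preferred_nid`, and `node` models `task_node(p)`. The case of interest is a NULL `sd` with the preferred node already equal to the task's node: the old condition returns early, the patched one does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins, not kernel code: sd models the sched domain pointer
 * (NULL when the task is "trapped"), preferred_nid models
 * p->numa_preferred_nid and node models task_node(p).
 */

/* Condition before the patch: return early whenever there is no domain. */
bool old_check_returns_early(const void *sd, int preferred_nid, int node)
{
	(void)preferred_nid;
	(void)node;
	return sd == NULL;
}

/* Condition in the patch: return early only if the preferred node differs. */
bool new_check_returns_early(const void *sd, int preferred_nid, int node)
{
	return sd == NULL && preferred_nid != node;
}

/* The case under discussion: no domain, preferred node already up to date. */
void check_fallthrough_case(void)
{
	assert(old_check_returns_early(NULL, 1, 1));
	assert(!new_check_returns_early(NULL, 1, 1));	/* continues with !sd */
}
```

One way to keep the original guarantee would be to leave `if (unlikely(!sd))` as the outer test and make only the `sched_setnuma()` call conditional on the node comparison; that is a suggestion for illustration, not something stated in the thread.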

Re: [PATCH v7 2/2] Refactor part of the oom report in dump_header

2018-06-04 Thread Michal Hocko

On Mon 04-06-18 20:13:44, 禹舟键 wrote:
> Hi Michal
> I will add the missing information in the cover-letter.

I do not really think the cover letter needs much improvement. It is
the patch description that should be as specific as possible. A cover
letter should usually contain a high-level description.
 
> > That being said, I am ready to ack a patch which adds the memcg of the
> > oom victim. I will not ack (nor nack) the patch which turns it into a
> > single print because I am not sure the benefit is really worth it. Maybe
> > others will though.
> 
> OK, I will use the pr_cont_cgroup_name() to print origin and kill
> memcg's name. I hope David will not have other opinions :)

As I've said, this can always be added on top, presuming there is a good
justification.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v7 2/2] Refactor part of the oom report in dump_header

2018-06-04 Thread 禹舟键

Hi Michal
I will add the missing information in the cover-letter.

> That being said, I am ready to ack a patch which adds the memcg of the
> oom victim. I will not ack (nor nack) the patch which turns it into a
> single print because I am not sure the benefit is really worth it. Maybe
> others will though.

OK, I will use the pr_cont_cgroup_name() to print origin and kill
memcg's name. I hope David will not have other opinions :)

Thanks

Re: [PATCH v12 5/5] arm64: Allow huge io mappings again

2018-06-04 Thread Will Deacon

On Fri, Jun 01, 2018 at 06:09:18PM +0530, Chintan Pandya wrote:
> Huge mappings have had stability issues due to stale
> TLB entry and memory leak issues. Since, those are
> addressed in this series of patches, it is now safe
> to allow huge mappings.
> 
> Signed-off-by: Chintan Pandya 
> ---
>  arch/arm64/mm/mmu.c | 18 ++----------------
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 6e7e16c..c65abc4 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -934,15 +934,8 @@ int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
>  {
>   pgprot_t sect_prot = __pgprot(PUD_TYPE_SECT |
>   pgprot_val(mk_sect_prot(prot)));
> - pud_t new_pud = pfn_pud(__phys_to_pfn(phys), sect_prot);
> -
> - /* Only allow permission changes for now */
> - if (!pgattr_change_is_safe(READ_ONCE(pud_val(*pudp)),
> -pud_val(new_pud)))
> - return 0;

Do you actually need to remove these checks? If we're doing
break-before-make properly, then the check won't fire but it would be
good to keep it there so we can catch misuse of these in future.

In other words, can we drop this patch?

Will

Re: [PATCH v12 3/5] arm64: pgtable: Add p*d_page_vaddr helper macros

2018-06-04 Thread Will Deacon

On Fri, Jun 01, 2018 at 06:09:16PM +0530, Chintan Pandya wrote:
> Add helper macros to give virtual references to page
> tables. These will be used while freeing dangling
> page tables.
> 
> Signed-off-by: Chintan Pandya 
> ---
>  arch/arm64/include/asm/pgtable.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7c4c8f3..ef4047f 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -580,6 +580,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  
>  #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
>  
> +#define pmd_page_vaddr(pmd) __va(pmd_page_paddr(pmd))
> +#define pud_page_vaddr(pud) __va(pud_page_paddr(pud))

Are these actually needed, or do pte_offset_kernel and pmd_offset do the
job already?

Will

Re: [PATCH v12 4/5] arm64: Implement page table free interfaces

2018-06-04 Thread Will Deacon

On Fri, Jun 01, 2018 at 06:09:17PM +0530, Chintan Pandya wrote:
> Implement pud_free_pmd_page() and pmd_free_pte_page().
> 
> Implementation requires,
>  1) Clearing off the current pud/pmd entry
>  2) Invalidate TLB which could have previously
> valid but not stale entry
>  3) Freeing of the un-used next level page tables

Please can you rewrite this describing the problem that you're solving,
rather than a brief summary of some requirements?

> Signed-off-by: Chintan Pandya 
> ---
>  arch/arm64/mm/mmu.c | 38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 8ae5d7a..6e7e16c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define NO_BLOCK_MAPPINGS	BIT(0)
>  #define NO_CONT_MAPPINGS BIT(1)
> @@ -977,12 +978,41 @@ int pmd_clear_huge(pmd_t *pmdp)
>   return 1;
>  }
>  
> -int pud_free_pmd_page(pud_t *pud, unsigned long addr)
> +int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
>  {
> - return pud_none(*pud);
> + pte_t *table;
> + pmd_t pmd;
> +
> + pmd = READ_ONCE(*pmdp);
> + if (pmd_present(pmd)) {
> + table = pmd_page_vaddr(pmd);
> + pmd_clear(pmdp);
> + __flush_tlb_kernel_pgtable(addr);
> + pte_free_kernel(NULL, table);
> + }
> + return 1;
>  }
>  
> -int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
> +int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
>  {
> - return pmd_none(*pmd);
> + pmd_t *table;
> + pmd_t *entry;
> + pud_t pud;
> + unsigned long next, end;
> +
> + pud = READ_ONCE(*pudp);
> + if (pud_present(pud)) {

Just some stylistic stuff, but please can you rewrite this as:

	if (!pud_present(pud) || VM_WARN_ON(!pud_table(pud)))
		return 1;

similarly for the pmd/pte code above.

> + table = pud_page_vaddr(pud);
> + entry = table;

Could you rename entry -> pmdp, please?

> + next = addr;
> + end = addr + PUD_SIZE;
> + do {
> + pmd_free_pte_page(entry, next);
> + } while (entry++, next += PMD_SIZE, next != end);
> +
> + pud_clear(pudp);
> + __flush_tlb_kernel_pgtable(addr);
> + pmd_free(NULL, table);
> + }
> + return 1;

So with these patches, we only ever return 1 from these helpers. It looks
like the same is true for x86, so how about we make them void and move the
calls inside the conditionals in lib/ioremap.c? Obviously, this would be a
separate patch on the end.

Will
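
The shape asked for in this review (an early return when the entry is not present, a `pmdp` cursor instead of `entry`, and helpers that return nothing) can be sketched as a userspace model. This is an illustration under stated assumptions, not the arm64 implementation: the `*_model` types and functions and `ENTRIES_PER_TABLE` are hypothetical, a NULL pointer stands in for a non-present entry, and the TLB invalidation step is only marked by a comment.

```c
#include <assert.h>
#include <stdlib.h>

#define ENTRIES_PER_TABLE 4	/* hypothetical; stands in for the real table size */

struct pmd_model { int *pte_table; };			/* NULL means "not present" */
struct pud_model { struct pmd_model *pmd_table; };	/* NULL means "not present" */

/* Free the next-level table behind one pmd entry, if there is one. */
void pmd_free_pte_page_model(struct pmd_model *pmdp)
{
	int *table;

	assert(pmdp != NULL);
	table = pmdp->pte_table;
	if (!table)
		return;			/* guard clause instead of a nested block */

	pmdp->pte_table = NULL;		/* 1) clear the entry */
	/* 2) a real implementation would invalidate the TLB here */
	free(table);			/* 3) free the now-unused table */
}

/* Free every pte table below a pud entry, then the pmd table itself. */
void pud_free_pmd_page_model(struct pud_model *pudp)
{
	struct pmd_model *table, *pmdp;

	assert(pudp != NULL);
	table = pudp->pmd_table;
	if (!table)
		return;

	for (pmdp = table; pmdp != table + ENTRIES_PER_TABLE; pmdp++)
		pmd_free_pte_page_model(pmdp);

	pudp->pmd_table = NULL;
	/* TLB invalidation would go here as well */
	free(table);
}
```

Because neither helper can fail in this model, there is nothing for a caller to test, which is the observation behind the suggestion to make the real helpers void.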

[GIT PULL] scheduler changes for v4.18

2018-06-04 Thread Ingo Molnar

Linus,

Please pull the latest sched-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus

   # HEAD: 2539fc82aa9b07d968cf9ba1ffeec3e0416ac721 sched/fair: Update util_est before updating schedutil

The main changes in this cycle were:

 - power-aware scheduling improvements (Patrick Bellasi)

 - NUMA balancing improvements (Mel Gorman)

 - vCPU scheduling fixes (Rohit Jain)

 Thanks,

Ingo

-->
Claudio Scordino (1):
  sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation

Mel Gorman (1):
  sched/numa: Stagger NUMA balancing scan periods for new threads

Patrick Bellasi (2):
  sched/cpufreq: Modify aggregate utilization to always include blocked FAIR utilization
  sched/fair: Update util_est before updating schedutil

Rohit Jain (2):
  sched/core: Don't schedule threads on pre-empted vCPUs
  sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu()

Sebastian Andrzej Siewior (1):
  sched/wait: Include <linux/wait.h> in <linux/swait.h>

Viresh Kumar (2):
  sched/fair: Rearrange select_task_rq_fair() to optimize it
  sched/fair: Avoid calling sync_entity_load_avg() unnecessarily


 Documentation/scheduler/sched-deadline.txt |  25 +-
 include/linux/sched.h  |   1 +
 include/linux/swait.h  |   1 +
 kernel/sched/core.c|  39 +-
 kernel/sched/cpufreq_schedutil.c   |  17 ++---
 kernel/sched/fair.c| 117 +++--
 kernel/sched/sched.h   |   6 ++
 7 files changed, 137 insertions(+), 69 deletions(-)

[PATCH] rtc: mrst: switch to devm_rtc_allocate_device

2018-06-04 Thread Alexandre Belloni

Switch to devm_rtc_allocate_device/rtc_register_device to allow for
further improvements.

Signed-off-by: Alexandre Belloni 
---
 drivers/rtc/rtc-mrst.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/rtc/rtc-mrst.c b/drivers/rtc/rtc-mrst.c
index fcb9de5218b2..a3c528f7e833 100644
--- a/drivers/rtc/rtc-mrst.c
+++ b/drivers/rtc/rtc-mrst.c
@@ -341,12 +341,11 @@ static int vrtc_mrst_do_probe(struct device *dev, struct resource *iomem,
 	mrst_rtc.dev = dev;
 	dev_set_drvdata(dev, &mrst_rtc);
 
-	mrst_rtc.rtc = rtc_device_register(driver_name, dev,
-				&mrst_rtc_ops, THIS_MODULE);
-	if (IS_ERR(mrst_rtc.rtc)) {
-		retval = PTR_ERR(mrst_rtc.rtc);
-		goto cleanup0;
-	}
+	mrst_rtc.rtc = devm_rtc_allocate_device(dev);
+	if (IS_ERR(mrst_rtc.rtc))
+		return PTR_ERR(mrst_rtc.rtc);
+
+	mrst_rtc.rtc->ops = &mrst_rtc_ops;
 
 	rename_region(iomem, dev_name(&mrst_rtc.rtc->dev));
 
@@ -365,14 +364,21 @@ static int vrtc_mrst_do_probe(struct device *dev, struct resource *iomem,
 		if (retval < 0) {
 			dev_dbg(dev, "IRQ %d is already in use, err %d\n",
 				rtc_irq, retval);
-			goto cleanup1;
+			goto cleanup0;
 		}
 	}
+
+	retval = rtc_register_device(mrst_rtc.rtc);
+	if (retval) {
+		retval = PTR_ERR(mrst_rtc.rtc);
+		goto cleanup1;
+	}
+
 	dev_dbg(dev, "initialised\n");
 	return 0;
 
 cleanup1:
-	rtc_device_unregister(mrst_rtc.rtc);
+	free_irq(rtc_irq, mrst->rtc);
 cleanup0:
 	mrst_rtc.dev = NULL;
 	release_mem_region(iomem->start, resource_size(iomem));
@@ -397,7 +403,6 @@ static void rtc_mrst_do_remove(struct device *dev)
 	if (mrst->irq)
 		free_irq(mrst->irq, mrst->rtc);
 
-	rtc_device_unregister(mrst->rtc);
 	mrst->rtc = NULL;
 
 	iomem = mrst->iomem;
-- 
2.17.1

[PATCH] rtc: sunxi: fix possible race condition

2018-06-04 Thread Alexandre Belloni

The IRQ is requested before the struct rtc is allocated and registered, but
this struct is used in the IRQ handler. This may lead to a NULL pointer
dereference.

Switch to devm_rtc_allocate_device/rtc_register_device to allocate the rtc
before requesting the IRQ.

Signed-off-by: Alexandre Belloni 
---
 drivers/rtc/rtc-sunxi.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/rtc/rtc-sunxi.c b/drivers/rtc/rtc-sunxi.c
index dadbf8b324ad..21865d3d8fe8 100644
--- a/drivers/rtc/rtc-sunxi.c
+++ b/drivers/rtc/rtc-sunxi.c
@@ -445,6 +445,10 @@ static int sunxi_rtc_probe(struct platform_device *pdev)
 	platform_set_drvdata(pdev, chip);
 	chip->dev = &pdev->dev;
 
+	chip->rtc = devm_rtc_allocate_device(&pdev->dev);
+	if (IS_ERR(chip->rtc))
+		return PTR_ERR(chip->rtc);
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	chip->base = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(chip->base))
@@ -481,11 +485,12 @@ static int sunxi_rtc_probe(struct platform_device *pdev)
 	writel(SUNXI_ALRM_IRQ_STA_CNT_IRQ_PEND, chip->base +
 			SUNXI_ALRM_IRQ_STA);
 
-	chip->rtc = rtc_device_register("rtc-sunxi", &pdev->dev,
-					&sunxi_rtc_ops, THIS_MODULE);
-	if (IS_ERR(chip->rtc)) {
+	chip->rtc->ops = &sunxi_rtc_ops;
+
+	ret = rtc_register_device(chip->rtc);
+	if (ret) {
 		dev_err(&pdev->dev, "unable to register device\n");
-		return PTR_ERR(chip->rtc);
+		return ret;
 	}
 
 	dev_info(&pdev->dev, "RTC enabled\n");
@@ -493,18 +498,8 @@ static int sunxi_rtc_probe(struct platform_device *pdev)
 	return 0;
 }
 
-static int sunxi_rtc_remove(struct platform_device *pdev)
-{
-	struct sunxi_rtc_dev *chip = platform_get_drvdata(pdev);
-
-	rtc_device_unregister(chip->rtc);
-
-	return 0;
-}
-
 static struct platform_driver sunxi_rtc_driver = {
 	.probe		= sunxi_rtc_probe,
-	.remove		= sunxi_rtc_remove,
 	.driver		= {
 		.name		= "sunxi-rtc",
 		.of_match_table = sunxi_rtc_dt_ids,
-- 
2.17.1
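
The ordering problem described in the changelog above can be reproduced
outside the kernel. The following is a minimal userspace sketch, not driver
code: every name in it (fake_chip, fake_request_irq, fake_fire and so on) is
hypothetical and stands in for the corresponding kernel object. It only
assumes what the changelog states, namely that the handler uses the rtc
pointer and that an interrupt may arrive as soon as the IRQ is requested.

```c
/* Userspace sketch only: all names are hypothetical, none is kernel API. */
#include <assert.h>
#include <stdlib.h>

struct fake_rtc {
	int alarms_seen;
};

struct fake_chip {
	struct fake_rtc *rtc;   /* stays NULL until the "RTC" is allocated */
};

typedef int (*fake_handler_t)(struct fake_chip *chip);

struct fake_irq_line {
	fake_handler_t handler;
	struct fake_chip *chip;
};

/* Stand-in for an alarm IRQ handler that uses chip->rtc. */
static int fake_alarm_irq(struct fake_chip *chip)
{
	if (!chip->rtc)
		return -1;      /* a real handler would dereference NULL here */
	chip->rtc->alarms_seen++;
	return 0;
}

/* Stand-in for requesting the IRQ: from now on the handler may run. */
static void fake_request_irq(struct fake_irq_line *line,
			     fake_handler_t handler, struct fake_chip *chip)
{
	line->handler = handler;
	line->chip = chip;
}

/* Simulate an interrupt arriving on the line. */
static int fake_fire(const struct fake_irq_line *line)
{
	assert(line->handler != NULL);
	return line->handler(line->chip);
}

/* Old order: handler installed first, RTC allocated afterwards. */
int probe_irq_first(void)
{
	struct fake_chip chip = { 0 };
	struct fake_irq_line line = { 0 };
	int ret;

	fake_request_irq(&line, fake_alarm_irq, &chip);
	ret = fake_fire(&line); /* interrupt lands before the allocation */
	chip.rtc = calloc(1, sizeof(*chip.rtc));
	free(chip.rtc);
	return ret;
}

/* New order: RTC allocated first, so the handler sees a valid pointer. */
int probe_alloc_first(void)
{
	struct fake_chip chip = { 0 };
	struct fake_irq_line line = { 0 };
	int ret;

	chip.rtc = calloc(1, sizeof(*chip.rtc));
	if (!chip.rtc)
		return -2;
	fake_request_irq(&line, fake_alarm_irq, &chip);
	ret = fake_fire(&line);
	free(chip.rtc);
	return ret;
}
```

Calling probe_irq_first() reports the failure case (the handler ran with a
NULL rtc pointer), while probe_alloc_first() lets the handler complete
normally. The sketch says nothing about how likely the window is on real
hardware.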

Re: [PATCH v2] nvme: trace: add disk name to tracepoints

2018-06-04 Thread Christoph Hellwig

On Mon, Jun 04, 2018 at 02:48:57PM +0300, Sagi Grimberg wrote:
>
>> Add disk name to tracepoints so we can better destinguish between
>> individual disks in the trace output.
>>
>> Signed-off-by: Johannes Thumshirn 
>> Reviewed-by: Sagi Grimberg 
>
> Nit: s/destinguish/distinguish/g
>
> Christoph, can you fix it up when applying or you want me
> to do it?

I can fix it up.

Re: WARNING and PANIC in irq_matrix_free

2018-06-04 Thread Dou Liyang


Hi Thomas,

At 06/04/2018 07:17 PM, Thomas Gleixner wrote:
> On Mon, 4 Jun 2018, Dou Liyang wrote:
>> Here, why didn't we avoid this cleanup by
>>
>> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
>> index a75de0792942..0cc59646755f 100644
>> --- a/arch/x86/kernel/apic/vector.c
>> +++ b/arch/x86/kernel/apic/vector.c
>> @@ -821,6 +821,9 @@ static void free_moved_vector(struct apic_chip_data *apicd)
>>   */
>>  WARN_ON_ONCE(managed);
>>
>> +   if (!vector)
>> +           return;
>> +
>>  trace_vector_free_moved(apicd->irq, cpu, vector, managed);
>>  irq_matrix_free(vector_matrix, cpu, vector, managed);
>>  per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
>>
>> Is there something I didn't consider with? ;-)
>
> Well, that just prevents the warning, but the hlist is already
> corrupted. So you'd just cure the symptom ...

I see.

> I'm about to send a patch series which addresses that. Just need to finish
> writing changelogs.

Thank you for telling me that.

Thanks,
dou

linux-next: Tree for Jun 4

2018-06-04 Thread Stephen Rothwell

Hi all,

Changes since 20180601:

The overlayfs tree gained a conflict against Linus' tree.

The net-next tree still had its build failure for which I disabled
BPFILTER.

The drm tree gained a conflict against Linus' tree.

The drm-msm tree gained a build failure for which I reverted a commit.

The vfio tree gained a conflict against Linus' tree.

The kvm tree gained conflicts against Linus' and the arm64 trees.

The akpm-current tree gained a conflict against the nvdimm tree.

Non-merge commits (relative to Linus' tree): 10961
 10688 files changed, 481577 insertions(+), 403670 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 278 trees (counting Linus' and 64 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (29dcea88779c Linux 4.17)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (b04e217704b7 Linux 4.17-rc7)
Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4)
Merging arm-current/fixes (92d44a42af81 ARM: fix kill( ,SIGFPE) breakage)
Merging arm64-fixes/for-next/fixes (82034c23fcbc arm64: Make sure permission 
updates happen for pmd/pud)
Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" 
comment)
Merging powerpc-fixes/fixes (faf37c44a105 powerpc/64s: Clear PCR on boot)
Merging sparc/master (fff75eb2a08c Merge tag 'errseq-v4.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (885892fb378d mlx4_core: restore optimal ICM memory 
allocation)
Merging bpf/master (36f9814a494a bpf: fix uapi hole for 32 bit compat 
applications)
Merging ipsec/master (38369f54d97d xfrm Fix potential error pointer dereference 
in xfrm_bundle_create.)
Merging netfilter/master (31875d4970ba ipvs: register conntrack hooks for ftp)
Merging ipvs/master (312564269535 net: netsec: reduce DMA mask to 40 bits)
Merging wireless-drivers/master (ab1068d6866e iwlwifi: pcie: compare with 
number of IRQs requested for, not number of CPUs)
Merging mac80211/master (312564269535 net: netsec: reduce DMA mask to 40 bits)
Merging rdma-fixes/for-rc (a840c93ca758 IB/core: Fix error code for invalid GID 
entry)
Merging sound-current/for-linus (009f8c90f571 ALSA: hda - Fix runtime PM)
Merging sound-asoc-fixes/for-linus (b88fadcb12f7 Merge branch 'asoc-4.17' into 
asoc-linus)
Merging regmap-fixes/for-linus (4b1b7043a286 Merge branch 'regmap-4.17' into 
regmap-linus)
Merging regulator-fixes/for-linus (1ead77b61050 Merge branch 'regulator-4.17' 
into regulator-linus)
Merging spi-fixes/for-linus (ea783ec61508 Merge branch 'spi-4.17' into 
spi-linus)
Merging pci-current/for-linus (0cf22d6b317c PCI: Add "PCIe" to 
pcie_print_link_status() messages)
Merging driver-core.current/driver-core-linus (6da6c0db5316 Linux v4.17-rc3)
Merging tty.current/tty-linus (6da6c0db5316 Linux v4.17-rc3)
Merging usb.current/usb-linus (771c577c23ba Linux 4.17-rc6)
Merging usb-gadget-fixes/fixes (6d08b06e67cd Linux 4.17-rc2)
Merging usb-serial-fixes/usb-linus (75bc37fefc44 Linux 4.17-rc4)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (60cc43fc8884 Linux 4.17-rc1)
Merging
