date:20190429

RE: [EXT] Re: [PATCH 1/3] dt-bindings: i2c: add optional mul-value property to binding

2019-04-29 Thread Chuanhua Han



> -----Original Message-----
> From: Uwe Kleine-König 
> Sent: April 30, 2019 14:38
> To: Chuanhua Han 
> Cc: robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org;
> s.ha...@pengutronix.de; Leo Li ;
> linux-kernel@vger.kernel.org; devicet...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linux-...@vger.kernel.org;
> ker...@pengutronix.de; dl-linux-imx ;
> feste...@gmail.com; wsa+rene...@sang-engineering.com; e...@deif.com;
> li...@rempel-privat.de; Sumit Batra ;
> l.st...@pengutronix.de; p...@axentia.se
> Subject: [EXT] Re: [PATCH 1/3] dt-bindings: i2c: add optional mul-value
> property to binding
> 
> On Tue, Apr 30, 2019 at 12:32:40PM +0800, Chuanhua Han wrote:
> > NXP Layerscape SoCs have up to three MUL options available for all
> > divider values; the choice of MUL determines the internal monitor rate
> > of the I2C bus (SCL and SDA signals):
> > A lower MUL value results in a higher sampling rate of the I2C signals.
> > A higher MUL value results in a lower sampling rate of the I2C signals.
> >
> > So in Optional properties we added our custom mul-value property in
> > the binding to select which mul option for the device tree i2c
> > controller node.
> >
> > Signed-off-by: Chuanhua Han 
> > ---
> >  Documentation/devicetree/bindings/i2c/i2c-imx.txt | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx.txt
> > b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
> > index b967544590e8..ba8e7b7b3fa8 100644
> > --- a/Documentation/devicetree/bindings/i2c/i2c-imx.txt
> > +++ b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
> > @@ -18,6 +18,9 @@ Optional properties:
> >  - sda-gpios: specify the gpio related to SDA pin
> >  - pinctrl: add extra pinctrl to configure i2c pins to gpio function for i2c
> >bus recovery, call it "gpio" state
> > +- mul-value: NXP Layerscape SoC have up to three MUL options
> > +available for all I2C divider values, it describes which MUL we
> > +choose to use for the driver, the values should be 1,2,4.
> 
> Indentation is broken.
Yes, I noticed this as well; I will fix the indentation in the next version.
> 
> I wonder why this needs to be configurable on a per-machine/device level.
> What is the trade-off?
According to the NXP Layerscape SoC Reference Manual, the I2C controller
offers three MUL options when configuring the I2C Bus Frequency Divider
Register (IBFD), which determines the I2C clock frequency.
Some SoCs (such as the LS1046A) perform best with MUL=4, while the
default is MUL=1.
The property is optional and can be configured through the device tree.
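
To make the trade-off in the reply above easier to discuss, here is a minimal sketch. It is not taken from the i2c-imx driver. It assumes that the SCL rate equals the module clock divided by (MUL x SCL divider), as on related Freescale I2C blocks; the helper names and the clock figures are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: accept only the MUL values named in the binding. */
static int mul_value_is_valid(uint32_t mul)
{
	return mul == 1 || mul == 2 || mul == 4;
}

/*
 * Assumption: SCL rate = module clock / (MUL * SCL divider).
 * Returns the rate in Hz, or -EINVAL for an unsupported MUL or a
 * zero divider.
 */
static long scl_rate_hz(unsigned long clk_hz, uint32_t mul, uint32_t scl_div)
{
	if (!mul_value_is_valid(mul) || scl_div == 0)
		return -EINVAL;
	return (long)(clk_hz / ((unsigned long)mul * scl_div));
}
```

Under this assumption, MUL=4 with a divider of 768 gives the same SCL rate as MUL=1 with a divider of 3072, so the two settings differ only in how the controller samples the bus internally.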
> 
> Best regards
> Uwe
> 
> --
> Pengutronix e.K.                           | Uwe Kleine-König            |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |

Re: [PATCH 0/14] v2 multi-die/package topology support

2019-04-29 Thread Len Brown

On Tue, Feb 26, 2019 at 2:05 PM Peter Zijlstra  wrote:
>
> On Tue, Feb 26, 2019 at 01:19:58AM -0500, Len Brown wrote:
> >  Documentation/cputopology.txt                | 72 ++-
> >  Documentation/x86/topology.txt               |  6 +-
> >  arch/x86/include/asm/processor.h             |  5 +-
> >  arch/x86/include/asm/smp.h                   |  1 +
> >  arch/x86/include/asm/topology.h              |  5 ++
> >  arch/x86/kernel/cpu/topology.c               | 85 +---
> >  arch/x86/kernel/smpboot.c                    | 73 +++-
> >  arch/x86/xen/smp_pv.c                        |  1 +
> >  drivers/base/topology.c                      | 22 +++
> >  drivers/hwmon/coretemp.c                     |  9 +--
> >  drivers/powercap/intel_rapl.c                | 75 +---
> >  drivers/thermal/intel/x86_pkg_temp_thermal.c |  9 +--
> >  include/linux/topology.h                     |  6 ++
> >  13 files changed, 276 insertions(+), 93 deletions(-)
>
> Should we not also have changes to
> arch/x86/kernel/cpu/proc.c:show_cpuinfo_cores() ?

Good question.
I was thinking that /proc/cpuinfo was sort of the legacy API, and
adding a field might break something, while adding an attribute to
the sysfs topology directory was the compatible/safe way to make
additions.

/proc/cpuinfo has these fields today:

physical id : 0
    this is the physical package id
siblings : 8
    this is the count of cpus in the same package
core id : 3
    this is cpu_core_id
cpu cores : 4
    this is booted_cores

If one were to make a change here, I'd consider adding the (physical) die_id,
though it is already in sysfs topology as an attribute.

Not sure if it would then make sense to print the count of cpus in the die.
Not sure what I'd name it, and this info is already in sysfs as a map and list.
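
For user-space consumers of the fields listed above, here is a minimal sketch of reading one "key : value" line. The function name is hypothetical, and the parsing rule (key, optional padding, colon, integer) is an assumption based on the sample lines in this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "key<padding>: value" line as found in /proc/cpuinfo.
 * Returns 1 and stores the integer value if the line's key equals
 * `key`, otherwise returns 0.
 */
static int cpuinfo_field(const char *line, const char *key, int *value)
{
	const char *colon;
	size_t len;

	assert(line && key && value);
	colon = strchr(line, ':');
	if (!colon)
		return 0;
	/* Trim the padding (spaces or tabs) between the key and the colon. */
	len = (size_t)(colon - line);
	while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		len--;
	if (len != strlen(key) || strncmp(line, key, len) != 0)
		return 0;
	return sscanf(colon + 1, "%d", value) == 1;
}
```

A new field such as a die id would be read the same way, which is part of why adding fields is risky only for parsers that depend on field order or count.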

Len Brown, Intel Open Source Technology Center

[PATCH v4 3/3] dt-bindings: power: supply: Add bindings for Microchip UCS1002

2019-04-29 Thread Andrey Smirnov

Add bindings for Microchip UCS1002 Programmable USB Port Power
Controller with Charger Emulation.

Signed-off-by: Andrey Smirnov 
Cc: Enric Balletbo Serra 
Cc: Chris Healy 
Cc: Lucas Stach 
Cc: Fabio Estevam 
Cc: Guenter Roeck 
Cc: Rob Herring 
Cc: devicet...@vger.kernel.org
Cc: Sebastian Reichel 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 .../power/supply/microchip,ucs1002.txt| 27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/supply/microchip,ucs1002.txt

diff --git a/Documentation/devicetree/bindings/power/supply/microchip,ucs1002.txt b/Documentation/devicetree/bindings/power/supply/microchip,ucs1002.txt
new file mode 100644
index ..021fd7aba75e
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/supply/microchip,ucs1002.txt
@@ -0,0 +1,27 @@
+Microchip UCS1002 USB Port Power Controller
+
+Required properties:
+- compatible   : Should be "microchip,ucs1002";
+- reg  : I2C slave address
+
+Optional properties:
+- interrupts-extended  : A list of interrupts lines present (could be either
+ corresponding to A_DET# pin, ALERT# pin, or both)
+- interrupt-names  : A list of interrupt names. Should contain (if
+ present):
+ - "a_det" for line connected to A_DET# pin
+ - "alert" for line connected to ALERT# pin
+ Both are expected to be IRQ_TYPE_EDGE_BOTH
+Example:
+
+&i2c3 {
+   charger@32 {
+   compatible = "microchip,ucs1002";
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_ucs1002_pins>;
+   reg = <0x32>;
+   interrupts-extended = <&gpio5 2 IRQ_TYPE_EDGE_BOTH>,
+ <&gpio3 21 IRQ_TYPE_EDGE_BOTH>;
+   interrupt-names = "a_det", "alert";
+   };
+};
-- 
2.20.1

[PATCH v4 2/3] power: supply: Add driver for Microchip UCS1002

2019-04-29 Thread Andrey Smirnov

Add driver for Microchip UCS1002 Programmable USB Port Power
Controller with Charger Emulation. The driver exposes a power supply
device to control/monitor various parameters of the device, as well as
a regulator to allow controlling the VBUS line.

Signed-off-by: Enric Balletbo Serra 
Signed-off-by: Andrey Smirnov 
Cc: Chris Healy 
Cc: Lucas Stach 
Cc: Fabio Estevam 
Cc: Guenter Roeck 
Cc: Sebastian Reichel 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 drivers/power/supply/Kconfig |   9 +
 drivers/power/supply/Makefile|   1 +
 drivers/power/supply/ucs1002_power.c | 646 +++
 3 files changed, 656 insertions(+)
 create mode 100644 drivers/power/supply/ucs1002_power.c

diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig
index e901b9879e7e..c614c8a196f3 100644
--- a/drivers/power/supply/Kconfig
+++ b/drivers/power/supply/Kconfig
@@ -660,4 +660,13 @@ config FUEL_GAUGE_SC27XX
 Say Y here to enable support for fuel gauge with SC27XX
 PMIC chips.
 
+config CHARGER_UCS1002
+   tristate "Microchip UCS1002 USB Port Power Controller"
+   depends on I2C
+   depends on OF
+   select REGMAP_I2C
+   help
+ Say Y to enable support for Microchip UCS1002 Programmable
+ USB Port Power Controller with Charger Emulation.
+
 endif # POWER_SUPPLY
diff --git a/drivers/power/supply/Makefile b/drivers/power/supply/Makefile
index b731c2a9b695..c56803a9e4fe 100644
--- a/drivers/power/supply/Makefile
+++ b/drivers/power/supply/Makefile
@@ -87,3 +87,4 @@ obj-$(CONFIG_AXP288_CHARGER)  += axp288_charger.o
 obj-$(CONFIG_CHARGER_CROS_USBPD)   += cros_usbpd-charger.o
 obj-$(CONFIG_CHARGER_SC2731)   += sc2731_charger.o
 obj-$(CONFIG_FUEL_GAUGE_SC27XX)+= sc27xx_fuel_gauge.o
+obj-$(CONFIG_CHARGER_UCS1002)  += ucs1002_power.o
diff --git a/drivers/power/supply/ucs1002_power.c b/drivers/power/supply/ucs1002_power.c
new file mode 100644
index ..d66b4eff9b7a
--- /dev/null
+++ b/drivers/power/supply/ucs1002_power.c
@@ -0,0 +1,646 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Driver for UCS1002 Programmable USB Port Power Controller
+ *
+ * Copyright (C) 2019 Zodiac Inflight Innovations
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* UCS1002 Registers */
+#define UCS1002_REG_CURRENT_MEASUREMENT        0x00
+
+/*
+ * The Total Accumulated Charge registers store the total accumulated
+ * charge delivered from the VS source to a portable device. The total
+ * value is calculated using four registers, from 01h to 04h. The bit
+ * weighting of the registers is given in mA/hrs.
+ */
+#define UCS1002_REG_TOTAL_ACC_CHARGE           0x01
+
+/* Other Status Register */
+#define UCS1002_REG_OTHER_STATUS               0x0f
+#  define F_ADET_PIN                           BIT(4)
+#  define F_CHG_ACT                            BIT(3)
+
+/* Interrupt Status */
+#define UCS1002_REG_INTERRUPT_STATUS           0x10
+#  define F_DISCHARGE_ERR                      BIT(6)
+#  define F_RESET                              BIT(5)
+#  define F_MIN_KEEP_OUT                       BIT(4)
+#  define F_TSD                                BIT(3)
+#  define F_OVER_VOLT                          BIT(2)
+#  define F_BACK_VOLT                          BIT(1)
+#  define F_OVER_ILIM                          BIT(0)
+
+/* Pin Status Register */
+#define UCS1002_REG_PIN_STATUS                 0x14
+#  define UCS1002_PWR_STATE_MASK               0x03
+#  define F_PWR_EN_PIN                         BIT(6)
+#  define F_M2_PIN                             BIT(5)
+#  define F_M1_PIN                             BIT(4)
+#  define F_EM_EN_PIN                          BIT(3)
+#  define F_SEL_PIN                            BIT(2)
+#  define F_ACTIVE_MODE_MASK                   GENMASK(5, 3)
+#  define F_ACTIVE_MODE_PASSTHROUGH            F_M2_PIN
+#  define F_ACTIVE_MODE_DEDICATED              F_EM_EN_PIN
+#  define F_ACTIVE_MODE_BC12_DCP               (F_M2_PIN | F_EM_EN_PIN)
+#  define F_ACTIVE_MODE_BC12_SDP               F_M1_PIN
+#  define F_ACTIVE_MODE_BC12_CDP               (F_M1_PIN | F_M2_PIN | F_EM_EN_PIN)
+
+/* General Configuration Register */
+#define UCS1002_REG_GENERAL_CFG                0x15
+#  define F_RATION_EN                          BIT(3)
+
+/* Emulation Configuration Register */
+#define UCS1002_REG_EMU_CFG                    0x16
+
+/* Switch Configuration Register */
+#define UCS1002_REG_SWITCH_CFG                 0x17
+#  define F_PIN_IGNORE                         BIT(7)
+#  define F_EM_EN_SET                          BIT(5)
+#  define F_M2_SET                             BIT(4)
+#  define F_M1_SET                             BIT(3)
+#  define F_S0_SET                             BIT(2)
+#  define F_PWR_EN_SET                         BIT(1)
+#  define F_LATCH_SET                          BIT(0)
+#  define V_SET_ACTIVE_MODE_MASK               GENMASK(5, 3)
+#  define V_SET_ACTIVE_MODE_PASSTHROUGH        F_M2_SET
+#  define V_SET_ACTIVE_MODE_DEDICATED          F_EM_EN_SET
+#  define V_SET_ACTIVE_MODE_BC12_DCP           (F_M2_SET | F_EM_EN_SET)
+#  define V_SET_ACTIVE_MODE_BC12_SDP           F_M1_SE

[PATCH v4 1/3] power: supply: core: Add POWER_SUPPLY_HEALTH_OVERCURRENT constant

2019-04-29 Thread Andrey Smirnov

Add POWER_SUPPLY_HEALTH_OVERCURRENT constant in order to allow
signalling an overcurrent condition via power supply health information.

Signed-off-by: Andrey Smirnov 
Reviewed-by: Guenter Roeck 
Cc: Enric Balletbo Serra 
Cc: Chris Healy 
Cc: Lucas Stach 
Cc: Fabio Estevam 
Cc: Guenter Roeck 
Cc: Sebastian Reichel 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 drivers/power/supply/power_supply_sysfs.c | 2 +-
 include/linux/power_supply.h  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/power/supply/power_supply_sysfs.c b/drivers/power/supply/power_supply_sysfs.c
index 5358a80d854f..153f4a6ca57c 100644
--- a/drivers/power/supply/power_supply_sysfs.c
+++ b/drivers/power/supply/power_supply_sysfs.c
@@ -62,7 +62,7 @@ static const char * const power_supply_charge_type_text[] = {
 static const char * const power_supply_health_text[] = {
"Unknown", "Good", "Overheat", "Dead", "Over voltage",
"Unspecified failure", "Cold", "Watchdog timer expire",
-   "Safety timer expire"
+   "Safety timer expire", "Over current"
 };
 
 static const char * const power_supply_technology_text[] = {
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index 2f9c201a54d1..bdab14c7ca4d 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -57,6 +57,7 @@ enum {
POWER_SUPPLY_HEALTH_COLD,
POWER_SUPPLY_HEALTH_WATCHDOG_TIMER_EXPIRE,
POWER_SUPPLY_HEALTH_SAFETY_TIMER_EXPIRE,
+   POWER_SUPPLY_HEALTH_OVERCURRENT,
 };
 
 enum {
-- 
2.20.1
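
The patch above has to touch two places because the enum and the string table are indexed in lockstep. Here is a minimal user-space sketch of that coupling, with shortened stand-in names rather than the kernel identifiers. The HEALTH_COUNT sentinel and the compile-time check are illustrative additions and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel enum, in the same order as the string table. */
enum health {
	HEALTH_UNKNOWN, HEALTH_GOOD, HEALTH_OVERHEAT, HEALTH_DEAD,
	HEALTH_OVERVOLTAGE, HEALTH_UNSPEC_FAILURE, HEALTH_COLD,
	HEALTH_WATCHDOG_TIMER_EXPIRE, HEALTH_SAFETY_TIMER_EXPIRE,
	HEALTH_OVERCURRENT,
	HEALTH_COUNT /* hypothetical sentinel, not in the kernel enum */
};

static const char * const health_text[] = {
	"Unknown", "Good", "Overheat", "Dead", "Over voltage",
	"Unspecified failure", "Cold", "Watchdog timer expire",
	"Safety timer expire", "Over current"
};

/* Fail the build if a constant is added without its string. */
_Static_assert(sizeof(health_text) / sizeof(health_text[0]) == HEALTH_COUNT,
	       "health_text[] must have one entry per enum value");

/* Map a health value to its sysfs string; out-of-range maps to "Unknown". */
static const char *health_to_text(unsigned int health)
{
	return health < HEALTH_COUNT ? health_text[health] : health_text[0];
}
```

Appending the new constant at the end of both lists keeps the strings of every existing value unchanged.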

[PATCH v4 0/3] Driver for UCS1002

2019-04-29 Thread Andrey Smirnov



Everyone:

This small series adds a driver for UCS1002 Programmable USB Port
Power Controller with Charger Emulation. See [page] for the product page
and [datasheet] for the device datasheet. Hopefully each individual patch
is self-explanatory.

Note that this series is a revival of the upstreaming effort by Enric
Balletbo Serra, the last version of which can be found at [original-effort].

Feedback is welcome!

Thanks,
Andrey Smirnov

Changes since [v3]:

- Added a check for negative values to ucs1002_set_usb_type()

Changes since [v2]:

- Fixed a bug pointed out by Lucas

Changes since [v1]:

- Moved IRQ trigger specification to DT

- Fixed silent error paths in probe()

- Dropped error message in ucs1002_set_max_current()

- Fixed license mismatch

- Changed the driver to configure the chip to BC1.2 CDP by default

- Made other small fixes as per feedback for v1

[v3] https://lore.kernel.org/lkml/20190429195349.20335-1-andrew.smir...@gmail.com
[v2] https://lore.kernel.org/lkml/20190429054741.7286-1-andrew.smir...@gmail.com
[v1] https://lore.kernel.org/lkml/20190417084457.28747-1-andrew.smir...@gmail.com/
[page] https://www.microchip.com/wwwproducts/en/UCS1002-2
[datasheet] https://ww1.microchip.com/downloads/en/DeviceDoc/UCS1002-2%20Data%20Sheet.pdf
[original-effort] https://lore.kernel.org/lkml/1460705181-10493-1-git-send-email-enric.balle...@collabora.com/

Andrey Smirnov (3):
  power: supply: core: Add POWER_SUPPLY_HEALTH_OVERCURRENT constant
  power: supply: Add driver for Microchip UCS1002
  dt-bindings: power: supply: Add bindings for Microchip UCS1002

 .../power/supply/microchip,ucs1002.txt|  27 +
 drivers/power/supply/Kconfig  |   9 +
 drivers/power/supply/Makefile |   1 +
 drivers/power/supply/power_supply_sysfs.c |   2 +-
 drivers/power/supply/ucs1002_power.c  | 646 ++
 include/linux/power_supply.h  |   1 +
 6 files changed, 685 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/power/supply/microchip,ucs1002.txt
 create mode 100644 drivers/power/supply/ucs1002_power.c

-- 
2.20.1

[PATCH] quota: check time limit when back out space/inode change

2019-04-29 Thread Chengguang Xu

When inode/space allocation fails, we back out the changes we
already made. In the special case where the change pushed usage
over the soft limit, we should also check the time limit and
reset it properly.

Signed-off-by: Chengguang Xu 
---
 fs/quota/dquot.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9d7dfc47c854..58f15a083dd1 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1681,13 +1681,11 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
if (!dquots[cnt])
continue;
spin_lock(&dquots[cnt]->dq_dqb_lock);
-   if (reserve) {
-   dquots[cnt]->dq_dqb.dqb_rsvspace -=
-   number;
-   } else {
-   dquots[cnt]->dq_dqb.dqb_curspace -=
-   number;
-   }
+   if (reserve)
+   dquot_free_reserved_space(dquots[cnt],
+ number);
+   else
+   dquot_decr_space(dquots[cnt], number);
spin_unlock(&dquots[cnt]->dq_dqb_lock);
}
spin_unlock(&inode->i_lock);
@@ -1738,7 +1736,7 @@ int dquot_alloc_inode(struct inode *inode)
continue;
/* Back out changes we already did */
spin_lock(&dquots[cnt]->dq_dqb_lock);
-   dquots[cnt]->dq_dqb.dqb_curinodes--;
+   dquot_decr_inodes(dquots[cnt], 1);
spin_unlock(&dquots[cnt]->dq_dqb_lock);
}
goto warn_put_all;
--
2.20.1
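
The behaviour the commit message asks for can be pictured with a small user-space model. This is only a sketch of what a helper such as dquot_decr_space() is expected to do; the struct and function below are simplified stand-ins and not the fs/quota code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the block-usage part of a quota record. */
struct quota_usage {
	uint64_t curspace;     /* space currently charged */
	uint64_t rsvspace;     /* space reserved but not yet used */
	uint64_t softlimit;    /* soft limit on curspace + rsvspace */
	int64_t  grace_expiry; /* 0 means no grace period is running */
};

/* Back out `number` bytes and clear the grace timer if back under limit. */
static void usage_decr_space(struct quota_usage *u, uint64_t number)
{
	assert(u);
	/* Never let the usage wrap below zero. */
	u->curspace = u->curspace >= number ? u->curspace - number : 0;
	/* Back at or under the soft limit: the grace period no longer applies. */
	if (u->curspace + u->rsvspace <= u->softlimit)
		u->grace_expiry = 0;
}
```

A bare `curspace -= number` would leave a grace timer running that was started only by the allocation being backed out.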

Re: [PATCH 1/3] dt-bindings: i2c: add optional mul-value property to binding

2019-04-29 Thread Uwe Kleine-König

On Tue, Apr 30, 2019 at 12:32:40PM +0800, Chuanhua Han wrote:
> NXP Layerscape SoCs have up to three MUL options available for all
> divider values; the choice of MUL determines the internal monitor rate
> of the I2C bus (SCL and SDA signals):
> A lower MUL value results in a higher sampling rate of the I2C signals.
> A higher MUL value results in a lower sampling rate of the I2C signals.
> 
> So in Optional properties we added our custom mul-value property in the
> binding to select which mul option for the device tree i2c controller
> node.
> 
> Signed-off-by: Chuanhua Han 
> ---
>  Documentation/devicetree/bindings/i2c/i2c-imx.txt | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx.txt b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
> index b967544590e8..ba8e7b7b3fa8 100644
> --- a/Documentation/devicetree/bindings/i2c/i2c-imx.txt
> +++ b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
> @@ -18,6 +18,9 @@ Optional properties:
>  - sda-gpios: specify the gpio related to SDA pin
>  - pinctrl: add extra pinctrl to configure i2c pins to gpio function for i2c
>bus recovery, call it "gpio" state
> +- mul-value: NXP Layerscape SoC have up to three MUL options available for
> +all I2C divider values, it describes which MUL we choose to use for the driver,
> +the values should be 1,2,4.

Indentation is broken.

I wonder why this needs to be configurable on a per-machine/device
level. What is the trade-off?

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |

Re: [PATCH v2 3/4] dt-bindings: pinctrl: meson: Add drive-strength-uA property

2019-04-29 Thread Guillaume La Roque

Hi Martin,

On 4/27/19 9:21 PM, Martin Blumenstingl wrote:
> Hi Guillaume,
>
> On Thu, Apr 18, 2019 at 2:48 PM Guillaume La Roque
>  wrote:
>> Add optional drive-strength-uA property
>>
>> Signed-off-by: Guillaume La Roque 
>> ---
>>  Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt
>> index a47dd990a8d3..b3e4be696ddc 100644
>> --- a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt
>> +++ b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt
>> @@ -51,6 +51,9 @@ Configuration nodes support the generic properties "bias-disable",
>>  "bias-pull-up" and "bias-pull-down", described in file
>>  pinctrl-bindings.txt
>>
>> +Optional properties :
>> + - drive-strength-uA: Drive strength for the specified pins in uA.
> if you have to re-send this series for whatever reason then please
> mention that drive-strength-uA is only valid for G12A and newer

Thanks for your review, I will do so if I send a new series.


> otherwise:
> Reviewed-by: Martin Blumenstingl

Re: [PATCH] cpufreq: Fix kobject memleak

2019-04-29 Thread Tobin C. Harding

On Tue, Apr 30, 2019 at 11:35:52AM +0530, Viresh Kumar wrote:
> Currently the error return path from kobject_init_and_add() is not
> followed by a call to kobject_put() - which means we are leaking the
> kobject.
> 
> Fix it by adding a call to kobject_put() in the error path of
> kobject_init_and_add().
> 
> Signed-off-by: Viresh Kumar 
> ---
> Tobin fixed this for schedutil already.

For what it's worth:

 Reviewed-by: Tobin C. Harding 

Thanks Viresh, one less for me to do!

Tobin
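
The leak described in the quoted commit message can be modelled outside the kernel. This is a toy sketch with made-up names (struct obj, obj_put(), obj_init_and_add()), assuming only the contract stated above: a failed init-and-add still leaves a reference that the caller must drop.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object standing in for a struct kobject. */
struct obj {
	int refcount;
	int *released; /* set to 1 when the release path runs */
};

/* Drop one reference; the last put runs the release path. */
static void obj_put(struct obj *o)
{
	assert(o && o->refcount > 0);
	if (--o->refcount == 0) {
		*o->released = 1;
		free(o);
	}
}

/* Mirrors the contract: the reference is taken even when adding fails. */
static int obj_init_and_add(struct obj *o, int fail)
{
	o->refcount = 1;
	return fail ? -1 : 0;
}

/* Returns the object on success, NULL on failure with nothing leaked. */
static struct obj *obj_create(int fail, int *released)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->released = released;
	if (obj_init_and_add(o, fail)) {
		obj_put(o); /* without this put the object would leak */
		return NULL;
	}
	return o;
}
```

The point of the pattern is that the error path uses the same put as normal teardown, so the release callback is the single place where memory is freed.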

Re: [PATCH 2/4] mtd: nand: Move ONFI code into nand/ directory

2019-04-29 Thread Miquel Raynal

Hi Shivamurthy,

"Shivamurthy Shastri (sshivamurthy)"  wrote on
Tue, 26 Mar 2019 10:51:56 +:

> Move generic ONFI code to nand/ directory, which can be used by SPI
> NAND layer.
> 
> Signed-off-by: Shivamurthy Shastri 

Reviewed-by: Miquel Raynal 

Thanks,
Miquèl

Re: [PATCH 1/4] mtd: rawnand: Turn the ONFI support to generic

2019-04-29 Thread Miquel Raynal

Hi Shivamurthy,

Sorry for the long delay, I was a bit overloaded.

"Shivamurthy Shastri (sshivamurthy)"  wrote on
Tue, 26 Mar 2019 10:51:47 +:

> Fix headers to make way for adding helper functions.
> 
> Add onfi helper structure.
> 
> Add helper functions in raw NAND core, which later will be used during
> ONFI detection.
> 

As you are touching the core, I need to identify clearly each change
you make; typically in this commit you do several different changes.
Please split this patch into small, meaningful pieces.

> Signed-off-by: Shivamurthy Shastri 
> ---
>  drivers/mtd/nand/raw/internals.h |   6 +-
>  drivers/mtd/nand/raw/nand_base.c | 236 ---
>  drivers/mtd/nand/raw/nand_onfi.c | 215 +---
>  include/linux/mtd/nand.h |  30 
>  include/linux/mtd/rawnand.h  |   5 +
>  5 files changed, 289 insertions(+), 203 deletions(-)
> 

Thanks,
Miquèl

Re: [PATCH v8] Bluetooth: btqca: inject command complete event during fw download

2019-04-29 Thread kbuild test robot

Hi Matthias,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bluetooth-next/master]
[also build test ERROR on next-20190429]
[cannot apply to v5.1-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Matthias-Kaehlcke/Bluetooth-btqca-inject-command-complete-event-during-fw-download/20190430-125407
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=xtensa

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   drivers/bluetooth/btqca.c: In function 'qca_inject_cmd_complete_event':
>> drivers/bluetooth/btqca.c:286:18: error: 'QCA_HCI_CC_SUCCESS' undeclared (first use in this function); did you mean 'QCA_HCI_CC_OPCODE'?
 skb_put_u8(skb, QCA_HCI_CC_SUCCESS);
 ^~
 QCA_HCI_CC_OPCODE
   drivers/bluetooth/btqca.c:286:18: note: each undeclared identifier is 
reported only once for each function it appears in

vim +286 drivers/bluetooth/btqca.c

   267  
   268  static int qca_inject_cmd_complete_event(struct hci_dev *hdev)
   269  {
   270  struct hci_event_hdr *hdr;
   271  struct hci_ev_cmd_complete *evt;
   272  struct sk_buff *skb;
   273  
   274  skb = bt_skb_alloc(sizeof(*hdr) + sizeof(*evt) + 1, GFP_KERNEL);
   275  if (!skb)
   276  return -ENOMEM;
   277  
   278  hdr = skb_put(skb, sizeof(*hdr));
   279  hdr->evt = HCI_EV_CMD_COMPLETE;
   280  hdr->plen = sizeof(*evt) + 1;
   281  
   282  evt = skb_put(skb, sizeof(*evt));
   283  evt->ncmd = 1;
   284  evt->opcode = HCI_OP_NOP;
   285  
 > 286  skb_put_u8(skb, QCA_HCI_CC_SUCCESS);
   287  
   288  hci_skb_pkt_type(skb) = HCI_EVENT_PKT;
   289  
   290  return hci_recv_frame(hdev, skb);
   291  }
   292  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
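
For reference, the function in the listing above builds a six-byte HCI Command Complete event. Here is a user-space sketch of the same byte layout, assuming the event code 0x0e and the NOP opcode 0x0000 from the Bluetooth HCI definitions, and assuming that the undeclared status constant is meant to be 0x00. All macro and function names below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EV_CMD_COMPLETE 0x0e   /* HCI Command Complete event code */
#define OP_NOP          0x0000 /* "no operation" opcode */
#define CC_SUCCESS      0x00   /* assumed value of the missing constant */

/*
 * Fill buf with: event code, parameter length, ncmd, opcode (little
 * endian), status. Returns the number of bytes written, or 0 if buf
 * is too small.
 */
static size_t build_cmd_complete(uint8_t *buf, size_t len)
{
	const size_t need = 2 + 3 + 1; /* header + event body + status */

	assert(buf);
	if (len < need)
		return 0;
	buf[0] = EV_CMD_COMPLETE;
	buf[1] = 3 + 1;                /* plen: event body plus status */
	buf[2] = 1;                    /* ncmd */
	buf[3] = OP_NOP & 0xff;        /* opcode, low byte first */
	buf[4] = OP_NOP >> 8;
	buf[5] = CC_SUCCESS;
	return need;
}
```

The build error itself only says that the status constant was not visible in btqca.c when the patch was applied to the tested trees.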



Re: [tip:sched/urgent] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Tobin C. Harding

On Tue, Apr 30, 2019 at 11:26:27AM +0530, Viresh Kumar wrote:
> On 29-04-19, 22:52, tip-bot for Tobin C. Harding wrote:
> > Commit-ID:  8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
> > Gitweb:     https://git.kernel.org/tip/8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
> > Author: Tobin C. Harding 
> > AuthorDate: Tue, 30 Apr 2019 10:11:44 +1000
> > Committer:  Ingo Molnar 
> > CommitDate: Tue, 30 Apr 2019 06:24:09 +0200
> > 
> > sched/cpufreq: Fix kobject memleak
> > 
> > Currently the error return path from kobject_init_and_add() is not
> > followed by a call to kobject_put() - which means we are leaking
> > the kobject.
> > 
> > Fix it by adding a call to kobject_put() in the error path of
> > kobject_init_and_add().
> > 
> > Signed-off-by: Tobin C. Harding 
> > Add call to kobject_put() in error path of kobject_init_and_add().
> 
> This should have been present before the signed-off ?

Thanks.  Some face palm fails on this patch.  It's hard to get good help :)

Tobin

Re: [PATCH v3 3/3] clk: sifive: add a driver for the SiFive FU540 PRCI IP block

2019-04-29 Thread Paul Walmsley

Hi Atish,

On Sat, 27 Apr 2019, Atish Patra wrote:

> On 4/11/19 1:28 AM, Paul Walmsley wrote:
> > Add driver code for the SiFive FU540 PRCI IP block.  This IP block
> > handles reset and clock control for the SiFive FU540 device and
> > implements SoC-level clock tree controls and dividers.

[...]

> > +static const struct of_device_id sifive_fu540_prci_of_match[] = {
> > +   { .compatible = "sifive,fu540-c000-prci", },
> 
> All the existing unleashed devices have prci clock compatible string as
> "sifive,aloeprci0" or "sifive,ux00prci0". Should it be added to maintain
> backward compatibility?

As you note, just adding the old (unreviewed) compatible string isn't 
enough.

> Even after adding the compatible string (just for my testing purpose), I get
> this while booting.
> 
> [0.104571] sifive-fu540-prci 1000.prci: expected only two parent
> clocks, found 1
> [0.112460] sifive-fu540-prci 1000.prci: could not register clocks: -22
> [0.119499] sifive-fu540-prci: probe of 1000.prci failed with error -22
> 
> Looking at the DT entries, your DT patch has
> 
> + prci: clock-controller@1000 {
> + compatible = "sifive,fu540-c000-prci";
> + reg = <0x0 0x1000 0x0 0x1000>;
> + clocks = <&hfclk>, <&rtcclk>;
> + #clock-cells = <1>;
> + };
> 
> 
> while current DT from FSBL
> (https://github.com/sifive/freedom-u540-c000-bootloader/blob/master/fsbl/ux00_fsbl.dts)
> 
> prci: prci@1000 {
>   compatible = "sifive,aloeprci0", "sifive,ux00prci0";
>   reg = <0x0 0x1000 0x0 0x1000>;
>   reg-names = "control";
>   clocks = <&refclk>;
>   #clock-cells = <1>;
>   };
> 
> This seems to be the cause of error. It looks like this patch needs a complete
> different DT (your DT patch) than FSBL provides.

That's right.  That old data was completely out of tree and unreviewed.  
It's part of the reason why we're going through the process of posting DT 
data to the kernel and devicetree lists and getting that data reviewed:

https://lore.kernel.org/linux-riscv/20190411084242.4999-1-paul.walms...@sifive.com/

> This means everybody must upgrade the FSBL to use your DT patch in their
> boards once this driver is merged. Is this okay?

People can continue to use the out-of-tree DT data if they want.  They'll 
just have to continue to patch their kernels to add out-of-tree drivers, 
as they do now.

Otherwise, if people want to use the upstream PRCI driver in the upstream 
kernel, then it's necessary to use DT data that aligns with what's in the 
upstream binding documentation.


- Paul
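
The two failure modes discussed in this thread (no matching compatible string, then the wrong number of parent clocks) can be sketched in user space. This is an illustration only. The function names are hypothetical and the real driver's probe path is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* The only compatible string the upstream match table lists. */
static const char * const prci_match[] = { "sifive,fu540-c000-prci" };

/* Return 1 if any of the node's compatible strings is in the table. */
static int prci_compatible(const char * const *compat, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < sizeof(prci_match) / sizeof(prci_match[0]); j++)
			if (strcmp(compat[i], prci_match[j]) == 0)
				return 1;
	return 0;
}

/* Toy probe: bind only to a matching node that lists two parent clocks. */
static int prci_probe(const char * const *compat, size_t n, int nparents)
{
	assert(compat);
	if (!prci_compatible(compat, n))
		return -ENODEV;
	if (nparents != 2)
		return -EINVAL; /* error -22, as in the quoted boot log */
	return 0;
}
```

With the FSBL node, the first check already fails. Adding the old string to the table only moves the failure to the second check, which matches what Atish observed.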

Re: [SOLVED] PROBLEM: Elan touchpad regression on Kernel 5.0.10

2019-04-29 Thread Outvi V

Hello,

  After a cold restart, this problem seems to have resolved itself on kernel
5.0.10.

Regards,

On Tue, Apr 30, 2019, at 12:21, Outvi V wrote:
> Hello,
> 
> [1.] One line summary of the problem: Elan touchpad regression on Kernel 5.0.10
> 
> [2.] Full description of the problem/report:
>   Elan touchpad does not work on 5.0.10 while working on 5.0.9
> 
> [3.] Keywords: elan_i2c_core elan i2c touchpad 5.0.10
> 
> [4.] Kernel information
> [4.1.] Kernel version:
>   Linux version 5.0.10-arch1-1-ARCH (builduser@heftig-2592) (gcc version 8.3.0 (GCC)) #1 SMP PREEMPT Sat Apr 27 20:06:45 UTC 2019
> [4.2.] Kernel .config file:
>   I'm not sure, but I think it may be referring to
>   https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
> [5.] Most recent kernel version which did not have the bug: 5.0.9
> 
> [6.] Output of Oops.. message (if applicable) with symbolic information
>  resolved (Not applicable)
> [7.] A small shell script or example program which triggers the
>  problem: (Not applicable)
> 
> [8.] Environment
> [8.1.] Software (add the output of the ver_linux script here)
>   
> Linux sheltty 5.0.10-arch1-1-ARCH #1 SMP PREEMPT Sat Apr 27 20:06:45 UTC 2019 x86_64 GNU/Linux
> 
> GNU C                   8.3.0
> GNU Make                4.2.1
> Binutils                2.32
> Util-linux              2.33.2
> Mount                   2.33.2
> Module-init-tools       26
> E2fsprogs               1.45.0
> Jfsutils                1.1.15
> Reiserfsprogs           3.6.27
> Xfsprogs                4.20.0
> PPP                     2.4.7
> Linux C Library         2.29
> Dynamic linker (ldd)    2.29
> Linux C++ Library       6.0.25
> Procps                  3.3.15
> Kbd                     2.0.4
> Console-tools           2.0.4
> Sh-utils                8.31
> Udev                    242
> Modules Loaded  8021q 8250_dw ac ac97_bus acpi_thermal_rel 
> aesni_intel aes_x86_64 agpgart ahci arc4 atkbd battery bbswitch 
> bluetooth btbcm btintel btrtl btusb cfg80211 coretemp crc16 
> crc32c_generic crc32c_intel crc32_pclmul crct10dif_pclmul cryptd 
> crypto_simd crypto_user drm drm_kms_helper ecdh_generic elan_i2c evdev 
> ext4 fat fb_sys_fops fscrypto garp ghash_clmulni_intel glue_helper hid 
> hid_generic i2c_algo_bit i2c_hid i2c_i801 i8042 i915 idma64 input_leds 
> int3400_thermal int3403_thermal int340x_thermal_zone intel_cstate 
> intel_gtt intel_lpss intel_lpss_pci intel_pch_thermal intel_powerclamp 
> intel_rapl intel_rapl_perf intel_soc_dts_iosf intel_uncore 
> intel_wmi_thunderbolt ip_tables irqbypass iTCO_vendor_support iTCO_wdt 
> jbd2 joydev kvm kvmgt kvm_intel ledtrig_audio libahci libata libphy 
> libps2 llc mac80211 mac_hid mbcache mdev media mei mei_me mousedev mrp 
> nls_cp437 nls_iso8859_1 pcc_cpufreq processor_thermal_device r8169 
> r8822be realtek rfkill rng_core scsi_mod serio serio_raw snd 
> snd_compress snd_hda_codec snd_hda_codec_generic snd_hda_codec_hdmi 
> snd_hda_codec_realtek snd_hda_core snd_hda_ext_core snd_hda_intel 
> snd_hwdep snd_pcm snd_pcm_dmaengine snd_soc_acpi 
> snd_soc_acpi_intel_match snd_soc_core snd_soc_hdac_hda snd_soc_skl 
> snd_soc_skl_ipc snd_soc_sst_dsp snd_soc_sst_ipc snd_timer soundcore stp 
> syscopyarea sysfillrect sysimgblt tpm tpm_crb tpm_tis tpm_tis_core 
> typec typec_ucsi ucsi_acpi usbhid uvcvideo vfat vfio vfio_iommu_type1 
> vfio_mdev videobuf2_common videobuf2_memops videobuf2_v4l2 
> videobuf2_vmalloc videodev wmi wmi_bmof x86_pkg_temp_thermal xhci_hcd 
> xhci_pci x_tables
> 
> [8.2.] Processor information (from /proc/cpuinfo): (Maybe not applicable)
> [8.3.] Module information (from /proc/modules): 
> 
> (Parts related to i2c and elan:)
> 
> i2c_algo_bit 16384 1 i915, Live 0x
> i2c_hid 32768 0 - Live 0x
> hid 147456 3 hid_generic,usbhid,i2c_hid, Live 0x
> elan_i2c 49152 0 - Live 0x
> i2c_i801 36864 0 - Live 0x
> 
> [8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
> 
> /proc/ioports:
> - : PCI Bus :00
>   - : dma1
>   - : pic1
>   - : iTCO_wdt
>   - : timer0
>   - : timer1
>   - : keyboard
>   - : PNP0C09:00
> - : EC data
>   - : keyboard
>   - : PNP0C09:00
> - : EC cmd
>   - : rtc0
>   - : dma page reg
>   - : pic2
>   - : dma2
>   - : fpu
> - : PNP0C04:00
>   - : iTCO_wdt
>   - : pnp 00:02
> - : PCI conf1
> - : PCI Bus :00
>   - : pnp 00:02
>   - : pnp 00:00
> - : ACPI PM1a_EVT_BLK
> - : ACPI PM1a_CNT_BLK
> - : ACPI PM_TMR
> - : ACPI CPU throttle
> - : ACPI PM2_CNT_BLK
> - : pnp 00:04
> - : ACPI GPE0_BLK
>   - : pnp 00:01
>   - : PCI Bus :08
> - : :08:00.0
>   -0

Re: [PATCH v3 3/4] Documentation: devicetree: add PPMU events description

2019-04-29 Thread Chanwoo Choi

Hi Lukasz,

On Apr 19, 2019, 10:48 PM, Lukasz Luba wrote:
> Extend the documentation with an event description and the new
> 'event-data-type' field. Add an example of how the event might be defined in DT.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  .../devicetree/bindings/devfreq/event/exynos-ppmu.txt  | 18 
> ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt 
> b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
> index 3e36c1d..47feb5f 100644
> --- a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
> +++ b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
> @@ -145,3 +145,21 @@ Example3 : PPMUv2 nodes in exynos5433.dtsi are listed 
> below.
>   reg = <0x104d 0x2000>;
>   status = "disabled";
>   };
> +
> +The 'event' type specified in the PPMU node defines 'event-name'
> +which also contains 'id' number and optionally 'event-data-type'.
> +
> +Example:
> +
> + events {
> + ppmu_leftbus_0: ppmu-event0-leftbus {
> + event-name = "ppmu-event0-leftbus";
> + event-data-type = ;
> + };
> + };
> +
> +The 'event-data-type' defines the type of data which shall be counted
> +by the counter. You can check include/dt-bindings/pmu/exynos_ppmu.h for
> +all possible type, i.e. count read requests, count write data in bytes,
> +etc. This field is optional and when it is missing, the driver code will
> +use default data type.
> 

How about editing it as follows?

--- a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
+++ b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
@@ -10,14 +10,23 @@ The Exynos PPMU driver uses the devfreq-event class to provide event data
 to various devfreq devices. The devfreq devices would use the event data when
 derterming the current state of each IP.
 
-Required properties:
+Required properties for PPMU device:
 - compatible: Should be "samsung,exynos-ppmu" or "samsung,exynos-ppmu-v2.
 - reg: physical base address of each PPMU and length of memory mapped region.
 
-Optional properties:
+Optional properties for PPMU device:
 - clock-names : the name of clock used by the PPMU, "ppmu"
 - clocks : phandles for clock specified in "clock-names" property
 
+Required properties for 'events' child node of PPMU device:
+- event-name : the unique event name among PPMU device
+Optional properties for 'events' child node of PPMU device:
+- event-data-type : Define the type of data which shall be counted
+by the counter. You can check include/dt-bindings/pmu/exynos_ppmu.h for
+all possible type, i.e. count read requests, count write data in bytes,
+etc. This field is optional and when it is missing, the driver code
+will use default data type.
+
 Example1 : PPMUv1 nodes in exynos3250.dtsi are listed below.
 
ppmu_dmc0: ppmu_dmc0@106a {
@@ -145,3 +154,16 @@ Example3 : PPMUv2 nodes in exynos5433.dtsi are listed below.
reg = <0x104d 0x2000>;
status = "disabled";
};
+
+Example4 : 'event-data-type' in exynos4412-ppmu-common.dtsi are listed below.
+
+   &ppmu_dmc0 {
+   status = "okay";
+   events {
+   ppmu_dmc0_3: ppmu-event3-dmc0 {
+   event-name = "ppmu-event3-dmc0";
+   event-data-type = <(PPMU_RO_DATA_CNT |
+   PPMU_WO_DATA_CNT)>;
+   };
+   };
+   };


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

[PATCH] ALSA: hda: check RIRB to avoid use NULL pointer

2019-04-29 Thread Song liwei

From: Liwei Song 

Fix the following BUG:

BUG: unable to handle kernel NULL pointer dereference at 000c
Workqueue: events azx_probe_work [snd_hda_intel]
RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
Call Trace:
 
 azx_interrupt+0x78/0x140 [snd_hda_codec]
 __handle_irq_event_percpu+0x49/0x300
 handle_irq_event_percpu+0x23/0x60
 handle_irq_event+0x3c/0x60
 handle_edge_irq+0xdb/0x180
 handle_irq+0x23/0x30
 do_IRQ+0x6a/0x140
 common_interrupt+0xf/0xf

The call trace occurred when running kdump on a system with an NFS rootfs.
The following call sequence occurs when booting the second kernel:

azx_first_init()
   --> azx_acquire_irq()
  <-- interrupt come in, azx_interrupt() was called
   --> hda_intel_init_chip()
  --> azx_init_chip()
 --> snd_hdac_bus_init_chip()
  --> snd_hdac_bus_init_cmd_io();
--> init rirb.buf and corb.buf

An interrupt can arrive after azx_acquire_irq() while the RIRB is still
uninitialized, so a NULL pointer is used when processing the interrupt.

Check that the RIRB buffer is not NULL before using it, to avoid a special
case that may hang the system.

Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
Signed-off-by: Liwei Song 
---
 sound/hda/hdac_controller.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index 74244d8e2909..2f0fa5353361 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -195,6 +195,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
return;
bus->rirb.wp = wp;
 
+   if (!bus->rirb.buf)
+   return;
+
while (bus->rirb.rp != wp) {
bus->rirb.rp++;
bus->rirb.rp %= AZX_MAX_RIRB_ENTRIES;
-- 
2.7.4
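
The guard added by the patch above can be illustrated with a small user-space sketch. This is not the driver code: struct rirb_ring, rirb_drain() and MAX_RIRB_ENTRIES are hypothetical stand-ins, and the only point is that the handler records the new write pointer but returns before touching a buffer that has not been set up yet.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RIRB_ENTRIES 256 /* hypothetical ring size for this sketch */

/* Hypothetical stand-in for the response ring kept in the bus structure. */
struct rirb_ring {
	uint32_t *buf;   /* NULL until the command I/O setup has run */
	unsigned int rp; /* read pointer */
	unsigned int wp; /* last write pointer seen */
};

/*
 * Consume entries up to hw_wp and return how many were handled.
 * hw_wp is assumed to be smaller than MAX_RIRB_ENTRIES.
 * If the buffer is not set up yet, return 0 without dereferencing it.
 */
static unsigned int rirb_drain(struct rirb_ring *ring, unsigned int hw_wp)
{
	unsigned int handled = 0;

	if (hw_wp == ring->wp)
		return 0;
	ring->wp = hw_wp;

	if (!ring->buf)
		return 0; /* early interrupt: nothing to read yet */

	while (ring->rp != hw_wp) {
		ring->rp = (ring->rp + 1) % MAX_RIRB_ENTRIES;
		(void)ring->buf[ring->rp]; /* a real handler would decode this */
		handled++;
	}
	return handled;
}
```

Calling rirb_drain() on a ring whose buf is still NULL simply returns 0, which mirrors the early return in the patch.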

Re: [PATCH 0/7] introduce cpu.headroom knob to cpu controller

2019-04-29 Thread Song Liu




> On Apr 29, 2019, at 8:24 AM, Vincent Guittot  
> wrote:
> 
> Hi Song,
> 
> On Sun, 28 Apr 2019 at 21:47, Song Liu  wrote:
>> 
>> Hi Morten and Vincent,
>> 
>>> On Apr 22, 2019, at 6:22 PM, Song Liu  wrote:
>>> 
>>> Hi Vincent,
>>> 
>>>> On Apr 17, 2019, at 5:56 AM, Vincent Guittot
>>>> wrote:
>>>>
>>>> On Wed, 10 Apr 2019 at 21:43, Song Liu  wrote:
>>>>>
>>>>> Hi Morten,
>>>>>
>>>>>> On Apr 10, 2019, at 4:59 AM, Morten Rasmussen
>>>>>> wrote:
>>>>>>
>>>>
>>>>>>
>>>>>> The bit that isn't clear to me, is _why_ adding idle cycles helps your
>>>>>> workload. I'm not convinced that adding headroom gives any latency
>>>>>> improvements beyond watering down the impact of your side jobs. AFAIK,
>>>>>
>>>>> We think the latency improvements actually come from watering down the
>>>>> impact of side jobs. It is not just statistically improving average
>>>>> latency numbers, but also reduces resource contention caused by the side
>>>>> workload. I don't know whether it is from reducing contention of ALUs,
>>>>> memory bandwidth, CPU caches, or something else, but we saw reduced
>>>>> latencies when headroom is used.
>>>>>
>>>>>> the throttling mechanism effectively removes the throttled tasks from
>>>>>> the schedule according to a specific duty cycle. When the side job is
>>>>>> not throttled the main workload is experiencing the same latency issues
>>>>>> as before, but by dynamically tuning the side job throttling you can
>>>>>> achieve a better average latency. Am I missing something?
>>>>>>
>>>>>> Have you looked at your distribution of main job latency and tried to
>>>>>> compare with when throttling is active/not active?
>>>>>
>>>>> cfs_bandwidth adjusts allowed runtime for each task_group each period
>>>>> (configurable, 100ms by default). cpu.headroom logic applies gentle
>>>>> throttling, so that the side workload gets some runtime in every period.
>>>>> Therefore, if we look at time window equal to or bigger than 100ms, we
>>>>> don't really see "throttling active time" vs. "throttling inactive time".
>>>>>
>>>>>>
>>>>>> I'm wondering if the headroom solution is really the right solution for
>>>>>> your use-case or if what you are really after is something which is
>>>>>> lower priority than just setting the weight to 1. Something that
>>>>>
>>>>> The experiments show that, cpu.weight does proper work for priority: the
>>>>> main workload gets priority to use the CPU; while the side workload only
>>>>> fill the idle CPU. However, this is not sufficient, as the side workload
>>>>> creates big enough contention to impact the main workload.
>>>>>
>>>>>> (nearly) always gets pre-empted by your main job (SCHED_BATCH and
>>>>>> SCHED_IDLE might not be enough). If your main job consist
>>>>>> of lots of relatively short wake-ups things like the min_granularity
>>>>>> could have significant latency impact.
>>>>>
>>>>> cpu.headroom gives benefits in addition to optimizations in pre-empt
>>>>> side. By maintaining some idle time, fewer pre-empt actions are
>>>>> necessary, thus the main workload will get better latency.
>>>>
>>>> I agree with Morten's proposal, SCHED_IDLE should help your latency
>>>> problem because side job will be directly preempted unlike normal cfs
>>>> task even lowest priority.
>>>> In addition to min_granularity, sched_period also has an impact on the
>>>> time that a task has to wait before preempting the running task. Also,
>>>> some sched_feature like GENTLE_FAIR_SLEEPERS can also impact the
>>>> latency of a task.
>>>>
>>>> It would be nice to know if the latency problem comes from contention
>>>> on cache resources or if it's mainly because you main load waits
>>>> before running on a CPU
>>>>
>>>> Regards,
>>>> Vincent
>>> 
>>> Thanks for these suggestions. Here are some more tests to show the impact
>>> of scheduler knobs and cpu.headroom.
>>> 
>>> side-load | cpu.headroom | side/cpu.weight | min_gran | cpu-idle | main/latency
>>> ----------+--------------+-----------------+----------+----------+-------------
>>> none      |      0       |       n/a       |   1 ms   |  45.20%  |     1.00
>>> ffmpeg    |      0       |        1        |  10 ms   |   3.38%  |     1.46
>>> ffmpeg    |      0       |   SCHED_IDLE    |   1 ms   |   5.69%  |     1.42
>>> ffmpeg    |     20%      |   SCHED_IDLE    |   1 ms   |  19.00%  |     1.13
>>> ffmpeg    |     30%      |   SCHED_IDLE    |   1 ms   |  27.60%  |     1.08
>>> 
>>> In all these cases, the main workload is loaded with same level of
>>> traffic (request per second). Main workload latency numbers are normalized
>>> based on the baseline (first row).
>>> 
>>> For the baseline, the main workload runs without any side workload, the
>>> system has about 45.20% idle CPU.
>>> 
>>> The next two rows compare the impact of scheduling knobs cpu.weight and
>>> sched_min_granularity. With cpu.weight of 1 and min_granularity of 10ms,
>>> we see a latency of 1.46; with SCHED_IDLE and min_granularity of 1ms, we
>>> see a latency of 1.42. So

Re: [PATCH v3 4/4] DT: arm: exynos4412: add event data type which is monitored

2019-04-29 Thread Chanwoo Choi

Hi,

On 19. 4. 19. 10:48 PM, Lukasz Luba wrote:
> The patch adds new field in the PPMU event which shows explicitly
> what kind of data the event is monitoring. It is possible to change it
> using defined values in exynos_ppmu.h file.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  arch/arm/boot/dts/exynos4412-ppmu-common.dtsi | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/exynos4412-ppmu-common.dtsi 
> b/arch/arm/boot/dts/exynos4412-ppmu-common.dtsi
> index 3a3b2fa..549faba 100644
> --- a/arch/arm/boot/dts/exynos4412-ppmu-common.dtsi
> +++ b/arch/arm/boot/dts/exynos4412-ppmu-common.dtsi
> @@ -6,12 +6,16 @@
>   * Author: Chanwoo Choi 
>   */
>  
> +#include 
> +
>  &ppmu_dmc0 {
> status = "okay";
>  
> events {
>  ppmu_dmc0_3: ppmu-event3-dmc0 {
>  event-name = "ppmu-event3-dmc0";
> +event-data-type = <(PPMU_RO_DATA_CNT |
> +PPMU_WO_DATA_CNT)>;
>  };
> };
>  };
> @@ -22,6 +26,8 @@
> events {
>  ppmu_dmc1_3: ppmu-event3-dmc1 {
>  event-name = "ppmu-event3-dmc1";
> +event-data-type = <(PPMU_RO_DATA_CNT |
> +PPMU_WO_DATA_CNT)>;
>  };
> };
>  };
> @@ -32,6 +38,8 @@
> events {
>  ppmu_leftbus_3: ppmu-event3-leftbus {
>  event-name = "ppmu-event3-leftbus";
> +event-data-type = <(PPMU_RO_DATA_CNT |
> +PPMU_WO_DATA_CNT)>;
>  };
> };
>  };
> @@ -42,6 +50,8 @@
> events {
>  ppmu_rightbus_3: ppmu-event3-rightbus {
>  event-name = "ppmu-event3-rightbus";
> +event-data-type = <(PPMU_RO_DATA_CNT |
> +PPMU_WO_DATA_CNT)>;
>  };
> };
>  };
> 

Acked-by: Chanwoo Choi 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

[PATCH] ARM: dts: dra76x: Update MMC2_HS200_MANUAL1 iodelay values

2019-04-29 Thread Faiz Abbas

Update the MMC2_HS200_MANUAL1 iodelay values to match with the latest
dra76x data manual[1].

Also this particular pinctrl-array is using spaces instead of tabs for
spacing between the values and the comments. Fix this as well.

[1] http://www.ti.com/lit/ds/symlink/dra76p.pdf

Signed-off-by: Faiz Abbas 
---

Tested on dra76x-evm and am574x-idk.

 arch/arm/boot/dts/dra76x-mmc-iodelay.dtsi | 40 +++
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/boot/dts/dra76x-mmc-iodelay.dtsi b/arch/arm/boot/dts/dra76x-mmc-iodelay.dtsi
index baba7b00eca7..fdca48186916 100644
--- a/arch/arm/boot/dts/dra76x-mmc-iodelay.dtsi
+++ b/arch/arm/boot/dts/dra76x-mmc-iodelay.dtsi
@@ -22,7 +22,7 @@
  *
  * Datamanual Revisions:
  *
- * DRA76x Silicon Revision 1.0: SPRS993A, Revised July 2017
+ * DRA76x Silicon Revision 1.0: SPRS993E, Revised December 2018
  *
  */
 
@@ -169,25 +169,25 @@
/* Corresponds to MMC2_HS200_MANUAL1 in datamanual */
mmc2_iodelay_hs200_conf: mmc2_iodelay_hs200_conf {
pinctrl-pin-array = <
-   0x190 A_DELAY_PS(384) G_DELAY_PS(0)   /* CFG_GPMC_A19_OEN */
-   0x194 A_DELAY_PS(0) G_DELAY_PS(174)   /* CFG_GPMC_A19_OUT */
-   0x1a8 A_DELAY_PS(410) G_DELAY_PS(0)   /* CFG_GPMC_A20_OEN */
-   0x1ac A_DELAY_PS(85) G_DELAY_PS(0)/* CFG_GPMC_A20_OUT */
-   0x1b4 A_DELAY_PS(468) G_DELAY_PS(0)   /* CFG_GPMC_A21_OEN */
-   0x1b8 A_DELAY_PS(139) G_DELAY_PS(0)   /* CFG_GPMC_A21_OUT */
-   0x1c0 A_DELAY_PS(676) G_DELAY_PS(0)   /* CFG_GPMC_A22_OEN */
-   0x1c4 A_DELAY_PS(69) G_DELAY_PS(0)/* CFG_GPMC_A22_OUT */
-   0x1d0 A_DELAY_PS(1062) G_DELAY_PS(154)/* CFG_GPMC_A23_OUT */
-   0x1d8 A_DELAY_PS(640) G_DELAY_PS(0)   /* CFG_GPMC_A24_OEN */
-   0x1dc A_DELAY_PS(0) G_DELAY_PS(0) /* CFG_GPMC_A24_OUT */
-   0x1e4 A_DELAY_PS(356) G_DELAY_PS(0)   /* CFG_GPMC_A25_OEN */
-   0x1e8 A_DELAY_PS(0) G_DELAY_PS(0) /* CFG_GPMC_A25_OUT */
-   0x1f0 A_DELAY_PS(579) G_DELAY_PS(0)   /* CFG_GPMC_A26_OEN */
-   0x1f4 A_DELAY_PS(0) G_DELAY_PS(0) /* CFG_GPMC_A26_OUT */
-   0x1fc A_DELAY_PS(435) G_DELAY_PS(0)   /* CFG_GPMC_A27_OEN */
-   0x200 A_DELAY_PS(36) G_DELAY_PS(0)/* CFG_GPMC_A27_OUT */
-   0x364 A_DELAY_PS(759) G_DELAY_PS(0)   /* CFG_GPMC_CS1_OEN */
-   0x368 A_DELAY_PS(72) G_DELAY_PS(0)/* CFG_GPMC_CS1_OUT */
+   0x190 A_DELAY_PS(384) G_DELAY_PS(0) /* CFG_GPMC_A19_OEN */
+   0x194 A_DELAY_PS(350) G_DELAY_PS(174)   /* CFG_GPMC_A19_OUT */
+   0x1a8 A_DELAY_PS(410) G_DELAY_PS(0) /* CFG_GPMC_A20_OEN */
+   0x1ac A_DELAY_PS(335) G_DELAY_PS(0) /* CFG_GPMC_A20_OUT */
+   0x1b4 A_DELAY_PS(468) G_DELAY_PS(0) /* CFG_GPMC_A21_OEN */
+   0x1b8 A_DELAY_PS(339) G_DELAY_PS(0) /* CFG_GPMC_A21_OUT */
+   0x1c0 A_DELAY_PS(676) G_DELAY_PS(0) /* CFG_GPMC_A22_OEN */
+   0x1c4 A_DELAY_PS(219) G_DELAY_PS(0) /* CFG_GPMC_A22_OUT */
+   0x1d0 A_DELAY_PS(1062) G_DELAY_PS(154)  /* CFG_GPMC_A23_OUT */
+   0x1d8 A_DELAY_PS(640) G_DELAY_PS(0) /* CFG_GPMC_A24_OEN */
+   0x1dc A_DELAY_PS(150) G_DELAY_PS(0) /* CFG_GPMC_A24_OUT */
+   0x1e4 A_DELAY_PS(356) G_DELAY_PS(0) /* CFG_GPMC_A25_OEN */
+   0x1e8 A_DELAY_PS(150) G_DELAY_PS(0) /* CFG_GPMC_A25_OUT */
+   0x1f0 A_DELAY_PS(579) G_DELAY_PS(0) /* CFG_GPMC_A26_OEN */
+   0x1f4 A_DELAY_PS(200) G_DELAY_PS(0) /* CFG_GPMC_A26_OUT */
+   0x1fc A_DELAY_PS(435) G_DELAY_PS(0) /* CFG_GPMC_A27_OEN */
+   0x200 A_DELAY_PS(236) G_DELAY_PS(0) /* CFG_GPMC_A27_OUT */
+   0x364 A_DELAY_PS(759) G_DELAY_PS(0) /* CFG_GPMC_CS1_OEN */
+   0x368 A_DELAY_PS(372) G_DELAY_PS(0) /* CFG_GPMC_CS1_OUT */
  >;
};
 
-- 
2.19.2

[PATCH] cpufreq: Fix kobject memleak

2019-04-29 Thread Viresh Kumar

Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking the
kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Signed-off-by: Viresh Kumar 
---
Tobin fixed this for schedutil already.

 drivers/cpufreq/cpufreq.c  | 1 +
 drivers/cpufreq/cpufreq_governor.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e10922709d13..bbf79544d0ad 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1098,6 +1098,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
   cpufreq_global_kobject, "policy%u", cpu);
if (ret) {
pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
+   kobject_put(&policy->kobj);
goto err_free_real_cpus;
}
 
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index ffa9adeaba31..9d1d9bf02710 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -459,6 +459,8 @@ int cpufreq_dbs_governor_init(struct cpufreq_policy *policy)
/* Failure, so roll back. */
pr_err("initialization failed (dbs_data kobject init error %d)\n", ret);
 
+   kobject_put(&dbs_data->attr_set.kobj);
+
policy->governor_data = NULL;
 
if (!have_governor_per_policy())
-- 
2.21.0.rc0.269.g1a574e7a288b
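
The reference-counting rule behind this fix can be shown with a simplified user-space model. Everything here (struct fake_kobj, fake_init_and_add(), fake_put(), setup()) is hypothetical and only mimics the documented behaviour that kobject_init_and_add() leaves the object holding its initial reference even when it fails, so the error path must drop that reference with a put.

```c
/* Hypothetical, heavily simplified stand-in for a kobject. */
struct fake_kobj {
	int refcount;
	int released; /* set once the last reference has been dropped */
};

/* Takes the initial reference even when it reports failure. */
static int fake_init_and_add(struct fake_kobj *k, int fail)
{
	k->refcount = 1;
	k->released = 0;
	return fail ? -1 : 0;
}

/* Drops one reference; the object is released when none remain. */
static void fake_put(struct fake_kobj *k)
{
	if (--k->refcount == 0)
		k->released = 1;
}

/* Error path mirrors the patch: put the object before bailing out. */
static int setup(struct fake_kobj *k, int fail)
{
	int ret = fake_init_and_add(k, fail);

	if (ret) {
		fake_put(k); /* without this, the reference would leak */
		return ret;
	}
	return 0;
}
```

In this model, a failed setup() leaves released set to 1; dropping the fake_put() call would leave refcount at 1 forever, which is the leak the patch describes.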

Re: [PATCH v7 11/14] irqchip: ti-sci-inta: Add support for Interrupt Aggregator driver

2019-04-29 Thread Lokesh Vutla




On 29/04/19 6:41 PM, Marc Zyngier wrote:
> On 20/04/2019 11:09, Lokesh Vutla wrote:
>> Texas Instruments' K3 generation SoCs has an IP Interrupt Aggregator
>> which is an interrupt controller that does the following:
>> - Converts events to interrupts that can be understood by
>>   an interrupt router.
>> - Allows for multiplexing of events to interrupts.
>>
>> Configuration of the interrupt aggregator registers can only be done by
>> a system co-processor and the driver needs to send a message to this
>> co processor over TISCI protocol. This patch adds support for Interrupt
>> Aggregator irqdomain.
>>
>> Signed-off-by: Lokesh Vutla 
>> ---
>> Changes since v6:
>> - Updated commit message.
>> - Arranged header files in alphabetical order
>> - Included vint_bit in struct ti_sci_inta_event_desc
>> - With the above change now the chip_data is event_desc instead of vint_desc
>> - No loops are used in atomic contexts.
>> - Fixed locking issue while freeing parent virq
>> - Fixed few other cosmetic changes.
>>
>>  MAINTAINERS   |   1 +
>>  drivers/irqchip/Kconfig   |  11 +
>>  drivers/irqchip/Makefile  |   1 +
>>  drivers/irqchip/irq-ti-sci-inta.c | 589 ++
>>  4 files changed, 602 insertions(+)
>>  create mode 100644 drivers/irqchip/irq-ti-sci-inta.c
>>
> 
> [...]
> 
>> +/**
>> + * ti_sci_inta_alloc_irq() -  Allocate an irq within INTA domain
>> + * @domain: irq_domain pointer corresponding to INTA
>> + * @hwirq:  hwirq of the input event
>> + *
>> + * Note: Allocation happens in the following manner:
>> + *  - Find a free bit available in any of the vints available in the list.
>> + *  - If not found, allocate a vint from the vint pool
>> + *  - Attach the free bit to input hwirq.
>> + * Return event_desc if all went ok else appropriate error value.
>> + */
>> +static struct ti_sci_inta_event_desc *ti_sci_inta_alloc_irq(struct 
>> irq_domain *domain,
>> +u32 hwirq)
>> +{
>> +struct ti_sci_inta_irq_domain *inta = domain->host_data;
>> +struct ti_sci_inta_vint_desc *vint_desc = NULL;
>> +u16 free_bit;
>> +
>> +mutex_lock(&inta->vint_mutex);
>> +list_for_each_entry(vint_desc, &inta->vint_list, list) {
>> +mutex_lock(&vint_desc->event_mutex);
>> +free_bit = find_first_zero_bit(vint_desc->event_map,
>> +   MAX_EVENTS_PER_VINT);
>> +if (free_bit != MAX_EVENTS_PER_VINT) {
>> +set_bit(free_bit, vint_desc->event_map);
>> +mutex_unlock(&vint_desc->event_mutex);
>> +mutex_unlock(&inta->vint_mutex);
>> +goto alloc_event;
>> +}
>> +mutex_unlock(&vint_desc->event_mutex);
>> +}
>> +mutex_unlock(&inta->vint_mutex);
>> +
>> +/* No free bits available. Allocate a new vint */
>> +vint_desc = ti_sci_inta_alloc_parent_irq(domain);
>> +if (IS_ERR(vint_desc))
>> +return ERR_PTR(PTR_ERR(vint_desc));
>> +
>> +mutex_lock(&vint_desc->event_mutex);
>> +free_bit = find_first_zero_bit(vint_desc->event_map,
>> +   MAX_EVENTS_PER_VINT);
>> +set_bit(free_bit, vint_desc->event_map);
>> +mutex_unlock(&vint_desc->event_mutex);
> 
> This code is still quite racy: you can have two parallel allocations
> failing to get a free bit in any of the already allocated vint_desc, and
> then both allocating a new vint_desc. If there was only one left, one of
> the allocation will fail despite having at least 63 free interrupts.

Good point. After thinking about it a bit more, I see a similar issue when two
parallel frees happen on a vint with only 2 bits allocated. The first free,
when freeing the parent_irq, might see all the bits cleared and do kfree(vint).
The second free will then crash when freeing the parent irq.

I'll guard the entire allocation and freeing with vint_mutex and drop the
event_mutex altogether.

Thanks and regards,
Lokesh

> 
>   M.
>
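
The single-lock approach proposed in the reply above might look roughly like the following user-space sketch. It uses a pthread mutex in place of the kernel mutex, and struct vint_pool, pool_alloc_event(), pool_free_event(), NUM_VINTS and EVENTS_PER_VINT are all hypothetical names; the point is only that the search for a free bit, the allocation of a new vint, and the release of an empty vint all happen under one lock.

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_VINTS 2        /* hypothetical pool size */
#define EVENTS_PER_VINT 64

struct vint_pool {
	pthread_mutex_t lock;          /* the one lock guarding everything below */
	int in_use[NUM_VINTS];         /* nonzero once a vint is allocated */
	uint64_t event_map[NUM_VINTS]; /* one bit per event */
};

/* Returns vint * EVENTS_PER_VINT + bit, or -1 if the pool is exhausted. */
static int pool_alloc_event(struct vint_pool *p)
{
	int result = -1;

	pthread_mutex_lock(&p->lock);
	/* First pass: look for a free bit in an already allocated vint. */
	for (int v = 0; v < NUM_VINTS && result < 0; v++) {
		if (!p->in_use[v])
			continue;
		for (int b = 0; b < EVENTS_PER_VINT; b++) {
			if (!(p->event_map[v] & (UINT64_C(1) << b))) {
				p->event_map[v] |= UINT64_C(1) << b;
				result = v * EVENTS_PER_VINT + b;
				break;
			}
		}
	}
	/* Second pass: still under the same lock, take a new vint. */
	for (int v = 0; v < NUM_VINTS && result < 0; v++) {
		if (!p->in_use[v]) {
			p->in_use[v] = 1;
			p->event_map[v] = 1;
			result = v * EVENTS_PER_VINT;
		}
	}
	pthread_mutex_unlock(&p->lock);
	return result;
}

static void pool_free_event(struct vint_pool *p, int id)
{
	int v = id / EVENTS_PER_VINT, b = id % EVENTS_PER_VINT;

	pthread_mutex_lock(&p->lock);
	p->event_map[v] &= ~(UINT64_C(1) << b);
	if (p->event_map[v] == 0)
		p->in_use[v] = 0; /* last event gone: release the vint once */
	pthread_mutex_unlock(&p->lock);
}
```

Because both passes of the allocator run inside one critical section, two callers cannot both conclude that no free bit exists and then each grab a new vint, and two frees cannot both observe an empty map.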

[tip:sched/urgent] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread tip-bot for Tobin C. Harding

Commit-ID:  9a4f26cc98d81b67ecc23b890c28e2df324e29f3
Gitweb: https://git.kernel.org/tip/9a4f26cc98d81b67ecc23b890c28e2df324e29f3
Author: Tobin C. Harding 
AuthorDate: Tue, 30 Apr 2019 10:11:44 +1000
Committer:  Ingo Molnar 
CommitDate: Tue, 30 Apr 2019 07:57:23 +0200

sched/cpufreq: Fix kobject memleak

Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking
the kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Signed-off-by: Tobin C. Harding 
Cc: Greg Kroah-Hartman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Thomas Gleixner 
Cc: Tobin C. Harding 
Cc: Vincent Guittot 
Cc: Viresh Kumar 
Link: http://lkml.kernel.org/r/20190430001144.24890-1-to...@kernel.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpufreq_schedutil.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 5c41ea367422..3638d2377e3c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -771,6 +771,7 @@ out:
return 0;
 
 fail:
+   kobject_put(&tunables->attr_set.kobj);
policy->governor_data = NULL;
sugov_tunables_free(tunables);

Re: [PATCH v3 1/3] clk: analogbits: add Wide-Range PLL library

2019-04-29 Thread Paul Walmsley

On Mon, 29 Apr 2019, Stephen Boyd wrote:

> Quoting Paul Walmsley (2019-04-29 12:42:07)
> > On Fri, 26 Apr 2019, Paul Walmsley wrote:
> > > On Fri, 26 Apr 2019, Stephen Boyd wrote:
> > > 
> > > > Quoting Paul Walmsley (2019-04-11 01:27:32)
> > > > > Add common library code for the Analog Bits Wide-Range PLL (WRPLL) IP
> > > > > block, as implemented in TSMC CLN28HPC.
> > > > 
> > > > I haven't deeply reviewed at all, but I already get two problems when
> > > > compile testing these patches. I can fix them up if nothing else needs
> > > > fixing.
> > > > 
> > > > drivers/clk/analogbits/wrpll-cln28hpc.c:165 __wrpll_calc_divq() warn: should 'target_rate << divq' be a 64 bit type?
> > > > drivers/clk/sifive/fu540-prci.c:214:16: error: return expression in void function
> > > 
> > > Hmm, that's odd.  I will definitely take a look and repost.
> > 
> > I'm not able to reproduce these problems.  The configs tried here were:
> > 
> > - 64-bit RISC-V defconfig w/ PRCI driver enabled (gcc 8.2.0 built with 
> >   crosstool-NG 1.24.0)
> > 
> > - 32-bit ARM defconfig w/ PRCI driver enabled (gcc 8.3.0 built with 
> >   crosstool-NG 1.24.0)
> > 
> > - 32-bit i386 defconfig w/ PRCI driver enabled (gcc 
> >   5.4.0-6ubuntu1~16.04.11)
> > 
> > Could you post the toolchain and kernel config you're using?
> > 
> 
> I'm running sparse and smatch too.

OK.  I was able to reproduce the __wrpll_calc_divq() warning.  It's been 
resolved in the upcoming revision.  

But I don't see the second error with either sparse or smatch.  (This is 
with sparse at commit 2b96cd804dc7 and smatch at commit f0092daff69d.)


- Paul
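
For readers unfamiliar with the "should ... be a 64 bit type?" warning, here is a minimal sketch of the class of problem it points at. The function names and the 32-bit type chosen for target_rate are assumptions made for illustration; the actual types in the WRPLL code are not shown in this thread.

```c
#include <stdint.h>

/* Shift performed in 32 bits: high bits are lost before the widening. */
static uint64_t scaled_rate_narrow(uint32_t target_rate, unsigned int divq)
{
	return target_rate << divq;
}

/* Widen first, then shift: the full result is kept. */
static uint64_t scaled_rate_wide(uint32_t target_rate, unsigned int divq)
{
	return (uint64_t)target_rate << divq;
}
```

With a target_rate of 3000000000 and a divq of 1, the narrow version wraps modulo 2^32 while the wide version yields 6000000000.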

Re: [tip:sched/urgent] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Viresh Kumar

On 29-04-19, 22:52, tip-bot for Tobin C. Harding wrote:
> Commit-ID:  8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
> Gitweb: 
> https://git.kernel.org/tip/8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
> Author: Tobin C. Harding 
> AuthorDate: Tue, 30 Apr 2019 10:11:44 +1000
> Committer:  Ingo Molnar 
> CommitDate: Tue, 30 Apr 2019 06:24:09 +0200
> 
> sched/cpufreq: Fix kobject memleak
> 
> Currently the error return path from kobject_init_and_add() is not
> followed by a call to kobject_put() - which means we are leaking
> the kobject.
> 
> Fix it by adding a call to kobject_put() in the error path of
> kobject_init_and_add().
> 
> Signed-off-by: Tobin C. Harding 
> Add call to kobject_put() in error path of kobject_init_and_add().

Shouldn't this have come before the Signed-off-by line?

> Cc: Greg Kroah-Hartman 
> Cc: Linus Torvalds 
> Cc: Peter Zijlstra 
> Cc: Rafael J. Wysocki 
> Cc: Thomas Gleixner 
> Cc: Tobin C. Harding 
> Cc: Vincent Guittot 
> Cc: Viresh Kumar 
> Link: http://lkml.kernel.org/r/20190430001144.24890-1-to...@kernel.org
> Signed-off-by: Ingo Molnar 
> ---
>  kernel/sched/cpufreq_schedutil.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index 5c41ea367422..3638d2377e3c 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -771,6 +771,7 @@ out:
>   return 0;
>  
>  fail:
> + kobject_put(&tunables->attr_set.kobj);
>   policy->governor_data = NULL;
>   sugov_tunables_free(tunables);
>  

-- 
viresh

Re: linux-next: build warning after merge of the clk tree

2019-04-29 Thread Stephen Rothwell

Hi Anson,

On Tue, 30 Apr 2019 01:44:58 + Anson Huang  wrote:
>
>   Thanks for the notice.
>   As it is intentional, I will send out a patch to add "/* fall through */" to avoid this build warning,

Excellent, thanks.

-- 
Cheers,
Stephen Rothwell
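
As a generic illustration of the annotation being discussed (not the clk driver code, which is not shown in this thread), an intentional fall-through marked with such a comment might look like the sketch below. The function steps_for_level() is hypothetical; compilers that warn about implicit fall-through, such as GCC with -Wimplicit-fallthrough, typically accept this kind of comment as a marker that the missing break is deliberate.

```c
/* Hypothetical helper: higher levels include all the steps of lower ones. */
static int steps_for_level(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 0:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```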



[tip:sched/urgent] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread tip-bot for Tobin C. Harding

Commit-ID:  8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
Gitweb: https://git.kernel.org/tip/8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
Author: Tobin C. Harding 
AuthorDate: Tue, 30 Apr 2019 10:11:44 +1000
Committer:  Ingo Molnar 
CommitDate: Tue, 30 Apr 2019 06:24:09 +0200

sched/cpufreq: Fix kobject memleak

Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking
the kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Signed-off-by: Tobin C. Harding 
Add call to kobject_put() in error path of kobject_init_and_add().
Cc: Greg Kroah-Hartman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Thomas Gleixner 
Cc: Tobin C. Harding 
Cc: Vincent Guittot 
Cc: Viresh Kumar 
Link: http://lkml.kernel.org/r/20190430001144.24890-1-to...@kernel.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpufreq_schedutil.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 5c41ea367422..3638d2377e3c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -771,6 +771,7 @@ out:
return 0;
 
 fail:
+   kobject_put(&tunables->attr_set.kobj);
policy->governor_data = NULL;
sugov_tunables_free(tunables);

Re: [PATCH] RISC-V: Add an Image header that boot loader can parse.

2019-04-29 Thread Atish Patra


On 4/29/19 4:40 PM, Palmer Dabbelt wrote:

On Tue, 23 Apr 2019 16:25:06 PDT (-0700), atish.pa...@wdc.com wrote:

Currently, last stage boot loaders such as U-Boot can accept only
uImage which is an unnecessary additional step in automating boot flows.

Add a simple image header that boot loaders can parse and directly
load kernel flat Image. The existing booting methods will continue to
work as it is.

Tested on both QEMU and HiFive Unleashed using OpenSBI + U-Boot + Linux.

Signed-off-by: Atish Patra 
---
  arch/riscv/include/asm/image.h | 32 
  arch/riscv/kernel/head.S   | 28 
  2 files changed, 60 insertions(+)
  create mode 100644 arch/riscv/include/asm/image.h

diff --git a/arch/riscv/include/asm/image.h b/arch/riscv/include/asm/image.h
new file mode 100644
index ..76a7e0d4068a
--- /dev/null
+++ b/arch/riscv/include/asm/image.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_IMAGE_H
+#define __ASM_IMAGE_H
+
+#define RISCV_IMAGE_MAGIC  "RISCV"
+
+#ifndef __ASSEMBLY__
+/*
+ * struct riscv_image_header - riscv kernel image header
+ *
+ * @code0: Executable code
+ * @code1: Executable code
+ * @text_offset:   Image load offset
+ * @image_size:Effective Image size
+ * @reserved:  reserved
+ * @magic: Magic number
+ * @reserved:  reserved
+ */
+
+struct riscv_image_header {
+   u32 code0;
+   u32 code1;
+   u64 text_offset;
+   u64 image_size;
+   u64 res1;
+   u64 magic;
+   u32 res2;
+   u32 res3;
+};


I don't want to invent our own file format.  Is there a reason we can't just
use something standard?  Off the top of my head I can think of ELF files and
multiboot.



An additional header is required to accommodate the PE header format.
Currently, this is only used for the booti command, but it will be reused
for EFI headers as well. The Linux kernel Image can present itself as an EFI
application if a PE/COFF header is present. This removes the need for an
explicit EFI boot loader, and EFI firmware can load Linux directly
(obviously after an EFI stub is implemented for RISC-V).


ARM64 follows a similar header format as well.
https://www.kernel.org/doc/Documentation/arm64/booting.txt

Regards,
Atish


+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_IMAGE_H */
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index fe884cd69abd..154647395601 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -19,9 +19,37 @@
  #include 
  #include 
  #include 
+#include 

  __INIT
  ENTRY(_start)
+   /*
+* Image header expected by Linux boot-loaders. The image header data
+* structure is described in asm/image.h.
+* Do not modify it without modifying the structure and all bootloaders
+* that expects this header format!!
+*/
+   /* jump to start kernel */
+   j _start_kernel
+   /* reserved */
+   .word 0
+   .balign 8
+#if __riscv_xlen == 64
+   /* Image load offset(2MB) from start of RAM */
+   .dword 0x200000
+#else
+   /* Image load offset(4MB) from start of RAM */
+   .dword 0x400000
+#endif
+   /* Effective size of kernel image */
+   .dword _end - _start
+   .dword 0
+   .asciz RISCV_IMAGE_MAGIC
+   .word 0
+   .word 0
+
+.global _start_kernel
+_start_kernel:
/* Mask all interrupts */
csrw sie, zero


___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
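
To make the discussion above more concrete, here is a rough sketch of how a boot loader might check for the proposed header. The struct layout is copied from the patch with fixed-width types substituted; parse_image_header() is a hypothetical helper, not an existing U-Boot or kernel function, and the sketch assumes the loader runs with the same byte order as the image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RISCV_IMAGE_MAGIC "RISCV"

/* Field layout copied from the patch, using fixed-width types. */
struct riscv_image_header {
	uint32_t code0;
	uint32_t code1;
	uint64_t text_offset;
	uint64_t image_size;
	uint64_t res1;
	uint64_t magic;
	uint32_t res2;
	uint32_t res3;
};

/* Returns 1 and fills *hdr if buf starts with a plausible header, else 0. */
static int parse_image_header(const unsigned char *buf, size_t len,
			      struct riscv_image_header *hdr)
{
	if (len < sizeof(*hdr))
		return 0;
	memcpy(hdr, buf, sizeof(*hdr)); /* copy to avoid unaligned access */
	/* Compare only the magic characters, not the bytes after them. */
	if (memcmp(&hdr->magic, RISCV_IMAGE_MAGIC,
		   strlen(RISCV_IMAGE_MAGIC)) != 0)
		return 0;
	return 1;
}
```

On success the caller could then read text_offset and image_size from the filled structure to decide where to place the image.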

Re: sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:undefined reference to `followparent_recalc'

2019-04-29 Thread Randy Dunlap

On 4/29/19 9:48 PM, kbuild test robot wrote:
> Hi Randy,
> 
> It's probably a bug fix that unveils the link errors.

Yoshinori Sato (cc-ed) has a patch for this.  I guess that it's not in the arch/sh
git tree yet ???  or wherever arch/sh changes come from.



> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   83a50840e72a5a964b4704fcdc2fbb2d771015ab
> commit: acaf892ecbf5be7710ae05a61fd43c668f68ad95 sh: fix multiple function 
> definition build errors
> date:   3 weeks ago
> config: sh-allmodconfig (attached as .config)
> compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout acaf892ecbf5be7710ae05a61fd43c668f68ad95
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=sh 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
> 
> All errors (new ones prefixed by >>):
> 
>>> sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.data+0x1c): 
>>> undefined reference to `followparent_recalc'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 


-- 
~Randy

Re: [PATCH 7/7] dmaengine: sprd: Add interrupt support for 2-stage transfer

2019-04-29 Thread Baolin Wang

On Mon, 29 Apr 2019 at 22:10, Vinod Koul  wrote:
>
> On 29-04-19, 20:11, Baolin Wang wrote:
> > On Mon, 29 Apr 2019 at 20:01, Vinod Koul  wrote:
> > > On 15-04-19, 20:15, Baolin Wang wrote:
>
> > > > @@ -429,6 +433,9 @@ static int sprd_dma_set_2stage_config(struct 
> > > > sprd_dma_chn *schan)
> > > >   val = chn & SPRD_DMA_GLB_SRC_CHN_MASK;
> > > >   val |= BIT(schan->trg_mode - 1) << 
> > > > SPRD_DMA_GLB_TRG_OFFSET;
> > > >   val |= SPRD_DMA_GLB_2STAGE_EN;
> > > > + if (schan->int_type != SPRD_DMA_NO_INT)
> > >
> > > Who configure int_type?
> >
> > The int_type is configured through the flags of
> > sprd_dma_prep_slave_sg() by users, see:
> > https://elixir.bootlin.com/linux/v5.1-rc6/source/include/linux/dma/sprd-dma.h#L9
>
> Please use DMA_PREP_INTERRUPT flag instead!

We cannot use the DMA_PREP_INTERRUPT flag, since we have some
Spreadtrum-specific DMA interrupt flags configured by users; I think we
reached a consensus on this before. See:
https://elixir.bootlin.com/linux/v5.1-rc6/source/include/linux/dma/sprd-dma.h#L105

-- 
Baolin Wang
Best Regards

[PATCH] pid: Remove unneeded hash header file

2019-04-29 Thread Timmy Li

Hash functions are no longer needed since idr is used now.
Remove the hash header file as a cleanup.

Signed-off-by: Timmy Li 
---
 kernel/pid.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 20881598bdfa..89548d35eefb 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -32,7 +32,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.17.1

Re: [PATCH 4/7] dmaengine: sprd: Add device validation to support multiple controllers

2019-04-29 Thread Baolin Wang

On Mon, 29 Apr 2019 at 22:05, Vinod Koul  wrote:
>
> On 29-04-19, 20:20, Baolin Wang wrote:
> > On Mon, 29 Apr 2019 at 19:57, Vinod Koul  wrote:
> > >
> > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > From: Eric Long 
> > > >
> > > > Since we can support multiple DMA engine controllers, we should add
> > > > device validation in filter function to check if the correct controller
> > > > to be requested.
> > > >
> > > > Signed-off-by: Eric Long 
> > > > Signed-off-by: Baolin Wang 
> > > > ---
> > > >  drivers/dma/sprd-dma.c |5 +
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > index 0f92e60..9f99d4b 100644
> > > > --- a/drivers/dma/sprd-dma.c
> > > > +++ b/drivers/dma/sprd-dma.c
> > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct 
> > > > virt_dma_desc *vd)
> > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > >  {
> > > >   struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > + struct of_phandle_args *dma_spec =
> > > > + container_of(param, struct of_phandle_args, args[0]);
> > > >   u32 slave_id = *(u32 *)param;
> > > >
> > > > + if (chan->device->dev->of_node != dma_spec->np)
> > >
> > > Are you not using of_dma_find_controller() that does this, so this would
> > > be useless!
> >
> > Yes, we can use of_dma_find_controller(), but that will be a little
> > complicated than current solution. Since we need introduce one
> > structure to save the node to validate in the filter function like
> > below, which seems make things complicated. But if you still like to
> > use of_dma_find_controller(), I can change to use it in next version.
>
> Sorry I should have clarified more..
>
> of_dma_find_controller() is called by xlate, so you already run this
> check, so why use this :)

of_dma_find_controller() can save the requested device node into
dma_spec, and of_dma_simple_xlate() then calls dma_request_channel()
to request a channel. However, dma_request_channel() does not validate
the device node to find the corresponding DMA device. So our filter
function has to compare the channel's device node against the device
node specified by dma_spec. I hope that makes things clear.
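
To illustrate the check described above, here is a minimal userspace
sketch. It is not the driver code: the mock_* names and the simplified
struct layouts are assumptions made for illustration, and only the
container_of() step mirrors the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types; all of them are mock-ups. */
struct device_node {
	const char *name;
};

struct of_phandle_args {
	struct device_node *np;	/* controller node the consumer asked for */
	int args_count;
	uint32_t args[16];
};

struct mock_dma_chan {
	struct device_node *of_node;	/* node of the controller owning the channel */
	uint32_t chn_id;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Accept a channel only if it belongs to the controller named in the
 * DMA specifier. 'param' points at args[0], as in the quoted patch.
 */
static bool mock_filter_fn(struct mock_dma_chan *chan, void *param)
{
	struct of_phandle_args *dma_spec;

	assert(param != NULL);
	dma_spec = container_of(param, struct of_phandle_args, args[0]);

	/* Reject channels that belong to a different controller. */
	if (chan->of_node != dma_spec->np)
		return false;

	return true;
}
```

With two controllers present, a channel of the first one is rejected
when the specifier names the second one, which is the case the extra
check is meant to catch.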

-- 
Baolin Wang
Best Regards

Re: [PATCH v4] panic: add an option to replay all the printk message in buffer

2019-04-29 Thread Sergey Senozhatsky

On (04/29/19 13:44), Petr Mladek wrote:
> On Sat 2019-04-27 02:16:40, Sergey Senozhatsky wrote:
> > On (04/27/19 01:43), Sergey Senozhatsky wrote:
> > [..]
> > > > The console waiter logic is effective but it does not always
> > > > work. The current console owner must be calling the console
> > > > drivers.
> > > >
> > > > >   Hmm, we might have a bit of a problem here, maybe.
> > > >
> > > > Hmm, the printk() might wait forever when NMI stopped
> > > > the current console owner in the console driver code
> > > > or with the logbuf_lock taken.
> > > 
> > > I guess this is why we re-init logbuf lock from panic,
> > > however, we don't do anything with the console_owner.
> 
> > > > The console waiter logic might get solved by clearing
> > > > the console_owner in console_flush_on_panic(). It can't
> > > > be much worse, we already ignore console_lock() there, ...
> > 
> > Hmm, or maybe we are fine... console_waiter logic should work
> > before we send out stop IPI/NMI from panic CPU. When we call
> > flush_on_panic() console_unlock() clears console_owner, so
> > panic_print_sys_info() should not deadlock on console_owner.
> 
> Good point!
> 
> > It's probably only problematic if we kill a console_owner
> > CPU and then try to printk() (from smp_send_stop()) before
> > we do flush_on_panic()->console_unlock().
> 
> Yup. There are called several functions between smp_send_stop()
> and console_flush_on_panic().
> 
> The question is if it is worth a code complication. We could
> never 100% guarantee that printk() would work in panic().
> I more and more understand what Peter Zijlstra means
> by the duct taping.

Agreed.

-ss

Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Andreas Dilger


> On Apr 29, 2019, at 10:26 PM, Al Viro  wrote:
> 
> On Mon, Apr 29, 2019 at 10:18:04PM -0600, Andreas Dilger wrote:
>>> 
>>> void *i_private; /* fs or device private pointer */
>>> +   void (*free_inode)(struct inode *);
>> 
>> It seems like a waste to increase the size of every struct inode just to
>> access a static pointer.  Is this the only place that ->free_inode() is
>> called?  Why not move the ->free_inode() pointer into
>> inode->i_fop->free_inode() so that it is still directly accessible at
>> this point.
> 
> i_op, surely?

Yes, i_op is what I was thinking.

> In any case, increasing sizeof(struct inode) is not a problem -

> if anything, I'd turn ->i_fop into an anon union with that.  As in,
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index fb45590d284e..627e1766503a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -211,8 +211,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
> static void i_callback(struct rcu_head *head)
> {
>   struct inode *inode = container_of(head, struct inode, i_rcu);
> - if (inode->i_sb->s_op->free_inode)
> - inode->i_sb->s_op->free_inode(inode);
> + if (inode->free_inode)
> + inode->free_inode(inode);
>   else
>   free_inode_nonrcu(inode);
> }
> @@ -236,6 +236,7 @@ static struct inode *alloc_inode(struct super_block *sb)
>   if (!ops->free_inode)
>   return NULL;
>   }
> + inode->free_inode = ops->free_inode;
>   i_callback(&inode->i_rcu);
>   return NULL;
>   }

> @@ -276,6 +277,7 @@ static void destroy_inode(struct inode *inode)
>   if (!ops->free_inode)
>   return;
>   }
> + inode->free_inode = ops->free_inode;
>   call_rcu(&inode->i_rcu, i_callback);
> }

This seems like kind of a hack.  I guess your goal is to have ->free_inode
accessible regardless of whether the filesystem has installed its own ->i_op
methods or not, and i_fop is no longer used by this point.

That said, this seems better than increasing the size of struct inode.
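
As a rough userspace illustration of why the union avoids the size
increase, consider the sketch below. The struct layouts and all mock_*
names are simplified assumptions, not the real struct inode, and the
size comparison assumes data and function pointers have the same size.

```c
#include <assert.h>
#include <stddef.h>

struct mock_file_operations {
	int dummy;
};

/* Layout A: the callback gets its own field, so the struct grows. */
struct inode_separate {
	void *i_private;
	const struct mock_file_operations *i_fop;
	void (*free_inode)(struct inode_separate *);
};

/* Layout B: the callback shares storage with i_fop (C11 anonymous union). */
struct inode_shared {
	void *i_private;
	union {
		const struct mock_file_operations *i_fop;
		void (*free_inode)(struct inode_shared *);
	};
};

static int freed_count;

static void mock_free(struct inode_shared *inode)
{
	(void)inode;
	freed_count++;
}

/* Once the inode is being destroyed, i_fop is dead, so its slot is reused. */
static void mock_destroy(struct inode_shared *inode,
			 void (*fn)(struct inode_shared *))
{
	assert(inode != NULL);
	inode->free_inode = fn;
}

/* Stand-in for the RCU callback: call the stored method if there is one. */
static void mock_callback(struct inode_shared *inode)
{
	if (inode->free_inode)
		inode->free_inode(inode);
}
```

The price is that i_fop must not be read after mock_destroy() has
overwritten it, which matches the assumption that i_fop is no longer
used by this point.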

> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2e9b9f87caca..92732286b748 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -694,7 +694,10 @@ struct inode {
> #ifdef CONFIG_IMA
>   atomic_t i_readcount; /* struct files open RO */
> #endif
> - const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
> + union {
> + const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
> + void (*free_inode)(struct inode *);
> + };


Cheers, Andreas








RE: [PATCH v3 1/1] Add support for IPMB driver

2019-04-29 Thread Vadim Pasternak




> -Original Message-
> From: Asmaa Mnebhi 
> Sent: Tuesday, April 30, 2019 12:57 AM
> To: miny...@acm.org; w...@the-dreams.de; Vadim Pasternak
> ; Michael Shych 
> Cc: Asmaa Mnebhi ; linux-kernel@vger.kernel.org;
> linux-...@vger.kernel.org
> Subject: [PATCH v3 1/1] Add support for IPMB driver
> 
> Support receiving IPMB requests on a Satellite MC from the BMC.
> Once a response is ready, this driver will send back a response to the BMC via
> the IPMB channel.

Hi Asmaa,

A few general questions.

You define this driver as "Mellanox BlueField IPMB driver".
What makes it Mellanox BlueField specific?

Which HW configuration did you use for testing? Could
you please explain the connectivity schema between the main BMC and
the satellite BMCs?

How is this module supposed to be activated?
Don't you need to add DTS/ACPI records?

Also, a few comments below.

> 
> Signed-off-by: Asmaa Mnebhi 
> ---
>  drivers/char/ipmi/Kconfig|   8 +
>  drivers/char/ipmi/Makefile   |   1 +
>  drivers/char/ipmi/ipmb_dev_int.c | 386 +++
>  3 files changed, 395 insertions(+)
>  create mode 100644 drivers/char/ipmi/ipmb_dev_int.c
> 
> diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig
> index 94719fc..12fe8f2 100644
> --- a/drivers/char/ipmi/Kconfig
> +++ b/drivers/char/ipmi/Kconfig
> @@ -74,6 +74,14 @@ config IPMI_SSIF
>have a driver that must be accessed over an I2C bus instead of a
>standard interface.  This module requires I2C support.
> 
> +config IPMB_DEVICE_INTERFACE
> +   tristate 'IPMB Interface handler'
> +   depends on I2C && I2C_SLAVE
> +   help
> + Provides a driver for a device (Satellite MC) to
> + receive requests and send responses back to the BMC via
> + the IPMB interface. This module requires I2C support.
> +
>  config IPMI_POWERNV
> depends on PPC_POWERNV
> tristate 'POWERNV (OPAL firmware) IPMI interface'
> diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile
> index 3f06b20..0822adc 100644
> --- a/drivers/char/ipmi/Makefile
> +++ b/drivers/char/ipmi/Makefile
> @@ -26,3 +26,4 @@ obj-$(CONFIG_IPMI_KCS_BMC) += kcs_bmc.o
>  obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += bt-bmc.o
>  obj-$(CONFIG_ASPEED_KCS_IPMI_BMC) += kcs_bmc_aspeed.o
>  obj-$(CONFIG_NPCM7XX_KCS_IPMI_BMC) += kcs_bmc_npcm7xx.o
> +obj-$(CONFIG_IPMB_DEVICE_INTERFACE) += ipmb_dev_int.o
> diff --git a/drivers/char/ipmi/ipmb_dev_int.c b/drivers/char/ipmi/ipmb_dev_int.c
> new file mode 100644
> index 000..63122c3
> --- /dev/null
> +++ b/drivers/char/ipmi/ipmb_dev_int.c
> @@ -0,0 +1,386 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Mellanox IPMB driver to receive a request and send a response
> + *
> + * Copyright (C) 2018 Mellanox Techologies, Ltd.
> + *
> + * This was inspired by Brendan Higgins' ipmi-bmc-bt-i2c driver.
> + */
> +
> +#define  pr_fmt(fmt) "ipmb_dev_int: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define  MAX_MSG_LEN             128
> +#define  IPMB_REQUEST_LEN_MIN    7
> +#define  NETFN_RSP_BIT_MASK      0x4
> +#define  REQUEST_QUEUE_MAX_LEN   256
> +
> +#define  IPMB_MSG_LEN_IDX        0
> +#define  RQ_SA_8BIT_IDX          1
> +#define  NETFN_LUN_IDX           2
> +
> +#define  IPMB_MSG_PAYLOAD_LEN_MAX (MAX_MSG_LEN - IPMB_REQUEST_LEN_MIN - 1)
> +
> +struct ipmb_msg {
> + u8 len;
> + u8 rs_sa;
> + u8 netfn_rs_lun;
> + u8 checksum1;
> + u8 rq_sa;
> + u8 rq_seq_rq_lun;
> + u8 cmd;
> + u8 payload[IPMB_MSG_PAYLOAD_LEN_MAX];
> + /* checksum2 is included in payload */
> +} __packed;
> +
> +static u32 ipmb_msg_len(struct ipmb_msg *ipmb_msg)
> +{
> + return ipmb_msg->len + 1;
> +}

Do you really need this as a function?

> +
> +struct ipmb_request_elem {
> + struct list_head list;
> + struct ipmb_msg request;
> +};
> +
> +struct ipmb_dev {
> + struct i2c_client *client;
> + struct miscdevice miscdev;
> + struct ipmb_msg request;
> + struct list_head request_queue;
> + atomic_t request_queue_len;
> + struct ipmb_msg response;

Where are you using the 'response' field?

> + size_t msg_idx;
> + spinlock_t lock;
> + wait_queue_head_t wait_queue;
> + struct mutex file_mutex;
> +};
> +
> +static int receive_ipmb_request(struct ipmb_dev *ipmb_dev_p,
> + bool non_blocking,
> + struct ipmb_msg *ipmb_request)
> +{
> + struct ipmb_request_elem *queue_elem;
> + unsigned long flags;
> + int res;
> +
> + spin_lock_irqsave(&ipmb_dev_p->lock, flags);
> +
> + while (!atomic_read(&ipmb_dev_p->request_queue_len)) {
> + spin_unlock_irqrestore(&ipmb_dev_p->lock, flags);
> + if (non_blocking)
> + return -EAGAIN;
> +
> + res = wait_event_interruptible(ipmb_dev_p->wait_queue,
> +

Re: [PATCH RESEND] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Tobin C. Harding

On Tue, Apr 30, 2019 at 06:24:43AM +0200, Ingo Molnar wrote:
> 
> * Tobin C. Harding  wrote:
> 
> > Currently error return from kobject_init_and_add() is not followed by a
> > call to kobject_put().  This means there is a memory leak.
> > 
> > Add call to kobject_put() in error path of kobject_init_and_add().
> > 
> > Signed-off-by: Tobin C. Harding 
> > ---
> > 
> > Resend with SOB tag.
> 
> Please ignore my previous mail :-)

Cheers Ingo, caught myself not checkpatching :(

thanks,
Tobin.

[PATCH v1] mmc: dt: add DT bindings for ls1028a eSDHC host controller

2019-04-29 Thread Yinbo Zhu

From: Yinbo Zhu 

Add "fsl,ls1028a-esdhc" bindings for ls1028a eSDHC host controller

Signed-off-by: Yinbo Zhu 
---
 .../devicetree/bindings/mmc/fsl-esdhc.txt  |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt b/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt
index 99c5cf8..a7250b9 100644
--- a/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt
+++ b/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt
@@ -21,6 +21,7 @@ Required properties:
"fsl,ls1043a-esdhc"
"fsl,ls1046a-esdhc"
"fsl,ls2080a-esdhc"
+   "fsl,ls1028a-esdhc"
   - clock-frequency : specifies eSDHC base clock frequency.
 
 Optional properties:
-- 
1.7.1

Re: [PATCH v2 17/19] iommu: Add max num of cache and granu types

2019-04-29 Thread Auger Eric

Hi Jacob,

On 4/29/19 6:17 PM, Jacob Pan wrote:
> On Fri, 26 Apr 2019 18:22:46 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 4/24/19 1:31 AM, Jacob Pan wrote:
>>> To convert to/from cache types and granularities between generic and
>>> VT-d specific counterparts, a 2D arrary is used. Introduce the
>>> limits  
>> array
>>> to help define the converstion array size.  
>> conversion
>>>
> will fix, thanks
>>> Signed-off-by: Jacob Pan 
>>> ---
>>>  include/uapi/linux/iommu.h | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>>> index 5c95905..2d8fac8 100644
>>> --- a/include/uapi/linux/iommu.h
>>> +++ b/include/uapi/linux/iommu.h
>>> @@ -197,6 +197,7 @@ struct iommu_inv_addr_info {
>>> __u64   granule_size;
>>> __u64   nb_granules;
>>>  };
>>> +#define NR_IOMMU_CACHE_INVAL_GRANU (3)
>>>  
>>>  /**
>>>   * First level/stage invalidation information
>>> @@ -235,6 +236,7 @@ struct iommu_cache_invalidate_info {
>>> struct iommu_inv_addr_info addr_info;
>>> };
>>>  };
>>> +#define NR_IOMMU_CACHE_TYPE (3)
>>>  /**
>>>   * struct gpasid_bind_data - Information about device and guest
>>> PASID binding
>>>   * @gcr3:  Guest CR3 value from guest mm
>>>   
>> Is it really something that needs to be exposed in the uapi?
>>
> I put it in uapi since the related definitions for granularity and
> cache type are in the same file.
> Maybe putting them close together like this? I was thinking you can just
> fold it into your next series as one patch for introducing cache
> invalidation.
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 2d8fac8..4ff6929 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -164,6 +164,7 @@ enum iommu_inv_granularity {
> IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
> IOMMU_INV_GRANU_PASID,  /* pasid-selective invalidation */
> IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
> +   NR_IOMMU_INVAL_GRANU,   /* number of invalidation granularities */
>  };
>  
>  /**
> @@ -228,6 +229,7 @@ struct iommu_cache_invalidate_info {
>  #define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */
>  #define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
>  #define IOMMU_CACHE_INV_TYPE_PASID (1 << 2) /* PASID cache */
> +#define NR_IOMMU_CACHE_TYPE (3)

OK I will add this.
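
For reference, the pattern under discussion (a trailing enum entry as
the element count, used to size a 2D conversion table) could look
roughly like the sketch below. All MOCK_* names and the vendor codes in
the table are made up for illustration; they are not taken from the
VT-d driver or from the uapi header.

```c
#include <assert.h>
#include <stddef.h>

enum mock_inv_granularity {
	MOCK_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
	MOCK_INV_GRANU_PASID,	/* pasid-selective invalidation */
	MOCK_INV_GRANU_ADDR,	/* page-selective invalidation */
	MOCK_NR_INVAL_GRANU,	/* sentinel: number of granularities */
};

#define MOCK_CACHE_INV_TYPE_IOTLB	(1 << 0)
#define MOCK_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1)
#define MOCK_CACHE_INV_TYPE_PASID	(1 << 2)
#define MOCK_NR_CACHE_TYPE		(3)

/*
 * Hypothetical vendor-specific codes, indexed by cache type bit position
 * and generic granularity. -1 marks an unsupported combination.
 */
static const int mock_vendor_granu[MOCK_NR_CACHE_TYPE][MOCK_NR_INVAL_GRANU] = {
	/* IOTLB     */ { 10, 11, 12 },
	/* DEV_IOTLB */ { -1, -1, 22 },
	/* PASID     */ { -1, 31, -1 },
};

static int mock_to_vendor_granu(unsigned int cache_bit,
				enum mock_inv_granularity granu)
{
	assert(cache_bit < MOCK_NR_CACHE_TYPE);
	assert(granu < MOCK_NR_INVAL_GRANU);

	return mock_vendor_granu[cache_bit][granu];
}
```

With the sentinel inside the enum, adding a granularity grows the table
dimension automatically, whereas a separate #define has to be kept in
sync by hand.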

Thanks

Eric
> __u8 cache;
> __u8 granularity;
> 
>> Thanks
>>
>> Eric
> 
> [Jacob Pan]
>

Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation

2019-04-29 Thread Ingo Molnar



* Andy Lutomirski  wrote:

> On Sat, Apr 27, 2019 at 3:46 AM Ingo Molnar  wrote:

> > So I'm wondering whether there's a 4th choice as well, which avoids
> > control flow corruption *before* it happens:
> >
> >  - A C language runtime that is a subset of current C syntax and
> >semantics used in the kernel, and which doesn't allow access outside
> >of existing objects and thus creates a strictly enforced separation
> >between memory used for data, and memory used for code and control
> >flow.
> >
> >  - This would involve, at minimum:
> >
> > - tracking every type and object and its inherent length and valid
> >   access patterns, and never losing track of its type.
> >
> > - being a lot more organized about initialization, i.e. no
> >   uninitialized variables/fields.
> >
> > - being a lot more strict about type conversions and pointers in
> >   general.
> 
> You're not the only one to suggest this.  There are at least a few
> things that make this extremely difficult if not impossible.  For
> example, consider this code:
> 
> void maybe_buggy(void)
> {
>   int a, b;
>   int *p = &a;
>   int *q = (int *)some_function((unsigned long)p);
>   *q = 1;
> }
> 
> If some_function(&a) returns &a, then all is well.  But if
> some_function(&a) returns &b or even a valid address of some unrelated
> kernel object, then the code might be entirely valid and correct C,
> but I don't see how the runtime checks are supposed to tell whether
> the resulting address is valid or is a bug.  This type of code is, I
> think, quite common in the kernel -- it happens in every data
> structure where we have unions of pointers and integers or where we
> steal some known-zero bits of a pointer to store something else.

So the thing is, for the infinitely large state space of "valid C code" 
we already disallow infinitely many variants in the Linux kernel.

We have complicated rules that disallow certain C syntactical and 
semantical constructs, both on the tooling (build failure/warning) and on 
the review (style/taste) level.

So the question IMHO isn't whether it's "valid C", because we already 
have the Linux kernel's own C syntax variant and are enforcing it with 
varying degrees of success.

The question is whether the example you gave can be written in a strongly 
typed fashion, whether it makes sense to do so, and what the costs are.

I think it's evident that it can be written with strongly typed 
constructs, by separating pointers from embedded error codes - with 
negative side effects to code generation: for example it increases 
structure sizes and error return paths.
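
To make that trade-off concrete, here is a small userspace sketch of
both styles. The mock_* helpers only imitate the idea of folding an
errno value into a pointer; they, the lookup functions and the table
are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_ERRNO 4095

/* Style 1: the error code is folded into the pointer value itself. */
static void *mock_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MOCK_MAX_ERRNO;
}

static long mock_ptr_err(const void *p)
{
	assert(mock_is_err(p));
	return (long)(intptr_t)p;
}

/* Style 2: pointer and error code are separate, strongly typed fields. */
struct lookup_result {
	int err;	/* 0 on success, negative errno otherwise */
	int *value;	/* valid only when err == 0 */
};

static int table[4] = { 10, 20, 30, 40 };

static void *lookup_folded(size_t idx)
{
	if (idx >= 4)
		return mock_err_ptr(-ENOENT);
	return &table[idx];
}

static struct lookup_result lookup_typed(size_t idx)
{
	struct lookup_result r = { 0, NULL };

	if (idx >= 4)
		r.err = -ENOENT;
	else
		r.value = &table[idx];
	return r;
}
```

The typed variant never reinterprets an integer as a pointer, but its
return value is larger than a single pointer, which is the footprint
cost mentioned above.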

I think there are four main costs of converting such a pattern to strongly 
typed constructs:

 - memory/cache footprint:  there's a nonzero cost there.
 - performance:             this will hurt too.
 - code readability:        this will probably improve.
 - code robustness:         this will improve too.

So I think the proper question to ask is not whether there's common C 
syntax within the kernel that would have to be rewritten, but whether the 
total sum of memory and runtime overhead of strongly typed C programming 
(if it's possible/desirable) is larger than the total sum of a typical 
Linux distro enabling the various current and proposed kernel hardening 
features that have a runtime overhead:

 - the SMAP/SMEP overhead of STAC/CLAC for every single user copy

 - other usercopy hardening features

 - stackprotector

 - KASLR

 - compiler plugins against information leaks

 - proposed KASLR extension to implement module randomization and -PIE overhead

 - proposed function call integrity checks

 - proposed per system call kernel stack offset randomization

 - ( and I'm sure I forgot about a few more, and it's all still only 
 reactive security, not proactive security. )

That's death by a thousand cuts and CR3 switching during system calls is 
also throwing a hand grenade into the fight ;-)

So if people are also proposing to do CR3 switches in every system call, 
I'm pretty sure the answer is "yes, even a managed C runtime is probably 
faster than *THAT* sum of a performance mess" - at least with the current 
CR3 switching x86-uarch cost structure...

Thanks,

Ingo

Re: [PATCH v3 1/4] include: dt-bindings: add Performance Monitoring Unit for Exynos

2019-04-29 Thread Chanwoo Choi

Hi,

I agree with this patch, but I have a few minor comments.

If you edit them according to my comment, feel free to add my following tag:
Acked-by: Chanwoo Choi 

On 19. 4. 19. 10:48 PM, Lukasz Luba wrote:
> This patch add support of a new feature which can be used in DT:
> Performance Monitoring Unit with defined event data type.
> In this patch the event data types are defined for Exynos PPMU.
> The patch also updates the MAINTAINERS file accordingly and
> adds the header file to devfreq event subsystem.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  MAINTAINERS   |  1 +
>  include/dt-bindings/pmu/exynos_ppmu.h | 26 ++
>  2 files changed, 27 insertions(+)
>  create mode 100644 include/dt-bindings/pmu/exynos_ppmu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3671fde..1ba4b9b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4560,6 +4560,7 @@ T:  git git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git
>  S:   Supported
>  F:   drivers/devfreq/event/
>  F:   drivers/devfreq/devfreq-event.c
> +F:   include/dt-bindings/pmu/exynos_ppmu.h
>  F:   include/linux/devfreq-event.h
>  F:   Documentation/devicetree/bindings/devfreq/event/
>  
> diff --git a/include/dt-bindings/pmu/exynos_ppmu.h b/include/dt-bindings/pmu/exynos_ppmu.h
> new file mode 100644
> index 000..08fdce9
> --- /dev/null
> +++ b/include/dt-bindings/pmu/exynos_ppmu.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Samsung Exynos PPMU event types for counting in regs
> + *
> + * Copyright (c) 2019, Samsung

Maybe "Samsung Electronics" instead of 'Samsung'.

> + * Author: Lukasz Luba 
> + */
> +
> +#ifndef __DT_BINDINGS_PMU_EXYNOS_PPMU_H
> +#define __DT_BINDINGS_PMU_EXYNOS_PPMU_H
> +
> +

Remove unneeded blank line.

> +#define PPMU_RO_BUSY_CYCLE_CNT   0x0
> +#define PPMU_WO_BUSY_CYCLE_CNT   0x1
> +#define PPMU_RW_BUSY_CYCLE_CNT   0x2
> +#define PPMU_RO_REQUEST_CNT  0x3
> +#define PPMU_WO_REQUEST_CNT  0x4
> +#define PPMU_RO_DATA_CNT 0x5
> +#define PPMU_WO_DATA_CNT 0x6
> +#define PPMU_RO_LATENCY  0x12
> +#define PPMU_WO_LATENCY  0x16
> +#define PPMU_V2_RO_DATA_CNT  0x4
> +#define PPMU_V2_WO_DATA_CNT  0x5
> +#define PPMU_V2_EVT3_RW_DATA_CNT 0x22
> +
> +#endif
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:undefined reference to `followparent_recalc'

2019-04-29 Thread kbuild test robot

Hi Randy,

It's probably a bug fix that unveils the link errors.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   83a50840e72a5a964b4704fcdc2fbb2d771015ab
commit: acaf892ecbf5be7710ae05a61fd43c668f68ad95 sh: fix multiple function definition build errors
date:   3 weeks ago
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout acaf892ecbf5be7710ae05a61fd43c668f68ad95
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=sh

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

>> sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.data+0x1c): undefined reference to `followparent_recalc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH v6 01/10] clk: samsung: add needed IDs for DMC clocks in Exynos5420

2019-04-29 Thread Chanwoo Choi

Hi,

On 19. 4. 19. 11:19 PM, Lukasz Luba wrote:
> Define new IDs for clocks used by Dynamic Memory Controller in
> Exynos5422 SoC.
> 
> Acked-by: Rob Herring 
> Signed-off-by: Lukasz Luba 
> ---
>  include/dt-bindings/clock/exynos5420.h | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/include/dt-bindings/clock/exynos5420.h b/include/dt-bindings/clock/exynos5420.h
> index 355f469..abb1842 100644
> --- a/include/dt-bindings/clock/exynos5420.h
> +++ b/include/dt-bindings/clock/exynos5420.h
> @@ -60,6 +60,7 @@
>  #define CLK_MAU_EPLL             159
>  #define CLK_SCLK_HSIC_12M        160
>  #define CLK_SCLK_MPHY_IXTAL24    161
> +#define CLK_SCLK_BPLL            162
>  
>  /* gate clocks */
>  #define CLK_UART0                257
> @@ -195,6 +196,18 @@
>  #define CLK_ACLK432_CAM          518
>  #define CLK_ACLK_FL1550_CAM      519
>  #define CLK_ACLK550_CAM          520
> +#define CLK_CLKM_PHY0            521
> +#define CLK_CLKM_PHY1            522
> +#define CLK_ACLK_PPMU_DREX0_0    523
> +#define CLK_ACLK_PPMU_DREX0_1    524
> +#define CLK_ACLK_PPMU_DREX1_0    525
> +#define CLK_ACLK_PPMU_DREX1_1    526
> +#define CLK_PCLK_PPMU_DREX0_0    527
> +#define CLK_PCLK_PPMU_DREX0_1    528
> +#define CLK_PCLK_PPMU_DREX1_0    529
> +#define CLK_PCLK_PPMU_DREX1_1    530
> +#define CLK_CDREX_PAUSE          531
> +#define CLK_CDREX_TIMING_SET     532

I cannot find any code in this patchset that uses either
CLK_CDREX_PAUSE or CLK_CDREX_TIMING_SET.

Please remove them.

(snip)

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

[PATCH 1/2] i2c: imx: I2C Driver doesn't consider I2C_IPGCLK_SEL RCW bit when using ls1046a SoC

2019-04-29 Thread Chuanhua Han

The current kernel driver does not consider I2C_IPGCLK_SEL (bit 424
of the RCW) when deciding i2c_clk_rate in i2c_imx_set_clk()
{ 0: Platform clock/4, 1: Platform clock/2 }.

On the ls1046a SoC, this writes an incorrect value to the IBFD register
when I2C_IPGCLK_SEL = 0, which generates half of the desired clock.

Therefore, when the ls1046a SoC is used, we need to set the i2c clock
according to the corresponding RCW bit.
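
As a standalone sketch of the arithmetic involved (not the driver code
itself), the word index and bit mask for an RCW bit and the resulting
input clock could be computed as below. The helper names are
hypothetical, and the MSB-first bit numbering is an assumption based on
the big-endian register description mentioned in the patch comment.

```c
#include <assert.h>
#include <stdint.h>

#define RCW_BITS_PER_WORD 32u

/* Index of the 32-bit RCW status word that holds RCW bit 'bit'. */
static unsigned int rcw_word_index(unsigned int bit)
{
	return bit / RCW_BITS_PER_WORD;
}

/* Mask of RCW bit 'bit' within its word, assuming bit 0 is the MSB. */
static uint32_t rcw_bit_mask(unsigned int bit)
{
	return UINT32_C(1) << (31u - (bit % RCW_BITS_PER_WORD));
}

/*
 * Clock rate the divider search should work with: halve the reported
 * rate when the select bit is clear (platform clock / 4 instead of / 2).
 */
static unsigned long i2c_input_rate(unsigned long clk_rate, uint32_t rcw_word,
				    unsigned int sel_bit)
{
	int sel = (rcw_word & rcw_bit_mask(sel_bit)) != 0;

	assert(clk_rate > 0);
	return sel ? clk_rate : clk_rate / 2;
}
```

Under these assumptions, RCW bit 424 lands in word 13, which matches
the (14 - 1) index used in the patch below.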

Signed-off-by: Sumit Batra 
Signed-off-by: Chuanhua Han 
---
 drivers/i2c/busses/i2c-imx.c | 64 
 1 file changed, 64 insertions(+)

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 422f1a445b55..7186cf3c7d24 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -45,6 +45,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /* This will be the driver name the kernel reports */
 #define DRIVER_NAME "imx-i2c"
@@ -109,6 +111,21 @@
 
 #define I2C_PM_TIMEOUT 10 /* ms */
 
+/* 14-1 Since array index starts from 0 */
+#define RCW_I2C_IPGCLK_WORD (14 - 1)
+/*
+ * Set mask for RCW 424th bit, reading from DCFG_CCSR RCW Status Registers
+ * Since this register in RM depicted as big endian,
+ * so consider 31st bit as LSB for creating the mask.
+ */
+#define RCW_I2C_IPGCLK_MASK 0x80
+int i2c_ipgclk_sel = 1;
+
+static const struct soc_device_attribute ls1046a_soc[] = {
+  {.family = "QorIQ LS1046A"},
+  { /* sentinel */ }
+};
+
 /*
  * sorted list of clock divider, register value pairs
  * taken from table 26-5, p.26-9, Freescale i.MX
@@ -304,6 +321,11 @@ static const struct platform_device_id imx_i2c_devtype[] = {
 };
 MODULE_DEVICE_TABLE(platform, imx_i2c_devtype);
 
+static const struct of_device_id guts_device_ids[] = {
+   { .compatible = "fsl,qoriq-device-config", },
+   {}
+};
+
 static const struct of_device_id i2c_imx_dt_ids[] = {
{ .compatible = "fsl,imx1-i2c", .data = &imx1_i2c_hwdata, },
{ .compatible = "fsl,imx21-i2c", .data = &imx21_i2c_hwdata, },
@@ -533,6 +555,9 @@ static void i2c_imx_set_clk(struct imx_i2c_struct *i2c_imx,
unsigned int div;
int i;
 
+   if (!i2c_ipgclk_sel)
+   i2c_clk_rate = i2c_clk_rate / 2;
+
/* Divider value calculation */
if (i2c_imx->cur_clk == i2c_clk_rate)
return;
@@ -551,6 +576,10 @@ static void i2c_imx_set_clk(struct imx_i2c_struct *i2c_imx,
/* Store divider value */
i2c_imx->ifdr = i2c_clk_div[i].val;
 
+   pr_alert("[%s] CLK Rate=%u Bitrate =%u Div =%u Value =%d\n",
+__func__, i2c_clk_rate, i2c_imx->bitrate,
+div, i2c_clk_div[i].val);
+
/*
 * There dummy delay is calculated.
 * It should be about one I2C clock period long.
@@ -1116,6 +1145,9 @@ static int i2c_imx_probe(struct platform_device *pdev)
int irq, ret;
dma_addr_t phy_addr;
u32 mul_value;
+   struct device_node *guts_node;
+   static struct ccsr_guts __iomem *guts_regs;
+   u32 rcw_reg;
 
dev_dbg(&pdev->dev, "<%s>\n", __func__);
 
@@ -1135,6 +1167,38 @@ static int i2c_imx_probe(struct platform_device *pdev)
if (!i2c_imx)
return -ENOMEM;
 
+   if (soc_device_match(ls1046a_soc)) {
+   /*
+* Make device node for GUTS/DCFG (global utilities block)
+* to read RCW.
+*/
+   guts_node = of_find_matching_node(NULL, guts_device_ids);
+   if (!guts_node) {
+   dev_err(&pdev->dev, "Could not find GUTS node\n");
+   return -ENODEV;
+   }
+   /*
+* Memory (IO)  MAP the DCFG registers(for RCW) to
+* be used in kernel virtual address space.
+*/
+   guts_regs = of_iomap(guts_node, 0);
+   of_node_put(guts_node);
+   if (!guts_regs) {
+   dev_err(&pdev->dev, "IOREMAP of GUTS node failed\n");
+   return -ENOMEM;
+   }
+   /* Read rcw bit 424 (starting from 0) */
+   rcw_reg = ioread32be(&guts_regs->rcwsr[RCW_I2C_IPGCLK_WORD]);
+   pr_alert("RCW REG[%d]=0x%x\n", RCW_I2C_IPGCLK_WORD, rcw_reg);
+   if (rcw_reg & RCW_I2C_IPGCLK_MASK) {
+   pr_alert("Div by 2 Case Detected in RCW\n");
+   i2c_ipgclk_sel = 1;
+   } else {
+   pr_alert("Div by 4 Case Detected in RCW\n");
+   i2c_ipgclk_sel = 0;
+   }
+   }
+
if (of_id) {
i2c_imx->hwdata = of_id->data;
ret = of_property_read_u32(pdev->dev.of_node,
-- 
2.17.1

Re: [PATCH v6 06/10] dt-bindings: memory-controllers: add Exynos5422 DMC device description

2019-04-29 Thread Chanwoo Choi

On 19. 4. 19. 11:19 PM, Lukasz Luba wrote:
> The patch adds description for DT binding for a new Exynos5422 Dynamic
> Memory Controller device.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  .../bindings/memory-controllers/exynos5422-dmc.txt | 73 ++
>  1 file changed, 73 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> 
> diff --git a/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt b/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> new file mode 100644
> index 000..133b3cc
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> @@ -0,0 +1,73 @@
> +* Exynos5422 frequency and voltage scaling for Dynamic Memory Controller device
> +
> +The Samsung Exynos5422 SoC has DMC (Dynamic Memory Controller) to which the DRAM
> +memory chips are connected. The driver is to monitor the controller in runtime
> +and switch frequency and voltage. To monitor the usage of the controller in
> +runtime, the driver uses the PPMU (Platform Performance Monitoring Unit), which
> +is able to measure the current load of the memory.
> +When 'userspace' governor is used for the driver, an application is able to
> +switch the DMC and memory frequency.
> +
> +Required properties for DMC device for Exynos5422:
> +- compatible: Should be "samsung,exynos5422-bus".

As I have already mentioned many times, this is still not fixed.
You have to change it as follows:
- exynos5422-bus -> exynos5422-dmc

> +- clock-names : the name of clock used by the bus, "bus".

The example below doesn't contain the 'bus' clock name.

> +- clocks : phandles for clock specified in "clock-names" property.
> +- devfreq-events : phandles for PPMU devices connected to this DMC.
> +- vdd-supply : phandle for voltage regulator which is connected.
> +- reg : registers of two CDREX controllers, chip information, clocks subsystem.
> +- operating-points-v2 : phandle for OPPs described in v2 definition.
> +- device-handle : phandle of the connected DRAM memory device. For more
> + information please refer to Documentation
> +- devfreq-events : phandles of the PPMU events used by the controller.
> +
> +Example:
> +
> + ppmu_dmc0_0: ppmu@10d0 {
> + compatible = "samsung,exynos-ppmu";
> + reg = <0x10d0 0x2000>;
> + clocks = <&clock CLK_PCLK_PPMU_DREX0_0>;
> + clock-names = "ppmu";
> + status = "okay";
> + events {
> + ppmu_event_dmc0_0: ppmu-event3-dmc0_0 {
> + event-name = "ppmu-event3-dmc0_0";
> + };
> + };
> + };
> +
> + dmc: memory-controller@10c2 {
> + compatible = "samsung,exynos5422-dmc";
> + reg = <0x10c2 0x1>, <0x10c3 0x1>,
> + <0x1000 0x1000>, <0x1003 0x1000>;
> + clocks =<&clock CLK_FOUT_SPLL>,
> + <&clock CLK_MOUT_SCLK_SPLL>,
> + <&clock CLK_FF_DOUT_SPLL2>,
> + <&clock CLK_FOUT_BPLL>,
> + <&clock CLK_MOUT_BPLL>,
> + <&clock CLK_SCLK_BPLL>,
> + <&clock CLK_MOUT_MX_MSPLL_CCORE>,
> + <&clock CLK_MOUT_MX_MSPLL_CCORE_PHY>,
> + <&clock CLK_MOUT_MCLK_CDREX>,
> + <&clock CLK_DOUT_CLK2X_PHY0>,
> + <&clock CLK_CLKM_PHY0>,
> + <&clock CLK_CLKM_PHY1>;
> + clock-names =   "fout_spll",
> + "mout_sclk_spll",
> + "ff_dout_spll2",
> + "fout_bpll",
> + "mout_bpll",
> + "sclk_bpll",
> + "mout_mx_mspll_ccore",
> + "mout_mx_mspll_ccore_phy",
> + "mout_mclk_cdrex",
> + "dout_clk2x_phy0",
> + "clkm_phy0",
> + "clkm_phy1";
> + status = "okay";
> + operating-points-v2 = <&dmc_opp_table>;
> + devfreq-events = <&ppmu_event3_dmc0_0>, <&ppmu_event3_dmc0_1>,
> + <&ppmu_event3_dmc1_0>, <&ppmu_event3_dmc1_1>;
> + operating-points-v2 = <&dmc_opp_table>;
> + device-handle = <&samsung_K3QF2F20DB>;
> + vdd-supply = <&buck1_reg>;
> + };
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

[PATCH 2/2] arm64: dts: fsl: ls1046a: Add the guts node in dts

2019-04-29 Thread Chuanhua Han

For NXP ls1046a SoC, the i2c clock needs to be configured with the
appropriate bit of RCW, so we add the guts node (GUTS/DCFG global
utilities block) for the driver to read.

Signed-off-by: Sumit Batra 
Signed-off-by: Chuanhua Han 
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 373310e4c0ea..f88599df18bb 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -205,6 +205,11 @@
status = "disabled";
};
 
+   guts: global-utilities@1ee {
+   compatible = "fsl,qoriq-device-config";
+   reg = <0x0 0x1ee 0x0 0x1000>;
+   };
+
qspi: spi@155 {
compatible = "fsl,ls1021a-qspi";
#address-cells = <1>;
-- 
2.17.1

Re: [RFC PATCH v2 00/17] Core scheduling v2

2019-04-29 Thread Ingo Molnar



* Aubrey Li  wrote:

> On Tue, Apr 30, 2019 at 12:01 AM Ingo Molnar  wrote:
> > * Li, Aubrey  wrote:
> >
> > > > I.e. showing the approximate CPU thread-load figure column would be
> > > > very useful too, where '50%' shows half-loaded, '100%' fully-loaded,
> > > > '200%' over-saturated, etc. - for each row?
> > >
> > > See below, hope this helps.
> > > .--.
> > > |NA/AVX vanilla-SMT [std% / sem%] cpu% |coresched-SMT   [std% / 
> > > sem%] +/- cpu% |  no-SMT [std% / sem%]   +/-  cpu% |
> > > |--|
> > > |  1/1508.5 [ 0.2%/ 0.0%] 2.1% |504.7   [ 1.1%/ 
> > > 0.1%]-0.8%2.1% |   509.0 [ 0.2%/ 0.0%]   0.1% 4.3% |
> > > |  2/2   1000.2 [ 1.4%/ 0.1%] 4.1% |   1004.1   [ 1.6%/ 
> > > 0.2%] 0.4%4.1% |   997.6 [ 1.2%/ 0.1%]  -0.3% 8.1% |
> > > |  4/4   1912.1 [ 1.0%/ 0.1%] 7.9% |   1904.2   [ 1.1%/ 
> > > 0.1%]-0.4%7.9% |  1914.9 [ 1.3%/ 0.1%]   0.1%15.1% |
> > > |  8/8   3753.5 [ 0.3%/ 0.0%]14.9% |   3748.2   [ 0.3%/ 
> > > 0.0%]-0.1%   14.9% |  3751.3 [ 0.4%/ 0.0%]  -0.1%30.5% |
> > > | 16/16  7139.3 [ 2.4%/ 0.2%]30.3% |   7137.9   [ 1.8%/ 
> > > 0.2%]-0.0%   30.3% |  7049.2 [ 2.4%/ 0.2%]  -1.3%60.4% |
> > > | 32/32 10899.0 [ 4.2%/ 0.4%]60.3% |  10780.3   [ 4.4%/ 
> > > 0.4%]-1.1%   55.9% | 10339.2 [ 9.6%/ 0.9%]  -5.1%97.7% |
> > > | 64/64 15086.1 [11.5%/ 1.2%]97.7% |  14262.0   [ 8.2%/ 
> > > 0.8%]-5.5%   82.0% | 11168.7 [22.2%/ 1.7%] -26.0%   100.0% |
> > > |128/12815371.9 [22.0%/ 2.2%]   100.0% |  14675.8   [14.4%/ 
> > > 1.4%]-4.5%   82.8% | 10963.9 [18.5%/ 1.4%] -28.7%   100.0% |
> > > |256/25615990.8 [22.0%/ 2.2%]   100.0% |  12227.9   [10.3%/ 
> > > 1.0%]   -23.5%   73.2% | 10469.9 [19.6%/ 1.7%] -34.5%   100.0% |
> > > '--'
> >
> > Very nice, thank you!
> >
> > What's interesting is how in the over-saturated case (the last three
> > rows: 128, 256 and 512 total threads) coresched-SMT leaves 20-30% CPU
> > performance on the floor according to the load figures.
> 
> Yeah, I found the next focus.
> 
> > Is this true idle time (which shows up as 'id' during 'top'), or some 
> > load average artifact?
> 
> vmstat periodically reported intermediate CPU utilization in one 
> second, it was running simultaneously when the benchmarks run. The cpu% 
> is computed by the average of (100-idle) series.

Ok - so 'vmstat' uses /proc/stat, which uses cpustat[CPUTIME_IDLE] (or 
its NOHZ work-alike), so this should be true idle time - to the extent 
the HZ process clock's sampling is accurate.

So I guess the answer to my question is "yes". ;-)

BTW, for robustness' sake you might want to add iowait to idle time (it's 
the 'wa' field of vmstat) - it shouldn't matter for this particular 
benchmark which doesn't do much IO, but it might for others.

Both CPUTIME_IDLE and CPUTIME_IOWAIT are idle states when a CPU is not 
utilized.
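
A minimal userspace sketch of that accounting, in Python and for illustration
only. It assumes two snapshots of the aggregate "cpu" line of /proc/stat, with
the field order user, nice, system, idle, iowait, irq, softirq, steal. The
helper names and the sample tick counts are made up:

```python
# Hypothetical helpers; names and sample values are illustrative only.
# Assumed field order of the aggregate "cpu" line in /proc/stat:
# user nice system idle iowait irq softirq steal [guest guest_nice]


def parse_cpu_line(line):
    """Parse 'cpu  u n s i w ...' into a list of integer tick counts.

    Only the first eight counters are kept, so the trailing guest fields
    (already included in user/nice) are not counted twice.
    """
    fields = line.split()
    return [int(x) for x in fields[1:9]]


def busy_percent(prev, curr, count_iowait_as_idle=True):
    """Return CPU utilization in percent between two snapshots."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3]                      # idle ticks (CPUTIME_IDLE)
    if count_iowait_as_idle and len(deltas) > 4:
        idle += deltas[4]                 # iowait ticks (CPUTIME_IOWAIT)
    return 100.0 * (total - idle) / total


prev = parse_cpu_line("cpu  100 0 50 800 50 0 0 0")
curr = parse_cpu_line("cpu  160 0 70 1500 270 0 0 0")

# Utilization with iowait folded into idle time, then the plain 100-idle form.
print(busy_percent(prev, curr))
print(busy_percent(prev, curr, count_iowait_as_idle=False))
```

With a benchmark that does little IO the two figures should be close; they
diverge only when the iowait delta is significant, as in the made-up sample.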

[ Side note: we should really implement precise idle time accounting when 
  CONFIG_IRQ_TIME_ACCOUNTING=y is enabled. We pay all the costs of the 
  timestamps, but AFAICS we don't propagate that into the idle cputime
  metrics. ]

Thanks,

Ingo

Re: [PATCH v3 2/2] dt-bindings: cpufreq: Document allwinner,cpu-operating-points-v2

2019-04-29 Thread Viresh Kumar

On 29-04-19, 11:18, Rob Herring wrote:
> On Sun, Apr 28, 2019 at 4:53 AM Frank Lee  wrote:
> >
> > On Sat, Apr 27, 2019 at 5:15 AM Rob Herring  wrote:
> > >
> > > On Wed, Apr 10, 2019 at 01:41:39PM -0400, Yangtao Li wrote:
> > > > Allwinner Process Voltage Scaling Tables defines the voltage and
> > > > frequency value based on the speedbin blown in the efuse combination.
> > > > The sunxi-cpufreq-nvmem driver reads the efuse value from the SoC to
> > > > provide the OPP framework with required information.
> > > > This is used to determine the voltage and frequency value for each
> > > > OPP of operating-points-v2 table when it is parsed by the OPP framework.
> > > >
> > > > The "allwinner,cpu-operating-points-v2" DT extends the 
> > > > "operating-points-v2"
> > > > with following parameters:
> > > > > - nvmem-cells (NVMEM area containing the speedbin information)
> > > > > - opp-microvolt-<name>: voltage in micro Volts.
> > > > >   At runtime, the platform can pick a <name> and matching
> > > > >   opp-microvolt-<name> property.
> > > >   HW: :
> > > >   sun50iw-h6  speed0 speed1 speed2
> > >
> > > We already have at least one way to support speed bins with QC kryo
> > > binding. Why do we need a different way?
> >
> > For some SOCs, for some reason (making the CPU have approximate 
> > performance),
> > they use the same frequency but different voltage. In the case where
> > this speed bin
> > is not a lot and opp uses the same frequency, too many repeated opp
> > nodes are a bit
> > redundant and not intuitive enough.
> >
> > So, I think it's worth the new method.
> 
> Well, I don't.
> 
> We can't have every SoC vendor doing their own thing just because they
> want to. If there are technical reasons why existing bindings don't
> work, then maybe we need to do something different. But I haven't
> heard any reasons.

Well, there is a good reason for attempting the new bindings, and I wasn't sure
whether updating the earlier bindings or adding another one for the platform is
correct, as we aren't really adding new bindings, just documentation around them.

So there are two ways the OPP core supports this:

- opp-supported-hw: This is a better fit if we have a smaller group of
  frequencies to select from a bigger group, so we disable non-required OPPs
  completely. This is what Qcom did as they wanted to select different
  frequencies altogether.

- opp-microvolt-<name>: This is a better fit if the frequencies remain the same
  and only a few of the properties like voltage/current have a different value.
  So we don't disable any OPPs but just select the right voltage/current for
  those frequencies. This avoids unnecessary duplication of the OPPs in DT and
  that's what the Allwinner guys want.
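
The difference between the two approaches can be sketched with a small Python
model. This is not kernel code: the table layout, the function names and all
frequency/voltage numbers are invented for illustration, and the fallback to a
plain opp-microvolt value is an assumption of the model. Only the property
names opp-supported-hw and opp-microvolt-<name> come from the bindings.

```python
# Toy model of the two OPP selection schemes; every number below is made up.


def filter_by_supported_hw(opps, hw_version_bit):
    """opp-supported-hw style: drop OPPs whose bitmask excludes this hardware."""
    return [o for o in opps if o["supported_hw"] & (1 << hw_version_bit)]


def pick_named_voltage(opps, name):
    """opp-microvolt-<name> style: keep every OPP, choose the per-bin voltage."""
    key = "opp-microvolt-" + name
    selected = []
    for o in opps:
        # Model assumption: fall back to plain opp-microvolt if no named value.
        microvolt = o.get(key, o.get("opp-microvolt"))
        selected.append({"hz": o["hz"], "microvolt": microvolt})
    return selected


# Scheme 1: different hardware versions get different sets of frequencies.
supported_hw_table = [
    {"hz": 1_000_000_000, "supported_hw": 0b011},
    {"hz": 1_200_000_000, "supported_hw": 0b010},
    {"hz": 1_400_000_000, "supported_hw": 0b100},
]

# Scheme 2: same frequencies for every speed bin, only the voltage differs.
named_table = [
    {"hz": 1_000_000_000,
     "opp-microvolt-speed0": 900_000, "opp-microvolt-speed1": 950_000},
    {"hz": 1_400_000_000,
     "opp-microvolt-speed0": 1_050_000, "opp-microvolt-speed1": 1_100_000},
]

print([o["hz"] for o in filter_by_supported_hw(supported_hw_table, 1)])
print(pick_named_voltage(named_table, "speed1"))
```

In the first scheme the resulting table shrinks per hardware version; in the
second it keeps its length and only the voltage column changes, which is why
the second avoids duplicating OPP nodes when the frequencies are identical.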

The kryo nvmem bindings currently support opp-supported-hw; maybe we can
mention support for the second one in the same file and rename it accordingly.

-- 
viresh

[PATCH 1/3] dt-bindings: i2c: add optional mul-value property to binding

2019-04-29 Thread Chuanhua Han

NXP Layerscape SoCs have up to three MUL options available for all
divider values; the choice of MUL determines the internal monitoring rate
of the I2C bus (SCL and SDA signals):
A lower MUL value results in a higher sampling rate of the I2C signals.
A higher MUL value results in a lower sampling rate of the I2C signals.

So we add a custom mul-value property to the Optional properties of the
binding, to select which MUL option the device tree i2c controller node
uses.

Signed-off-by: Chuanhua Han 
---
 Documentation/devicetree/bindings/i2c/i2c-imx.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx.txt 
b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
index b967544590e8..ba8e7b7b3fa8 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-imx.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
@@ -18,6 +18,9 @@ Optional properties:
 - sda-gpios: specify the gpio related to SDA pin
 - pinctrl: add extra pinctrl to configure i2c pins to gpio function for i2c
   bus recovery, call it "gpio" state
+- mul-value: NXP Layerscape SoC have up to three MUL options available for
+all I2C divider values, it describes which MUL we choose to use for the driver,
+the values should be 1,2,4.
 
 Examples:
 
-- 
2.17.1

[PATCH 2/3] i2c: imx: I2C Driver IBC and SCL Divider for MUL=2 and MUL=4

2019-04-29 Thread Chuanhua Han

NXP Layerscape SoCs have up to three MUL options available for all
divider values; the choice of MUL determines the internal monitoring rate
of the I2C bus (SCL and SDA signals).

The current kernel driver supports MUL=1 by default, but doesn't have
the IBC and SCL Divider entries in vf610_i2c_clk_div for MUL=2 and
MUL=4, so we need to add the corresponding support.
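
Reading the tables in this patch, the MUL=2 and MUL=4 entries appear to be the
MUL=1 divider multiplied by MUL, with the MUL selection carried in the top two
bits of the register value (0x40 for MUL=2, 0x80 for MUL=4). A small Python
sketch of that relationship follows. It rests on that inference from the
tables, not on the reference manual, and the helper names are illustrative
rather than taken from the driver:

```python
# Assumption inferred from the tables in the patch, not from a datasheet:
# bits 7:6 of the register value select MUL, bits 5:0 select the base divider.
MUL_CODE = {1: 0x00, 2: 0x40, 4: 0x80}


def scale_entry(base_div, ibc, mul):
    """Return (divider, register value) for a MUL=1 entry and a MUL choice."""
    if mul not in MUL_CODE:
        raise ValueError("mul-value must be 1, 2 or 4")
    if not 0 <= ibc <= 0x3F:
        raise ValueError("ibc must fit in six bits")
    return base_div * mul, MUL_CODE[mul] | ibc


def scl_rate_hz(i2c_clk_hz, divider):
    """Resulting SCL rate for a given input clock and total divider."""
    return i2c_clk_hz // divider


# { 3840, 0x3F } is the MUL=1 entry visible in the diff context.
for mul in (1, 2, 4):
    divider, reg = scale_entry(3840, 0x3F, mul)
    print("MUL=%d divider=%d reg=0x%02X" % (mul, divider, reg))
```

For the { 3840, 0x3F } entry this reproduces { 7680, 0x7F } and
{ 15360, 0xBF }, the last entries of the two new tables.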

Signed-off-by: Sumit Batra 
Signed-off-by: Chuanhua Han 
---
 drivers/i2c/busses/i2c-imx.c | 71 +++-
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 42fed40198a0..ac5a334b7339 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -156,6 +157,44 @@ static struct imx_i2c_clk_pair vf610_i2c_clk_div[] = {
{ 3840, 0x3F }, { 4096, 0x7B }, { 5120, 0x7D }, { 6144, 0x7E },
 };
 
+static struct imx_i2c_clk_pair mul2_i2c_clk_div[] = {
+   { 40,   0x40 }, { 44,   0x41 }, { 48,   0x42 }, { 52,   0x43 },
+   { 56,   0x44 }, { 60,   0x45 }, { 68,   0x46 }, { 80,   0x47 },
+   { 56,   0x48 }, { 64,   0x49 }, { 72,   0x4A }, { 80,   0x4B },
+   { 88,   0x4C }, { 96,   0x4D }, { 112,  0x4E }, { 136,  0x4F },
+   { 96,   0x50 }, { 112,  0x51 }, { 128,  0x52 }, { 144,  0x53 },
+   { 160,  0x54 }, { 176,  0x55 }, { 208,  0x56 }, { 256,  0x57 },
+   { 160,  0x58 }, { 192,  0x59 }, { 224,  0x5A }, { 256,  0x5B },
+   { 288,  0x5C }, { 320,  0x5D }, { 384,  0x5E }, { 480,  0x5F },
+   { 320,  0x60 }, { 384,  0x61 }, { 448,  0x62 }, { 512,  0x63 },
+   { 576,  0x64 }, { 640,  0x65 }, { 768,  0x66 }, { 960,  0x67 },
+   { 640,  0x68 }, { 768,  0x69 }, { 896,  0x6A }, { 1024, 0x6B },
+   { 1152, 0x6C }, { 1280, 0x6D }, { 1536, 0x6E }, { 1920, 0x6F },
+   { 1280, 0x70 }, { 1536, 0x71 }, { 1792, 0x72 }, { 2048, 0x73 },
+   { 2304, 0x74 }, { 2560, 0x75 }, { 3072, 0x76 }, { 3840, 0x77 },
+   { 2560, 0x78 }, { 3072, 0x79 }, { 3584, 0x7A }, { 4096, 0x7B },
+   { 4608, 0x7C }, { 5120, 0x7D }, { 6144, 0x7E }, { 7680, 0x7F },
+};
+
+static struct imx_i2c_clk_pair mul4_i2c_clk_div[] = {
+   { 80,0x80 }, { 88,0x81 }, { 96,0x82 }, { 104,   0x83 },
+   { 112,   0x84 }, { 120,   0x85 }, { 136,   0x86 }, { 160,   0x87 },
+   { 112,   0x88 }, { 128,   0x89 }, { 144,   0x8A }, { 160,   0x8B },
+   { 176,   0x8C }, { 192,   0x8D }, { 224,   0x8E }, { 272,   0x8F },
+   { 192,   0x90 }, { 224,   0x91 }, { 256,   0x92 }, { 288,   0x93 },
+   { 320,   0x94 }, { 352,   0x95 }, { 416,   0x96 }, { 512,   0x97 },
+   { 320,   0x98 }, { 384,   0x99 }, { 448,   0x9A }, { 512,   0x9B },
+   { 576,   0x9C }, { 640,   0x9D }, { 768,   0x9E }, { 960,   0x9F },
+   { 640,   0xA0 }, { 768,   0xA1 }, { 896,   0xA2 }, { 1024,  0xA3 },
+   { 1152,  0xA4 }, { 1280,  0xA5 }, { 1536,  0xA6 }, { 1792,  0xAA },
+   { 1280,  0xA8 }, { 1536,  0xA9 }, { 1920,  0xA7 }, { 2048,  0xAB },
+   { 2304,  0xAC }, { 2560,  0xAD }, { 3072,  0xAE }, { 3584,  0xB2 },
+   { 2560,  0xB0 }, { 3072,  0xB1 }, { 3820,  0xAF }, { 4096,  0xB3 },
+   { 4608,  0xB4 }, { 5120,  0xB5 }, { 6144,  0xB6 }, { 7680,  0xB7 },
+   { 5120,  0xB8 }, { 6144,  0xB9 }, { 7168,  0xBA }, { 8192,  0xBB },
+   { 9216,  0xBC }, { 10240, 0xBD }, { 12288, 0xBE }, { 15360, 0xBF },
+};
+
 enum imx_i2c_type {
IMX1_I2C,
IMX21_I2C,
@@ -234,6 +273,24 @@ static struct imx_i2c_hwdata vf610_i2c_hwdata = {
 
 };
 
+static struct imx_i2c_hwdata mul2_i2c_hwdata = {
+   .devtype= VF610_I2C,
+   .regshift   = VF610_I2C_REGSHIFT,
+   .clk_div= mul2_i2c_clk_div,
+   .ndivs  = ARRAY_SIZE(mul2_i2c_clk_div),
+   .i2sr_clr_opcode= I2SR_CLR_OPCODE_W1C,
+   .i2cr_ien_opcode= I2CR_IEN_OPCODE_0,
+};
+
+static struct imx_i2c_hwdata mul4_i2c_hwdata = {
+   .devtype= VF610_I2C,
+   .regshift   = VF610_I2C_REGSHIFT,
+   .clk_div= mul4_i2c_clk_div,
+   .ndivs  = ARRAY_SIZE(mul4_i2c_clk_div),
+   .i2sr_clr_opcode= I2SR_CLR_OPCODE_W1C,
+   .i2cr_ien_opcode= I2CR_IEN_OPCODE_0,
+};
+
 static const struct platform_device_id imx_i2c_devtype[] = {
{
.name = "imx1-i2c",
@@ -1058,6 +1115,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
void __iomem *base;
int irq, ret;
dma_addr_t phy_addr;
+   u32 mul_value;
 
dev_dbg(&pdev->dev, "<%s>\n", __func__);
 
@@ -1077,11 +1135,20 @@ static int i2c_imx_probe(struct platform_device *pdev)
if (!i2c_imx)
return -ENOMEM;
 
-   if (of_id)
+   if (of_id) {
i2c_imx->hwdata = of_id->data;
-   else
+   ret = of_property_read_u32(pdev->dev.of_nod

[PATCH 3/3] arm64: dts: fsl: ls1046a: Add mul-value property of the i2c controller nodes

2019-04-29 Thread Chuanhua Han

According to the LS1046A Reference Manual, the i2c controller has up to
three MUL options available for all divider values. Therefore, we need
to specify in the device tree which MUL the driver should use.

The "mul-value" property specifies which MUL the driver uses.

Signed-off-by: Chuanhua Han 
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index b0ef08b090dd..373310e4c0ea 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -385,6 +385,7 @@
dmas = <&edma0 1 39>,
   <&edma0 1 38>;
dma-names = "tx", "rx";
+   mul-value = <4>;
status = "disabled";
};
 
@@ -395,6 +396,7 @@
reg = <0x0 0x219 0x0 0x1>;
interrupts = ;
clocks = <&clockgen 4 1>;
+   mul-value = <4>;
status = "disabled";
};
 
@@ -405,6 +407,7 @@
reg = <0x0 0x21a 0x0 0x1>;
interrupts = ;
clocks = <&clockgen 4 1>;
+   mul-value = <4>;
status = "disabled";
};
 
@@ -415,6 +418,7 @@
reg = <0x0 0x21b 0x0 0x1>;
interrupts = ;
clocks = <&clockgen 4 1>;
+   mul-value = <4>;
status = "disabled";
};
 
-- 
2.17.1

PROBLEM: Elan touchpad regression on Kernel 5.0.10

2019-04-29 Thread Outvi V

Hello,

[1.] One line summary of the problem: Elan touchpad regression on Kernel 5.0.10

[2.] Full description of the problem/report:
  Elan touchpad does not work on 5.0.10 while working on 5.0.9

[3.] Keywords: elan_i2c_core elan i2c touchpad 5.0.10

[4.] Kernel information
[4.1.] Kernel version:
  Linux version 5.0.10-arch1-1-ARCH (builduser@heftig-2592) (gcc version 8.3.0 
(GCC)) #1 SMP PREEMPT Sat Apr 27 20:06:45 UTC 2019
[4.2.] Kernel .config file:
  I'm not sure, but I think it may be referring to
  
https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
[5.] Most recent kernel version which did not have the bug: 5.0.9

[6.] Output of Oops.. message (if applicable) with symbolic information
 resolved (Not applicable)
[7.] A small shell script or example program which triggers the
 problem: (Not applicable)

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
  
Linux sheltty 5.0.10-arch1-1-ARCH #1 SMP PREEMPT Sat Apr 27 20:06:45 UTC 2019 
x86_64 GNU/Linux

GNU C   8.3.0
GNU Make4.2.1
Binutils2.32
Util-linux  2.33.2
Mount   2.33.2
Module-init-tools   26
E2fsprogs   1.45.0
Jfsutils1.1.15
Reiserfsprogs   3.6.27
Xfsprogs4.20.0
PPP 2.4.7
Linux C Library 2.29
Dynamic linker (ldd)2.29
Linux C++ Library   6.0.25
Procps  3.3.15
Kbd 2.0.4
Console-tools   2.0.4
Sh-utils8.31
Udev242
Modules Loaded  8021q 8250_dw ac ac97_bus acpi_thermal_rel aesni_intel 
aes_x86_64 agpgart ahci arc4 atkbd battery bbswitch bluetooth btbcm btintel 
btrtl btusb cfg80211 coretemp crc16 crc32c_generic crc32c_intel crc32_pclmul 
crct10dif_pclmul cryptd crypto_simd crypto_user drm drm_kms_helper ecdh_generic 
elan_i2c evdev ext4 fat fb_sys_fops fscrypto garp ghash_clmulni_intel 
glue_helper hid hid_generic i2c_algo_bit i2c_hid i2c_i801 i8042 i915 idma64 
input_leds int3400_thermal int3403_thermal int340x_thermal_zone intel_cstate 
intel_gtt intel_lpss intel_lpss_pci intel_pch_thermal intel_powerclamp 
intel_rapl intel_rapl_perf intel_soc_dts_iosf intel_uncore 
intel_wmi_thunderbolt ip_tables irqbypass iTCO_vendor_support iTCO_wdt jbd2 
joydev kvm kvmgt kvm_intel ledtrig_audio libahci libata libphy libps2 llc 
mac80211 mac_hid mbcache mdev media mei mei_me mousedev mrp nls_cp437 
nls_iso8859_1 pcc_cpufreq processor_thermal_device r8169 r8822be realtek rfkill 
rng_core scsi_mod serio serio_raw snd snd_compress snd_hda_codec 
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_core 
snd_hda_ext_core snd_hda_intel snd_hwdep snd_pcm snd_pcm_dmaengine snd_soc_acpi 
snd_soc_acpi_intel_match snd_soc_core snd_soc_hdac_hda snd_soc_skl 
snd_soc_skl_ipc snd_soc_sst_dsp snd_soc_sst_ipc snd_timer soundcore stp 
syscopyarea sysfillrect sysimgblt tpm tpm_crb tpm_tis tpm_tis_core typec 
typec_ucsi ucsi_acpi usbhid uvcvideo vfat vfio vfio_iommu_type1 vfio_mdev 
videobuf2_common videobuf2_memops videobuf2_v4l2 videobuf2_vmalloc videodev wmi 
wmi_bmof x86_pkg_temp_thermal xhci_hcd xhci_pci x_tables

[8.2.] Processor information (from /proc/cpuinfo): (Maybe not applicable)
[8.3.] Module information (from /proc/modules): 

(Parts related to i2c and elan:)

i2c_algo_bit 16384 1 i915, Live 0x
i2c_hid 32768 0 - Live 0x
hid 147456 3 hid_generic,usbhid,i2c_hid, Live 0x
elan_i2c 49152 0 - Live 0x
i2c_i801 36864 0 - Live 0x

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

/proc/ioports:
- : PCI Bus :00
  - : dma1
  - : pic1
  - : iTCO_wdt
  - : timer0
  - : timer1
  - : keyboard
  - : PNP0C09:00
- : EC data
  - : keyboard
  - : PNP0C09:00
- : EC cmd
  - : rtc0
  - : dma page reg
  - : pic2
  - : dma2
  - : fpu
- : PNP0C04:00
  - : iTCO_wdt
  - : pnp 00:02
- : PCI conf1
- : PCI Bus :00
  - : pnp 00:02
  - : pnp 00:00
- : ACPI PM1a_EVT_BLK
- : ACPI PM1a_CNT_BLK
- : ACPI PM_TMR
- : ACPI CPU throttle
- : ACPI PM2_CNT_BLK
- : pnp 00:04
- : ACPI GPE0_BLK
  - : pnp 00:01
  - : PCI Bus :08
- : :08:00.0
  - : PCI Bus :07
- : :07:00.0
  - : r8822be
  - : PCI Bus :01
- : :01:00.0
  - : :00:02.0
  - : :00:1f.4
- : i801_smbus
  - : :00:17.0
- : ahci
  - : :00:17.0
- : ahci
  - : :00:17.0
- : ahci


[8.5.] PCI information
  It seems to be long (

Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Al Viro

On Mon, Apr 29, 2019 at 10:18:04PM -0600, Andreas Dilger wrote:
> > 
> > void*i_private; /* fs or device private pointer */
> > +   void (*free_inode)(struct inode *);
> 
> It seems like a waste to increase the size of every struct inode just to 
> access
> a static pointer.  Is this the only place that ->free_inode() is called?  Why
> not move the ->free_inode() pointer into inode->i_fop->free_inode() so that it
> is still directly accessible at this point.

i_op, surely?  In any case, increasing sizeof(struct inode) is not a problem -
if anything, I'd turn ->i_fop into an anon union with that.  As in,

diff --git a/Documentation/filesystems/porting 
b/Documentation/filesystems/porting
index 9d80f9e0855e..b8d3ddd8b8db 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -655,3 +655,11 @@ in your dentry operations instead.
* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
* combination of NULL ->destroy_inode and NULL ->free_inode is
  treated as NULL/free_inode_nonrcu, to preserve the 
compatibility.
+
+   Note that the callback (be it via ->free_inode() or explicit call_rcu()
+   in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
+   as the matter of fact, the superblock and all associated structures
+   might be already gone.  The filesystem driver is guaranteed to be still
+   there, but that's it.  Freeing memory in the callback is fine; doing
+   more than that is possible, but requires a lot of care and is best
+   avoided.
diff --git a/fs/inode.c b/fs/inode.c
index fb45590d284e..627e1766503a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -211,8 +211,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
 static void i_callback(struct rcu_head *head)
 {
struct inode *inode = container_of(head, struct inode, i_rcu);
-   if (inode->i_sb->s_op->free_inode)
-   inode->i_sb->s_op->free_inode(inode);
+   if (inode->free_inode)
+   inode->free_inode(inode);
else
free_inode_nonrcu(inode);
 }
@@ -236,6 +236,7 @@ static struct inode *alloc_inode(struct super_block *sb)
if (!ops->free_inode)
return NULL;
}
+   inode->free_inode = ops->free_inode;
i_callback(&inode->i_rcu);
return NULL;
}
@@ -276,6 +277,7 @@ static void destroy_inode(struct inode *inode)
if (!ops->free_inode)
return;
}
+   inode->free_inode = ops->free_inode;
call_rcu(&inode->i_rcu, i_callback);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2e9b9f87caca..92732286b748 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -694,7 +694,10 @@ struct inode {
 #ifdef CONFIG_IMA
atomic_ti_readcount; /* struct files open RO */
 #endif
-   const struct file_operations*i_fop; /* former 
->i_op->default_file_ops */
+   union {
+   const struct file_operations*i_fop; /* former 
->i_op->default_file_ops */
+   void (*free_inode)(struct inode *);
+   };
struct file_lock_context*i_flctx;
struct address_spacei_data;
struct list_headi_devices;

Re: [PATCH RESEND] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Ingo Molnar



* Tobin C. Harding  wrote:

> Currently error return from kobject_init_and_add() is not followed by a
> call to kobject_put().  This means there is a memory leak.
> 
> Add call to kobject_put() in error path of kobject_init_and_add().
> 
> Signed-off-by: Tobin C. Harding 
> ---
> 
> Resend with SOB tag.

Please ignore my previous mail :-)

Thanks,

Ingo

Re: [PATCH] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Ingo Molnar



* Tobin C. Harding  wrote:

> Currently error return from kobject_init_and_add() is not followed by a
> call to kobject_put().  This means there is a memory leak.
> 
> Add call to kobject_put() in error path of kobject_init_and_add().
> ---
>  kernel/sched/cpufreq_schedutil.c | 1 +
>  1 file changed, 1 insertion(+)

I've added your:

   Signed-off-by: Tobin C. Harding 

Which I suppose you intended to include?

Thanks,

Ingo

Re: [PATCH 1/2] RISC-V: Add DT documentation for SiFive L2 Cache Controller

2019-04-29 Thread Yash Shah

On Fri, Apr 26, 2019 at 3:04 PM Sudeep Holla  wrote:
>
> On Fri, Apr 26, 2019 at 11:20:17AM +0530, Yash Shah wrote:
> > On Thu, Apr 25, 2019 at 3:43 PM Sudeep Holla  wrote:
> > >
> > > On Thu, Apr 25, 2019 at 11:24:55AM +0530, Yash Shah wrote:
> > > > Add device tree bindings for SiFive FU540 L2 cache controller driver
> > > >
> > > > Signed-off-by: Yash Shah 
> > > > ---
> > > >  .../devicetree/bindings/riscv/sifive-l2-cache.txt  | 53 
> > > > ++
> > > >  1 file changed, 53 insertions(+)
> > > >  create mode 100644 
> > > > Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > > >
> > > > diff --git 
> > > > a/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt 
> > > > b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > > > new file mode 100644
> > > > index 000..15132e2
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > > > @@ -0,0 +1,53 @@
> > > > +SiFive L2 Cache Controller
> > > > +--
> > > > +The SiFive Level 2 Cache Controller is used to provide access to fast 
> > > > copies
> > > > +of memory for masters in a Core Complex. The Level 2 Cache Controller 
> > > > also
> > > > +acts as directory-based coherency manager.
> > > > +
> > > > +Required Properties:
> > > > +
> > > > +- compatible: Should be "sifive,fu540-c000-ccache"
> > > > +
> > > > +- cache-block-size: Specifies the block size in bytes of the cache
> > > > +
> > > > +- cache-level: Should be set to 2 for a level 2 cache
> > > > +
> > > > +- cache-sets: Specifies the number of associativity sets of the cache
> > > > +
> > > > +- cache-size: Specifies the size in bytes of the cache
> > > > +
> > > > +- cache-unified: Specifies the cache is a unified cache
> > > > +
> > > > +- interrupt-parent: Must be core interrupt controller
> > > > +
> > > > +- interrupts: Must contain 3 entries (DirError, DataError and DataFail 
> > > > signals)
> > > > +
> > > > +- reg: Physical base address and size of L2 cache controller registers 
> > > > map
> > > > +
> > > > +- reg-names: Should be "control"
> > > > +
> > >
> > > It would be good if you mark the properties that are present in DT
> > > specification and those that are added for sifive,fu540-c000-ccache
> >
> > I believe there isn't any property which is added explicitly for
> > sifive,fu540-c000-ccache.
> >
>
> reg and interrupts are generally optional for normal cache and may be
> required for cache controller like this. DT specification[1] covers
> only caches and not cache controllers.

Are you suggesting something like this:

Required Properties:

Standard Properties:
- compatible: Should be "sifive,-ccache"
  Supported compatible strings are:
  "sifive,fu540-c000-ccache" and "sifive,fu740-c000-ccache"

- cache-block-size: Specifies the block size in bytes of the cache

- cache-level: Should be set to 2 for a level 2 cache

- cache-sets: Specifies the number of associativity sets of the cache

- cache-size: Specifies the size in bytes of the cache

- cache-unified: Specifies the cache is a unified cache

Non-Standard Properties:
- interrupt-parent: Must be core interrupt controller

- interrupts: Must contain 3 entries for FU540 (DirError, DataError and
  DataFail signals) or 4 entries for other chips (DirError, DirFail, DataError,
  DataFail signals)

- reg: Physical base address and size of L2 cache controller registers map

- reg-names: Should be "control"

- Yash
>
> --
> Regards,
> Sudeep
>
> [1] 
> https://github.com/devicetree-org/devicetree-specification/releases/download/v0.2/devicetree-specification-v0.2.pdf

Re: [PATCH v4 1/7] ocxl: Split pci.c

2019-04-29 Thread Andrew Donnellan


On 27/3/19 4:31 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

In preparation for making core code available for external drivers,
move the core code out of pci.c and into core.c

Signed-off-by: Alastair D'Silva 


There doesn't seem to be much left in pci.c, is there?

Acked-by: Andrew Donnellan 


---
  drivers/misc/ocxl/Makefile|   1 +
  drivers/misc/ocxl/core.c  | 517 +
  drivers/misc/ocxl/ocxl_internal.h |   5 +
  drivers/misc/ocxl/pci.c   | 519 +-
  4 files changed, 524 insertions(+), 518 deletions(-)
  create mode 100644 drivers/misc/ocxl/core.c

diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
index 5229dcda8297..bc4e39bfda7b 100644
--- a/drivers/misc/ocxl/Makefile
+++ b/drivers/misc/ocxl/Makefile
@@ -3,6 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
  
  ocxl-y+= main.o pci.o config.o file.o pasid.o

  ocxl-y+= link.o context.o afu_irq.o sysfs.o 
trace.o
+ocxl-y += core.o
  obj-$(CONFIG_OCXL)+= ocxl.o
  
  # For tracepoints to include our trace.h from tracepoint infrastructure:

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
new file mode 100644
index ..1a4411b72d35
--- /dev/null
+++ b/drivers/misc/ocxl/core.c
@@ -0,0 +1,517 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2019 IBM Corp.
+#include 
+#include "ocxl_internal.h"
+
+static struct ocxl_fn *ocxl_fn_get(struct ocxl_fn *fn)
+{
+   return (get_device(&fn->dev) == NULL) ? NULL : fn;
+}
+
+static void ocxl_fn_put(struct ocxl_fn *fn)
+{
+   put_device(&fn->dev);
+}
+
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
+{
+   return (get_device(&afu->dev) == NULL) ? NULL : afu;
+}
+
+void ocxl_afu_put(struct ocxl_afu *afu)
+{
+   put_device(&afu->dev);
+}
+
+static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
+{
+   struct ocxl_afu *afu;
+
+   afu = kzalloc(sizeof(struct ocxl_afu), GFP_KERNEL);
+   if (!afu)
+   return NULL;
+
+   mutex_init(&afu->contexts_lock);
+   mutex_init(&afu->afu_control_lock);
+   idr_init(&afu->contexts_idr);
+   afu->fn = fn;
+   ocxl_fn_get(fn);
+   return afu;
+}
+
+static void free_afu(struct ocxl_afu *afu)
+{
+   idr_destroy(&afu->contexts_idr);
+   ocxl_fn_put(afu->fn);
+   kfree(afu);
+}
+
+static void free_afu_dev(struct device *dev)
+{
+   struct ocxl_afu *afu = to_ocxl_afu(dev);
+
+   ocxl_unregister_afu(afu);
+   free_afu(afu);
+}
+
+static int set_afu_device(struct ocxl_afu *afu, const char *location)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int rc;
+
+   afu->dev.parent = &fn->dev;
+   afu->dev.release = free_afu_dev;
+   rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
+   afu->config.idx);
+   return rc;
+}
+
+static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int actag_count, actag_offset;
+
+   /*
+* if there were not enough actags for the function, each afu
+* reduces its count as well
+*/
+   actag_count = afu->config.actag_supported *
+   fn->actag_enabled / fn->actag_supported;
+   actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
+   if (actag_offset < 0) {
+   dev_err(&afu->dev, "Can't allocate %d actags for AFU: %d\n",
+   actag_count, actag_offset);
+   return actag_offset;
+   }
+   afu->actag_base = fn->actag_base + actag_offset;
+   afu->actag_enabled = actag_count;
+
+   ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
+   afu->actag_base, afu->actag_enabled);
+   dev_dbg(&afu->dev, "actag base=%d enabled=%d\n",
+   afu->actag_base, afu->actag_enabled);
+   return 0;
+}
+
+static void reclaim_afu_actag(struct ocxl_afu *afu)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int start_offset, size;
+
+   start_offset = afu->actag_base - fn->actag_base;
+   size = afu->actag_enabled;
+   ocxl_actag_afu_free(afu->fn, start_offset, size);
+}
+
+static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int pasid_count, pasid_offset;
+
+   /*
+* We only support the case where the function configuration
+* requested enough PASIDs to cover all AFUs.
+*/
+   pasid_count = 1 << afu->config.pasid_supported_log;
+   pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
+   if (pasid_offset < 0) {
+   dev_err(&afu->dev, "Can't allocate %d PASIDs for AFU: %d\n",
+   pasid_count, pasid_offset);
+   return pasid_offset;
+   }
+   afu->pasid_base = fn->pasid_base + pasid_offset;
+   afu->pasid_count = 0;
+   afu->pasid_max = pas
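
The proportional reduction described in the comment in assign_afu_actag() can be illustrated with a small user-space sketch. The helper name scale_afu_actags() and the sample numbers in the note below are assumptions made for illustration; they are not part of the ocxl driver.

```c
/*
 * Illustrative only: mirrors the integer arithmetic used in
 * assign_afu_actag(). scale_afu_actags() is a hypothetical helper,
 * not an ocxl function.
 */
static int scale_afu_actags(int afu_supported, int fn_enabled, int fn_supported)
{
	/* Multiply first so the integer division truncates only once. */
	return afu_supported * fn_enabled / fn_supported;
}
```

For instance, an AFU that supports 32 actags on a function that supports 64 but only got 16 enabled would end up with 32 * 16 / 64 = 8 actags.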

Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Al Viro

On Tue, Apr 30, 2019 at 05:33:10AM +0200, Nicholas Mc Guire wrote:

> ok - my bad then - I had assumed that using __force is reasonable
> if the handling is correct and it's a localized conversion only
> like var = be16_to_cpu(var) which evaded introducing additional
> variables just to have different types but no different function.

If compiler can't recognize that in

T1 v1;
T2 v2;

code using v1, but not v2
v2 = f(v1);
code using v2, but not v1

it can use the same memory for v1 and v2, file a bug against the
compiler.  Or stop using that toy altogether - that kind of
optimization is early 60s stuff and any real compiler will
handle that.  Both gcc and clang certainly do handle that.

Another thing they handle is figuring out that be16_to_cpu()
et al. are pure functions, so

f(be16_to_cpu(n));
no modifications of n
g(be16_to_cpu(n));

doesn't need to have be16_to_cpu recalculated.  IOW, that particular
code could as well have been
dev_info(dev, "Fieldbus type: %04X", be16_to_cpu(fieldbus_type));
...
cd->client->fieldbus_type = be16_to_cpu(fieldbus_type);

... not that there's much sense keeping ->fieldbus_type in host-endian,
while we are at it.
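
A user-space sketch of the "one variable per representation" idea follows. It is only an illustration under stated assumptions: be16_t, be16_to_host(), struct client and record_type() are hypothetical stand-ins, not the kernel's __be16/be16_to_cpu() or the anybus-s driver code. The wire-order value gets its own distinct type, so mixing it with a host-order integer fails to compile and no cast is needed.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's __be16: a distinct struct type
 * makes it a compile error to mix wire-order and host-order values.
 */
typedef struct { uint8_t b[2]; } be16_t;

/* Convert a big-endian 16-bit value to host order, byte by byte. */
static uint16_t be16_to_host(be16_t v)
{
	return (uint16_t)((v.b[0] << 8) | v.b[1]);
}

struct client {
	uint16_t fieldbus_type;	/* kept in host order in this sketch */
};

static void record_type(struct client *c, be16_t wire_type)
{
	/* v1 (wire_type) and v2 (host_type) are separate variables. */
	uint16_t host_type = be16_to_host(wire_type);

	c->fieldbus_type = host_type;
}
```

Whether the compiler reuses the storage of wire_type for host_type is its own business; the source stays type-correct either way.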

Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Andreas Dilger

On Apr 29, 2019, at 9:09 PM, Al Viro  wrote:
> 
> On Tue, Apr 16, 2019 at 11:01:16AM -0700, Linus Torvalds wrote:
>> 
>> I only skimmed through the actual filesystem (and one networking)
>> patches, but they looked like trivial conversions to a better
>> interface.
> 
> ... except that this callback can (and always could) get executed after
> freeing struct super_block.  So we can't just dereference ->i_sb->s_op
> and expect to survive; the table ->s_op pointed to will still be there,
> but ->i_sb might very well have been freed, with all its contents overwritten.
> We need to copy the callback into struct inode itself, unfortunately.
> The following incremental fixes it; I'm going to fold it into the first
> commit in there.
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index fb45590d284e..855dad43b11d 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -164,6 +164,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>   inode->i_wb_frn_avg_time = 0;
>   inode->i_wb_frn_history = 0;
> #endif
> + inode->free_inode = sb->s_op->free_inode;
> 
>   if (security_inode_alloc(inode))
>   goto out;
> @@ -211,8 +212,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
> static void i_callback(struct rcu_head *head)
> {
>   struct inode *inode = container_of(head, struct inode, i_rcu);
> - if (inode->i_sb->s_op->free_inode)
> - inode->i_sb->s_op->free_inode(inode);
> + if (inode->free_inode)
> + inode->free_inode(inode);
>   else
>   free_inode_nonrcu(inode);
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2e9b9f87caca..5ed6b39e588e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -718,6 +718,7 @@ struct inode {
> #endif
> 
>   void*i_private; /* fs or device private pointer */
> + void (*free_inode)(struct inode *);

It seems like a waste to increase the size of every struct inode just to access
a static pointer.  Is this the only place that ->free_inode() is called?  Why
not move the ->free_inode() pointer into inode->i_fop->free_inode() so that it
is still directly accessible at this point.

Cheers, Andreas








Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Al Viro

On Mon, Apr 29, 2019 at 08:37:29PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019, 20:09 Al Viro  wrote:
> 
> >
> > ... except that this callback can (and always could) get executed after
> > freeing struct super_block.
> >
> 
> Ugh.
> 
> That food looks nasty. Shouldn't the super block freeing wait for the
> filesystem to be all done instead? Do a rcu synchronization or something?
> 
> Adding that pointer looks really wrong to me. I'd much rather delay the sb
> freeing. Is there some reason that can't be done that I'm missing?

Where would you put that synchronize_rcu()?  Doing that before ->put_super()
is too early - inode references might be dropped in there.  OTOH, doing
that after that point means that while struct super_block itself will be
there, any number of data structures hanging from it might be not.

So we are still very limited in what we can do inside ->free_inode()
instance *and* we get bunch of synchronize_rcu() for no good reason.

Note that for normal lockless accesses (lockless ->d_revalidate(), ->d_hash(),
etc.) we are just fine with having struct super_block freeing RCU-delayed
(along with any data structures we might need) - the superblock had
been seen at some point after we'd taken rcu_read_lock(), so its
freeing won't happen until we drop it.  So we don't need synchronize_rcu()
for that.

Here the problem is that we are dealing with another RCU callback;
synchronize_rcu() would be needed for it, but it will only protect that
intermediate dereference of ->i_sb; any rcu-delayed stuff scheduled
from inside ->put_super() would not be ordered wrt ->free_inode().
And if we are doing that just for the sake of that one dereference,
we might as well do it before scheduling i_callback().

PS: we *are* guaranteed that module will still be there (unregister_filesystem()
does synchronize_rcu() and rcu_barrier() is done before kmem_cache_destroy()
in assorted exit_foo_fs()).
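
The approach taken by the incremental patch quoted earlier in this thread can be modelled in plain user-space C. This is a sketch under assumptions, not kernel code: the struct layouts are reduced to the minimum, deferred_free() merely stands in for the RCU callback, and demo_free_inode() and freed_flag are hypothetical names. The point is that the deferred path uses only the pointer copied into the inode and never touches ->i_sb, which may be gone by then, while the ops table itself is static and outlives the superblock.

```c
struct inode;

struct super_ops {
	void (*free_inode)(struct inode *);
};

struct super_block {
	const struct super_ops *s_op;
};

struct inode {
	struct super_block *i_sb;		/* may dangle by callback time */
	void (*free_inode)(struct inode *);	/* copied at init time */
	int *freed_flag;			/* hypothetical, for the demo */
};

static void inode_init(struct inode *inode, struct super_block *sb, int *flag)
{
	inode->i_sb = sb;
	/* Copy the callback while the superblock is known to be alive. */
	inode->free_inode = sb->s_op->free_inode;
	inode->freed_flag = flag;
}

/* Stand-in for the RCU callback: it must not dereference inode->i_sb. */
static void deferred_free(struct inode *inode)
{
	if (inode->free_inode)
		inode->free_inode(inode);
}

static void demo_free_inode(struct inode *inode)
{
	*inode->freed_flag = 1;
}

/* Static table: still there after the superblock itself is freed. */
static const struct super_ops demo_ops = { .free_inode = demo_free_inode };
```

The cost Andreas points out is visible here as well: every inode carries one extra pointer.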

linux-next: manual merge of the mlx5-next tree with the rdma tree

2019-04-29 Thread Stephen Rothwell

Hi Leon,

Today's linux-next merge of the mlx5-next tree got a conflict in:

  drivers/infiniband/hw/mlx5/main.c

between commit:

  35b0aa67b298 ("RDMA/mlx5: Refactor netdev affinity code")

from the rdma tree and commit:

  c42260f19545 ("net/mlx5: Separate and generalize dma device from pci device")

from the mlx5-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/infiniband/hw/mlx5/main.c
index 6135a0b285de,fae6a6a1fbea..
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@@ -200,12 -172,18 +200,12 @@@ static int mlx5_netdev_event(struct not
  
switch (event) {
case NETDEV_REGISTER:
 +  /* Should already be registered during the load */
 +  if (ibdev->is_rep)
 +  break;
write_lock(&roce->netdev_lock);
-   if (ndev->dev.parent == &mdev->pdev->dev)
 -  if (ibdev->rep) {
 -  struct mlx5_eswitch *esw = ibdev->mdev->priv.eswitch;
 -  struct net_device *rep_ndev;
 -
 -  rep_ndev = mlx5_ib_get_rep_netdev(esw,
 -ibdev->rep->vport);
 -  if (rep_ndev == ndev)
 -  roce->netdev = ndev;
 -  } else if (ndev->dev.parent == mdev->device) {
++  if (ndev->dev.parent == mdev->device)
roce->netdev = ndev;
 -  }
write_unlock(&roce->netdev_lock);
break;
  




linux-next: build warning after merge of the thermal tree

2019-04-29 Thread Stephen Rothwell

Hi Zhang,

After merging the thermal tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

boolean symbol THERMAL tested for 'm'? test forced to 'n'

Introduced by commit

  be33e4fbbea5 ("thermal/drivers/core: Remove the module Kconfig's option")

There is a test for =m in drivers/net/ethernet/mellanox/mlxsw/Kconfig.

-- 
Cheers,
Stephen Rothwell



[PATCH v6 0/4] x86: Add the support of ACRN guest under x86

2019-04-29 Thread Zhao Yakui

ACRN is a flexible, lightweight reference hypervisor, built with real-time
and safety-criticality in mind, optimized to streamline embedded development
through an open source platform. It is built for embedded IOT with small
footprint and real-time features. More details can be found
in https://projectacrn.org/

This patch set allows Linux to run as a guest on the ACRN hypervisor. It
works together with a follow-up patch set that manages Linux guests on ACRN.
It covers detection of the ACRN hypervisor, the upcall notification vector
from the hypervisor, and hypercalls. Hypervisor detection is similar to that
for Xen/VMware/Hyper-V. ACRN also uses an upcall notification mechanism
similar to the one in Xen/Microsoft Hyper-V when it needs to notify the
Linux guest. The hypercall provides the mechanism by which the Linux guest
can query/configure the ACRN hypervisor.

Following this patch set, we will send the ACRN driver part, which provides
the interface used to manage the virtualized CPU/memory/device/interrupt
for other guest OSes once the ACRN hypervisor is detected.

v1->v2: Change the CONFIG_ACRN to CONFIG_ACRN_GUEST, which makes it easy to
        understand.
        Remove the export of x86_hyper_acrn.
        Remove the unused API definition of acrn_setup_intr_handler and
        acrn_remove_intr_handler.
        Adjust the order of header file
        Add the declaration of acrn_hv_vector_handler and tracing
        definition of acrn_hv_callback_vector.
        Refine the comments for the function of acrn_hypercall0/1/2

v2->v3: Add one new config symbol to unify the conditional definition
        of hv_irq_callback_count
        Use the "vmcall" mnemonic to replace the hard-code byte definition
        Remove the unnecessary dependency of CONFIG_PARAVIRT for ACRN_GUEST

v3->v4: Rename the file name of acrnhyper.h to acrn.h
        Refine the commit log and some other minor changes (more comments and
        redundant ifdef in acrn.h, sorting the header file in acrn.c)

v4->v5: Minor changes of comments/commit log in patch 04
        Use _ASM_X86_ACRN_HYPERCALL_H instead of _ASM_X86_ACRNHYPERCALL_H.
        Use the "VMCALL" mnemonic in comment/commit log.
        Uppercase r8/rdi/rsi/rax for hypercall parameter register in comment.

v5->v6: Remove the explicit register variable for inline assembly
        Add the "extern" for the function declaration in acrn.h
        Add comments about acking APIC EOI in acrn_hv_callback_handler
        Minor changes for comments/commit log in patch 03/04


Zhao Yakui (4):
  x86/Kconfig: Add new config symbol to unify conditional definition of
hv_irq_callback_count
  x86: Add the support of Linux guest on ACRN hypervisor
  x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector
  x86/acrn: Add hypercall for ACRN guest

 arch/x86/Kconfig  | 16 +++
 arch/x86/entry/entry_64.S |  5 +++
 arch/x86/include/asm/acrn.h   | 11 +
 arch/x86/include/asm/acrn_hypercall.h | 84 +++
 arch/x86/include/asm/hardirq.h|  2 +-
 arch/x86/include/asm/hypervisor.h |  1 +
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/acrn.c| 68 
 arch/x86/kernel/cpu/hypervisor.c  |  4 ++
 arch/x86/kernel/irq.c |  2 +-
 arch/x86/xen/Kconfig  |  1 +
 drivers/hv/Kconfig|  1 +
 12 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/acrn.h
 create mode 100644 arch/x86/include/asm/acrn_hypercall.h
 create mode 100644 arch/x86/kernel/cpu/acrn.c

-- 
2.7.4

[PATCH v6 1/4] x86/Kconfig: Add new config symbol to unify conditional definition of hv_irq_callback_count

2019-04-29 Thread Zhao Yakui

Add a special Kconfig symbol X86_HV_CALLBACK_VECTOR so that the guests
using the hypervisor interrupt callback counter can select and thus
enable that counter. Select it when xen or hyperv support is enabled.
No functional changes.

Signed-off-by: Zhao Yakui 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
---
v3->v4: Follow the comments to refine the commit log.
---
 arch/x86/Kconfig   | 3 +++
 arch/x86/include/asm/hardirq.h | 2 +-
 arch/x86/kernel/irq.c  | 2 +-
 arch/x86/xen/Kconfig   | 1 +
 drivers/hv/Kconfig | 1 +
 5 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 62fc3fd..2fc9297 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -791,6 +791,9 @@ config QUEUED_LOCK_STAT
  behavior of paravirtualized queued spinlocks and report
  them on debugfs.
 
+config X86_HV_CALLBACK_VECTOR
+   def_bool n
+
 source "arch/x86/xen/Kconfig"
 
 config KVM_GUEST
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index d9069bb..0753379 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -37,7 +37,7 @@ typedef struct {
 #ifdef CONFIG_X86_MCE_AMD
unsigned int irq_deferred_error_count;
 #endif
-#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
+#ifdef CONFIG_X86_HV_CALLBACK_VECTOR
unsigned int irq_hv_callback_count;
 #endif
 #if IS_ENABLED(CONFIG_HYPERV)
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 59b5f2e..a147826 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -134,7 +134,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
seq_puts(p, "  Machine check polls\n");
 #endif
-#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
+#ifdef CONFIG_X86_HV_CALLBACK_VECTOR
if (test_bit(HYPERVISOR_CALLBACK_VECTOR, system_vectors)) {
seq_printf(p, "%*s: ", prec, "HYP");
for_each_online_cpu(j)
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index e07abef..ba5a418 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,6 +7,7 @@ config XEN
bool "Xen guest support"
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select X86_HV_CALLBACK_VECTOR
depends on X86_64 || (X86_32 && X86_PAE)
depends on X86_LOCAL_APIC && X86_TSC
help
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 1c1a251..cafcb97 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -6,6 +6,7 @@ config HYPERV
tristate "Microsoft Hyper-V client drivers"
depends on X86 && ACPI && X86_LOCAL_APIC && HYPERVISOR_GUEST
select PARAVIRT
+   select X86_HV_CALLBACK_VECTOR
help
  Select this option to run Linux as a Hyper-V client operating
  system.
-- 
2.7.4

[PATCH v6 4/4] x86/acrn: Add hypercall for ACRN guest

2019-04-29 Thread Zhao Yakui

When the ACRN hypervisor is detected, the hypercall is needed so that the
ACRN guest can query/config some settings. For example: it can be used
to query the resources in hypervisor and manage the CPU/memory/device/
interrupt for guest operating system.

Add the hypercall so that the ACRN guest can communicate with the
low-level ACRN hypervisor. On x86 it is implemented with the VMCALL
instruction.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
v1->v2: Refine the comments for the function of acrn_hypercall0/1/2
v2->v3: Use the "vmcall" mnemonic to replace hard-code byte definition
v4->v5: Use _ASM_X86_ACRN_HYPERCALL_H instead of _ASM_X86_ACRNHYPERCALL_H.
        Use the "VMCALL" mnemonic in comment/commit log.
        Uppercase r8/rdi/rsi/rax for hypercall parameter register in comment.
v5->v6: Remove explicit local register variable for inline assembly
---
 arch/x86/include/asm/acrn_hypercall.h | 84 +++
 1 file changed, 84 insertions(+)
 create mode 100644 arch/x86/include/asm/acrn_hypercall.h

diff --git a/arch/x86/include/asm/acrn_hypercall.h b/arch/x86/include/asm/acrn_hypercall.h
new file mode 100644
index 000..5cb438e
--- /dev/null
+++ b/arch/x86/include/asm/acrn_hypercall.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_X86_ACRN_HYPERCALL_H
+#define _ASM_X86_ACRN_HYPERCALL_H
+
+#include 
+
+#ifdef CONFIG_ACRN_GUEST
+
+/*
+ * Hypercalls for ACRN guest
+ *
+ * Hypercall number is passed in R8 register.
+ * Up to 2 arguments are passed in RDI, RSI.
+ * Return value will be placed in RAX.
+ */
+
+static inline long acrn_hypercall0(unsigned long hcall_id)
+{
+   long result;
+
+   /* the hypercall is implemented with the VMCALL instruction.
+* volatile qualifier is added to avoid that it is dropped
+* because of compiler optimization.
+*/
+   asm volatile("movq %[hcall_id], %%r8\n\t"
+"vmcall\n\t"
+: "=a" (result)
+: [hcall_id] "g" (hcall_id)
+: "r8");
+
+   return result;
+}
+
+static inline long acrn_hypercall1(unsigned long hcall_id,
+  unsigned long param1)
+{
+   long result;
+
+   asm volatile("movq %[hcall_id], %%r8\n\t"
+"vmcall\n\t"
+: "=a" (result)
+: [hcall_id] "g" (hcall_id), "D" (param1)
+: "r8");
+
+   return result;
+}
+
+static inline long acrn_hypercall2(unsigned long hcall_id,
+  unsigned long param1,
+  unsigned long param2)
+{
+   long result;
+
+   asm volatile("movq %[hcall_id], %%r8\n\t"
+"vmcall\n\t"
+: "=a" (result)
+: [hcall_id] "g" (hcall_id), "D" (param1), "S" (param2)
+: "r8");
+
+   return result;
+}
+
+#else
+
+static inline long acrn_hypercall0(unsigned long hcall_id)
+{
+   return -ENOTSUPP;
+}
+
+static inline long acrn_hypercall1(unsigned long hcall_id,
+  unsigned long param1)
+{
+   return -ENOTSUPP;
+}
+
+static inline long acrn_hypercall2(unsigned long hcall_id,
+  unsigned long param1,
+  unsigned long param2)
+{
+   return -ENOTSUPP;
+}
+#endif /* CONFIG_ACRN_GUEST */
+#endif /* _ASM_X86_ACRN_HYPERCALL_H */
-- 
2.7.4

[PATCH v6 2/4] x86: Add the support of Linux guest on ACRN hypervisor

2019-04-29 Thread Zhao Yakui

ACRN is an open-source hypervisor maintained by the Linux Foundation.
It is built for embedded IoT with a small footprint and real-time features.
Add ACRN guest support so that Linux can be booted under the
ACRN hypervisor. Follow-up patches will set up the upcall
notification vector, enable hypercalls and provide the interface that is
used to manage the virtualized CPU/memory/device/interrupt for other
guest OSes.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
v1->v2: Change the CONFIG_ACRN to CONFIG_ACRN_GUEST, which makes it easy to
        understand.
        Remove the export of x86_hyper_acrn.

v2->v3: Remove the unnecessary dependency of PARAVIRT
v3->v4: Refine the commit log and add more meaningful description in Kconfig
v4->v5: No change
v5->v6: No change
---
 arch/x86/Kconfig  | 12 
 arch/x86/include/asm/hypervisor.h |  1 +
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/acrn.c| 39 +++
 arch/x86/kernel/cpu/hypervisor.c  |  4 
 5 files changed, 57 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/acrn.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2fc9297..8dc4200 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -845,6 +845,18 @@ config JAILHOUSE_GUEST
  cell. You can leave this option disabled if you only want to start
  Jailhouse and run Linux afterwards in the root cell.
 
+config ACRN_GUEST
+   bool "ACRN Guest support"
+   depends on X86_64
+   help
+ This option allows to run Linux as guest in ACRN hypervisor. Enabling
+ this will allow the kernel to boot in virtualized environment under
+ the ACRN hypervisor.
+ ACRN is a flexible, lightweight reference open-source hypervisor, built
+ with real-time and safety-criticality in mind. It is built for embedded
+ IOT with small footprint and real-time features. More details can be
+ found in https://projectacrn.org/
+
 endif #HYPERVISOR_GUEST
 
 source "arch/x86/Kconfig.cpu"
diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h
index 8c5aaba..50a30f6 100644
--- a/arch/x86/include/asm/hypervisor.h
+++ b/arch/x86/include/asm/hypervisor.h
@@ -29,6 +29,7 @@ enum x86_hypervisor_type {
X86_HYPER_XEN_HVM,
X86_HYPER_KVM,
X86_HYPER_JAILHOUSE,
+   X86_HYPER_ACRN,
 };
 
 #ifdef CONFIG_HYPERVISOR_GUEST
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index cfd24f9..17a7cdf 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_X86_CPU_RESCTRL) += resctrl/
 obj-$(CONFIG_X86_LOCAL_APIC)   += perfctr-watchdog.o
 
 obj-$(CONFIG_HYPERVISOR_GUEST) += vmware.o hypervisor.o mshyperv.o
+obj-$(CONFIG_ACRN_GUEST)   += acrn.o
 
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
new file mode 100644
index 000..f556640
--- /dev/null
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ACRN detection support
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ *
+ * Jason Chen CJ 
+ * Zhao Yakui 
+ *
+ */
+
+#include 
+
+static uint32_t __init acrn_detect(void)
+{
+   return hypervisor_cpuid_base("ACRNACRNACRN\0\0", 0);
+}
+
+static void __init acrn_init_platform(void)
+{
+}
+
+static bool acrn_x2apic_available(void)
+{
+   /* x2apic is not supported now.
+* Later it needs to check the X86_FEATURE_X2APIC bit of cpu info
+* returned by CPUID to determine whether the x2apic is
+* supported in Linux guest.
+*/
+   return false;
+}
+
+const __initconst struct hypervisor_x86 x86_hyper_acrn = {
+   .name   = "ACRN",
+   .detect = acrn_detect,
+   .type   = X86_HYPER_ACRN,
+   .init.init_platform = acrn_init_platform,
+   .init.x2apic_available  = acrn_x2apic_available,
+};
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 479ca47..87e39ad 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -32,6 +32,7 @@ extern const struct hypervisor_x86 x86_hyper_xen_pv;
 extern const struct hypervisor_x86 x86_hyper_xen_hvm;
 extern const struct hypervisor_x86 x86_hyper_kvm;
 extern const struct hypervisor_x86 x86_hyper_jailhouse;
+extern const struct hypervisor_x86 x86_hyper_acrn;
 
 static const __initconst struct hypervisor_x86 * const hypervisors[] =
 {
@@ -49,6 +50,9 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] =
 #ifdef CONFIG_JAILHOUSE_GUEST
&x86_hyper_jailhouse,
 #endif
+#ifdef CONFIG_ACRN_GUEST
+   &x86_hyper_acrn,
+#endif
 };
 
 enum x86_hypervisor_type x86_hyper_type;
-- 
2.7
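
For readers unfamiliar with how such detection works: hypervisor_cpuid_base() scans the hypervisor CPUID leaves and compares the 12 signature bytes returned in EBX, ECX and EDX against the given string. The comparison step alone can be sketched in portable user-space C. hv_signature_matches() is a hypothetical helper written for this note; it does not execute CPUID and is not the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the three CPUID output registers,
 * read as EBX, ECX, EDX in little-endian byte order, spell the 12-byte
 * signature in sig, and 0 otherwise.
 */
static int hv_signature_matches(uint32_t ebx, uint32_t ecx, uint32_t edx,
				const char *sig)
{
	const uint32_t regs[3] = { ebx, ecx, edx };
	unsigned char bytes[12];
	int i, j;

	/* Unpack explicitly so the result does not depend on host endianness. */
	for (i = 0; i < 3; i++)
		for (j = 0; j < 4; j++)
			bytes[4 * i + j] = (regs[i] >> (8 * j)) & 0xff;

	return memcmp(bytes, sig, sizeof(bytes)) == 0;
}
```

With the signature used in this patch, each register would hold the bytes "ACRN", i.e. the value 0x4e524341.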

[PATCH v6 3/4] x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector

2019-04-29 Thread Zhao Yakui

Linux kernel uses the HYPERVISOR_CALLBACK_VECTOR for hypervisor upcall
vector. It is already used for Xen and HyperV.
After the ACRN hypervisor is detected, it will also use this defined
vector to notify the ACRN guest.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
v1->v2: Remove the unused API definition of acrn_setup_intr_handler and
        acrn_remove_intr_handler.
        Adjust the order of header file
        Add the declaration of acrn_hv_vector_handler and tracing
        definition of acrn_hv_callback_vector.

v2->v3: No change
v3->v4: Refine the file name of acrnhyper.h to acrn.h
v5->v6: Add the "extern" for the function declarations in header file
        Add some comments for calling entering_ack_irq
        Some other minor changes (unnecessary splitting of two lines
        and minor change in commit log)
---
 arch/x86/Kconfig|  1 +
 arch/x86/entry/entry_64.S   |  5 +
 arch/x86/include/asm/acrn.h | 11 +++
 arch/x86/kernel/cpu/acrn.c  | 29 +
 4 files changed, 46 insertions(+)
 create mode 100644 arch/x86/include/asm/acrn.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8dc4200..d7a10f6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -848,6 +848,7 @@ config JAILHOUSE_GUEST
 config ACRN_GUEST
bool "ACRN Guest support"
depends on X86_64
+   select X86_HV_CALLBACK_VECTOR
help
  This option allows to run Linux as guest in ACRN hypervisor. Enabling
  this will allow the kernel to boot in virtualized environment under
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb..d1b8ad3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1129,6 +1129,11 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
+#if IS_ENABLED(CONFIG_ACRN_GUEST)
+apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
+   acrn_hv_callback_vector acrn_hv_vector_handler
+#endif
+
 idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
 idtentry int3  do_int3 has_error_code=0
 idtentry stack_segment do_stack_segment has_error_code=1
diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
new file mode 100644
index 000..4adb13f
--- /dev/null
+++ b/arch/x86/include/asm/acrn.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_ACRN_H
+#define _ASM_X86_ACRN_H
+
+extern void acrn_hv_callback_vector(void);
+#ifdef CONFIG_TRACING
+#define trace_acrn_hv_callback_vector acrn_hv_callback_vector
+#endif
+
+extern void acrn_hv_vector_handler(struct pt_regs *regs);
+#endif /* _ASM_X86_ACRN_H */
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index f556640..ce88d2d 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -9,7 +9,11 @@
  *
  */
 
+#include 
+#include 
+#include 
 #include 
+#include 
 
 static uint32_t __init acrn_detect(void)
 {
@@ -18,6 +22,8 @@ static uint32_t __init acrn_detect(void)
 
 static void __init acrn_init_platform(void)
 {
+   /* Setup the IDT for ACRN hypervisor callback */
+   alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, acrn_hv_callback_vector);
 }
 
 static bool acrn_x2apic_available(void)
@@ -30,6 +36,29 @@ static bool acrn_x2apic_available(void)
return false;
 }
 
+static void (*acrn_intr_handler)(void);
+
+__visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
+{
+   struct pt_regs *old_regs = set_irq_regs(regs);
+
+   /*
+* The hypervisor requires that the APIC EOI should be acked.
+* If the APIC EOI is not acked, the APIC ISR bit for the
+* HYPERVISOR_CALLBACK_VECTOR will not be cleared and then it
+* will block the interrupt whose vector is lower than
+* HYPERVISOR_CALLBACK_VECTOR.
+*/
+   entering_ack_irq();
+   inc_irq_stat(irq_hv_callback_count);
+
+   if (acrn_intr_handler)
+   acrn_intr_handler();
+
+   exiting_irq();
+   set_irq_regs(old_regs);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_acrn = {
.name   = "ACRN",
.detect = acrn_detect,
-- 
2.7.4

[PATCH] drivers: thermal: processor_thermal: Read PPCC on resume

2019-04-29 Thread Srinivas Pandruvada

Read PPCC power limits on system resume in case those limits changed
while system was suspended.

Signed-off-by: Srinivas Pandruvada 
---
 .../int340x_thermal/processor_thermal_device.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index 436c256f111d..acb22157b9ac 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -465,6 +465,18 @@ static void  proc_thermal_pci_remove(struct pci_dev *pdev)
pci_disable_device(pdev);
 }
 
+static int proc_thermal_resume(struct device *dev)
+{
+   struct proc_thermal_device *proc_dev;
+
+   proc_dev = dev_get_drvdata(dev);
+   proc_thermal_read_ppcc(proc_dev);
+
+   return 0;
+}
+
+static SIMPLE_DEV_PM_OPS(proc_thermal_pm, NULL, proc_thermal_resume);
+
 static const struct pci_device_id proc_thermal_pci_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_BDW_THERMAL)},
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_HSB_THERMAL)},
@@ -489,6 +501,7 @@ static struct pci_driver proc_thermal_pci_driver = {
.probe  = proc_thermal_pci_probe,
.remove = proc_thermal_pci_remove,
.id_table   = proc_thermal_pci_ids,
+   .driver.pm  = &proc_thermal_pm,
 };
 
 static const struct acpi_device_id int3401_device_ids[] = {
@@ -503,6 +516,7 @@ static struct platform_driver int3401_driver = {
.driver = {
.name = "int3401 thermal",
.acpi_match_table = int3401_device_ids,
+   .pm = &proc_thermal_pm,
},
 };
 
-- 
2.17.2

[PATCH] drivers: thermal: processor_thermal: Downgrade error message

2019-04-29 Thread Srinivas Pandruvada

Downgrade "Unsupported event" message from dev_err to dev_dbg. Otherwise it
floods the log with this message on some platforms.

Signed-off-by: Srinivas Pandruvada 
---
 .../thermal/intel/int340x_thermal/processor_thermal_device.c| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index 4b206b594825..436c256f111d 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -275,7 +275,7 @@ static void proc_thermal_notify(acpi_handle handle, u32 event, void *data)
THERMAL_DEVICE_POWER_CAPABILITY_CHANGED);
break;
default:
-   dev_err(proc_priv->dev, "Unsupported event [0x%x]\n", event);
+   dev_dbg(proc_priv->dev, "Unsupported event [0x%x]\n", event);
break;
}
 }
-- 
2.17.2

Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Nicholas Mc Guire

On Tue, Apr 30, 2019 at 04:02:23AM +0100, Al Viro wrote:
> On Tue, Apr 30, 2019 at 04:22:38AM +0200, Nicholas Mc Guire wrote:
> > On Mon, Apr 29, 2019 at 10:03:36AM -0400, Sven Van Asbroeck wrote:
> > > On Mon, Apr 29, 2019 at 2:11 AM Nicholas Mc Guire  
> > > wrote:
> > > >
> > > > V2: As requested by Sven Van Asbroeck  make the
> > > > impact of the patch clear in the commit message.
> > > 
> > > Thank you, but did you miss my comment about creating a local variable
> > > instead? See:
> > > https://lkml.org/lkml/2019/4/28/97
> > 
> > Did not miss it - I just don't think that makes it any more
> > understandable - the __force __be16 makes it clear I believe
> > that this is correct, sparse does not like this though - so tell
> > sparse.
> 
> ... to STFU, 'cause you know better.  The trouble is, how do we
> (or yourself a year or two later) know *why* it is correct?
> Worse, how do we (or yourself, etc.) know if a change about to be
> done to the code won't invalidate the proof of yours?
> 
> > The local variable would need to be explained as it is
> > functionally not necessary - therefor I find it more confusing
> > that using  __force here.
> 
> What's confusing is mixing host- and fixed-endian values in the
> same variable at different times.  Treat those as unrelated
> types that happen to have the same sizeof.
> 
> Quite a few of __force instances in the tree should be taken out
> and shot.  Don't add to their number.

ok - my bad then - I had assumed that using __force is reasonable
if the handling is correct and it's a localized conversion only
like var = be16_to_cpu(var) which evaded introducing additional
variables just to have different types but no different function.
But the long-term issue of hiding bugs by __force makes sense to
me - will give it another shot at scripting this in coccinelle.

thx!
hofrat

Re: [PATCH 2/2] memcg, fsnotify: no oom-kill for remote memcg charging

2019-04-29 Thread Shakeel Butt

On Mon, Apr 29, 2019 at 5:41 PM Michal Hocko  wrote:
>
> On Mon 29-04-19 10:13:32, Shakeel Butt wrote:
> [...]
> >   /*
> >* For queues with unlimited length lost events are not expected and
> >* can possibly have security implications. Avoid losing events when
> >* memory is short.
> > +  *
> > +  * Note: __GFP_NOFAIL takes precedence over __GFP_RETRY_MAYFAIL.
> >*/
>
> No, there is no rule like that. Combining the two is undefined
> currently and I do not think we want to legitimize it. What does it even
> mean?
>

Actually the code is doing that but I agree this is not documented and
weird. I will fix this.

Shakeel

Re: [PATCH] riscv: Support non-coherency memory model

2019-04-29 Thread Guo Ren

On Mon, Apr 29, 2019 at 01:11:43PM -0700, Palmer Dabbelt wrote:
> On Mon, 22 Apr 2019 08:44:30 PDT (-0700), guo...@kernel.org wrote:
> >From: Guo Ren 
> >
> >The current riscv linux implementation requires the SOC system to support
> >memory coherence between all I/O devices and CPUs. But some SOC systems
> >cannot maintain that coherence, and they need cache clean/invalidate
> >operations to synchronize data.
> >
> >The current implementation is no problem for the SiFive FU540, because the
> >FU540 keeps all IO devices and DMA master devices coherent with the CPU. But
> >a traditional SOC vendor may already have a stable non-coherent SOC system;
> >the need is simply to replace the CPU with an RV CPU, and rebuilding the
> >whole system with IO-coherency is very expensive.
> >
> >So we should make riscv linux also support non-coherency memory model.
> >Here are the two points that riscv linux needs to be modified:
> >
> > - Add _PAGE_COHERENCY bit in current page table entry attributes. The bit
> >   designates a coherence for this page mapping. Software set the bit to
> >   tell the hardware that the region of the page's memory area must be
> >   coherent with IOs devices in SOC system by PMA settings.
> >   If IOs and CPU are already coherent in SOC system, CPU just ignore
> >   this bit.
> >
> >   PTE format:
> >   | XLEN-1  10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
> >        PFN       C  RSW  D   A   G   U   X   W   R   V
> >                  ^
> >   BIT(9): Coherence attribute bit
> >      0: hardware needn't keep the page coherent and software will
> >         maintain the coherence with cache clear/invalid operations.
> >      1: hardware must keep the page coherent and software needn't
> >         maintain the coherence.
> >   BIT(8): Reserved for software and now it's _PAGE_SPECIAL in linux
> >
> >   Add a new hardware bit in PTE also need to modify Privileged
> >   Architecture Supervisor-Level ISA:
> >   https://github.com/riscv/riscv-isa-manual/pull/374
> 
> This is a RISC-V ISA modification, which isn't really appropriate to suggest
> on the kernel mailing lists.  The right place to talk about this is at the RISC-V
> foundation, which owns the ISA -- we can't change the hardware with a patch to
> Linux :).
I just want a discussion and a wide discussion is good for all of us :)

> 
> > - Add SBI_FENCE_DMA 9 in riscv-sbi.
> >   sbi_fence_dma(start, size, dir) could synchronize CPU cache data with
> >   DMA device in non-coherency memory model. The third param's definition
> >   is the same with linux's in include/linux/dma-direction.h:
> >
> >   enum dma_data_direction {
> > DMA_BIDIRECTIONAL = 0,
> > DMA_TO_DEVICE = 1,
> > DMA_FROM_DEVICE = 2,
> > DMA_NONE = 3,
> >   };
> >
> >   The first param:start must be physical address which could be handled
> >   in M-state.
> >
> >   Here is a pull request to the riscv-sbi-doc:
> >   https://github.com/riscv/riscv-sbi-doc/pull/15
> >
> >We have tested the patch on our FPGA SOC system, in which the network
> >controller is connected to a non-cache-coherent interconnect, and it
> >couldn't work without the patch.
> >
> >There is no side effect for the FU540, whose CPU doesn't care about
> >_PAGE_COHERENCY in the PTE, but the FU540's bbl also needs to implement a
> >simple sbi_fence_dma that returns directly. In fact, if you give a correct
> >configuration for dev_is_dma_coherent(), the linux dma framework wouldn't
> >call sbi_fence_dma any more.
> 
> Non-coherent fences also need to be discussed as part of a RISC-V ISA
   ^^
  fences instructions? not page attributes?
> extension.  
> I know people have expressed interest, but I don't know of a
> working group that's already been set up.
Does that mean the current RISC-V ISA forces the SOC to use a coherent memory model?

Best Regards
 Guo Ren
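
For readers trying to picture the PTE layout quoted in this thread, the bit
positions can be decoded with a few lines of Python. This is only a sketch
under the assumptions of the quoted patch description: bit 9 as a coherence
attribute is the proposal under discussion, not a ratified ISA bit, and the
names decode_pte() and needs_software_cache_maintenance() are made up for
illustration.

```python
# Bit positions taken from the PTE layout quoted in the patch description.
PTE_V = 1 << 0
PTE_R = 1 << 1
PTE_W = 1 << 2
PTE_SPECIAL = 1 << 8     # software bit, _PAGE_SPECIAL per the quoted text
PTE_COHERENCY = 1 << 9   # proposed _PAGE_COHERENCY bit (assumption)
PFN_SHIFT = 10

_FLAG_NAMES = "VRWXUGAD"  # bits 0..7, lowest bit first


def decode_pte(pte: int) -> dict:
    """Split a PTE value into named flag bits and the page frame number."""
    fields = {name: bool((pte >> bit) & 1)
              for bit, name in enumerate(_FLAG_NAMES)}
    fields["SPECIAL"] = bool(pte & PTE_SPECIAL)
    fields["C"] = bool(pte & PTE_COHERENCY)
    fields["PFN"] = pte >> PFN_SHIFT
    return fields


def needs_software_cache_maintenance(pte: int) -> bool:
    # Per the quoted description: C == 0 means software keeps the page
    # coherent with cache operations; C == 1 leaves it to hardware.
    return not (pte & PTE_COHERENCY)
```

A mapping created with C clear would, under the proposal, be one where the DMA
layer has to issue the fence calls described in the second bullet point.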

Re: INFO: task hung in __get_super

2019-04-29 Thread Al Viro

On Tue, Apr 30, 2019 at 04:55:01AM +0200, Jan Kara wrote:

> Yeah, you're right. And if we push the patch a bit further to not take
> loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
> can send a fix.

Huh?  We don't take it until in lo_simple_ioctl(), and that patch doesn't
get to its call on invalid ioctl numbers.  What am I missing here?

[RFC PATCH v4 15/15] dcache: Add CONFIG_DCACHE_SMO

2019-04-29 Thread Tobin C. Harding

In an attempt to make the SMO patchset as non-invasive as possible add a
config option CONFIG_DCACHE_SMO (under "Memory Management options") for
enabling SMO for the DCACHE.  Without this option the dcache constructor
is used but no other code is built in; with this option enabled slab
mobility is enabled and the isolate/migrate functions are built in.

Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via
Slab Movable Objects infrastructure.

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 4 
 mm/Kconfig  | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3f9daba1cc78..9edce104613b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3068,6 +3068,7 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+#ifdef CONFIG_DCACHE_SMO
 /*
  * d_isolate() - Dentry isolation callback function.
  * @s: The dentry cache.
@@ -3140,6 +3141,7 @@ static void d_partial_shrink(struct kmem_cache *s, void **_unused, int __unused,
 
kfree(private);
 }
+#endif /* CONFIG_DCACHE_SMO */
 
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
@@ -3186,7 +3188,9 @@ static void __init dcache_init(void)
   sizeof_field(struct dentry, d_iname),
   dcache_ctor);
 
+#ifdef CONFIG_DCACHE_SMO
kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);
+#endif
 
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
diff --git a/mm/Kconfig b/mm/Kconfig
index 47040d939f3b..92fc27ad3472 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -265,6 +265,13 @@ config SMO_NODE
help
  On NUMA systems enable moving objects to and from a specified node.
 
+config DCACHE_SMO
+   bool "Enable Slab Movable Objects for the dcache"
+   depends on SLUB
+   help
+ Under memory pressure we can try to free dentry slab cache objects from
+ the partial slab list if this is enabled.
+
 config PHYS_ADDR_T_64BIT
def_bool 64BIT
 
-- 
2.21.0

[RFC PATCH v4 13/15] dcache: Provide a dentry constructor

2019-04-29 Thread Tobin C. Harding

In order to support object migration on the dentry cache we need to have
a determined object state at all times. Without a constructor the object
would have a random state after allocation.

Provide a dentry constructor.
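
The reasoning above (objects need a determined state even before they are
handed out) can be illustrated with a toy cache. This is a rough Python model
and not the slab allocator: ToyCache, ToyObject and toy_ctor() are
hypothetical names, and the only detail borrowed from the patch is the -128
count that dcache_ctor() stores to mimic lockref_mark_dead().

```python
DEAD_COUNT = -128  # value dcache_ctor() writes to d_lockref.count


class ToyObject:
    def __init__(self):
        self.count = None        # stands in for uninitialised memory
        self.lock_ready = False


def toy_ctor(obj):
    """Put an object into a known state, as a slab constructor would."""
    obj.count = DEAD_COUNT
    obj.lock_ready = True


class ToyCache:
    def __init__(self, objects_per_slab, ctor=None):
        self.objects_per_slab = objects_per_slab
        self.ctor = ctor
        self.free = []
        self.slabs = []          # every object in every slab, used or not

    def _grow(self):
        slab = [ToyObject() for _ in range(self.objects_per_slab)]
        if self.ctor:
            for obj in slab:     # constructor runs once, when the slab is made
                self.ctor(obj)
        self.slabs.append(slab)
        self.free.extend(slab)

    def alloc(self):
        if not self.free:
            self._grow()
        return self.free.pop()
```

Code that walks a whole slab, as the isolate callback later in this series
does, then sees either a live object or one in the constructed state, never an
undefined one.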

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index aac41adf4743..3d6cc06eca56 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1603,6 +1603,16 @@ void d_invalidate(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_invalidate);
 
+static void dcache_ctor(void *p)
+{
+   struct dentry *dentry = p;
+
+   /* Mimic lockref_mark_dead() */
+   dentry->d_lockref.count = -128;
+
+   spin_lock_init(&dentry->d_lock);
+}
+
 /**
  * __d_alloc   -   allocate a dcache entry
  * @sb: filesystem it will belong to
@@ -1658,7 +1668,6 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 
dentry->d_lockref.count = 1;
dentry->d_flags = 0;
-   spin_lock_init(&dentry->d_lock);
seqcount_init(&dentry->d_seq);
dentry->d_inode = NULL;
dentry->d_parent = dentry;
@@ -3091,14 +3100,17 @@ static void __init dcache_init_early(void)
 
 static void __init dcache_init(void)
 {
-   /*
-* A constructor could be added for stable state like the lists,
-* but it is probably not worth it because of the cache nature
-* of the dcache.
-*/
-   dentry_cache = KMEM_CACHE_USERCOPY(dentry,
-   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
-   d_iname);
+   slab_flags_t flags =
+   SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | SLAB_MEM_SPREAD | SLAB_ACCOUNT;
+
+   dentry_cache =
+   kmem_cache_create_usercopy("dentry",
+  sizeof(struct dentry),
+  __alignof__(struct dentry),
+  flags,
+  offsetof(struct dentry, d_iname),
+  sizeof_field(struct dentry, d_iname),
+  dcache_ctor);
 
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
-- 
2.21.0

[RFC PATCH v4 11/15] slub: Enable moving objects to/from specific nodes

2019-04-29 Thread Tobin C. Harding

We have just implemented Slab Movable Objects (object migration).
Currently object migration is used to defrag a cache.  On NUMA systems
it would be nice to be able to control the source and destination nodes
when moving objects.

Add CONFIG_SMO_NODE to guard this feature.  CONFIG_SMO_NODE depends on
CONFIG_SLUB_DEBUG because we use the full list.  Leave it like this for
the RFC because the patch will be less cluttered to review, separate
full list out of CONFIG_DEBUG before doing a PATCH version.

Implement moving all objects (including those in full slabs) to a
specific node.  Expose this functionality to userspace via a sysfs entry.

Add sysfs entry:

   /sysfs/kernel/slab//move

With this users get access to the following functionality:

 - Move all objects to specified node.

echo "N1" > move

 - Move all objects from specified node to other specified
   node (from N1 -> to N2):

echo "N1 N2" > move

This also enables shrinking slabs on a specific node:

echo "N1 N1" > move
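
The three forms of the 'move' command described above can be summarised with a
small parser. This is a userspace Python sketch, not the kernel's move_store();
it assumes nodes are written as 'N' followed by a decimal number, as in the
examples, and MoveRequest and parse_move() are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class MoveRequest(NamedTuple):
    action: str              # "move_to", "move_from_to" or "shrink"
    source: Optional[int]    # None means "from every node"
    target: int


_NODE = re.compile(r"N(\d+)$")


def _node(token: str) -> int:
    match = _NODE.match(token)
    if not match:
        raise ValueError("expected N<number>, got %r" % token)
    return int(match.group(1))


def parse_move(text: str) -> MoveRequest:
    """Classify a string written to the 'move' file."""
    tokens = text.split()
    if len(tokens) == 1:
        return MoveRequest("move_to", None, _node(tokens[0]))
    if len(tokens) == 2:
        source, target = _node(tokens[0]), _node(tokens[1])
        # Same source and target is the "shrink this node" case.
        action = "shrink" if source == target else "move_from_to"
        return MoveRequest(action, source, target)
    raise ValueError("expected one or two node names")
```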

Signed-off-by: Tobin C. Harding 
---
 mm/Kconfig |   7 ++
 mm/slub.c  | 249 +
 2 files changed, 256 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 25c71eb8a7db..47040d939f3b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -258,6 +258,13 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 config ARCH_ENABLE_THP_MIGRATION
bool
 
+config SMO_NODE
+   bool "Enable per node control of Slab Movable Objects"
+   depends on SLUB && SYSFS
+   select SLUB_DEBUG
+   help
+ On NUMA systems enable moving objects to and from a specified node.
+
 config PHYS_ADDR_T_64BIT
def_bool 64BIT
 
diff --git a/mm/slub.c b/mm/slub.c
index e601c804ed79..e4f3dde443f5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4345,6 +4345,106 @@ static void move_slab_page(struct page *page, void *scratch, int node)
s->migrate(s, vector, count, node, private);
 }
 
+#ifdef CONFIG_SMO_NODE
+/*
+ * kmem_cache_move() - Attempt to move all slab objects.
+ * @s: The cache we are working on.
+ * @node: The node to move objects away from.
+ * @target_node: The node to move objects on to.
+ *
+ * Attempts to move all objects (partial slabs and full slabs) to target
+ * node.
+ *
+ * Context: Takes the list_lock.
+ * Return: The number of slabs remaining on node.
+ */
+static unsigned long kmem_cache_move(struct kmem_cache *s,
+int node, int target_node)
+{
+   struct kmem_cache_node *n = get_node(s, node);
+   LIST_HEAD(move_list);
+   struct page *page, *page2;
+   unsigned long flags;
+   void **scratch;
+
+   if (!s->migrate) {
+   pr_warn("%s SMO not enabled, cannot move objects\n", s->name);
+   goto out;
+   }
+
+   scratch = alloc_scratch(s);
+   if (!scratch)
+   goto out;
+
+   spin_lock_irqsave(&n->list_lock, flags);
+
+   list_for_each_entry_safe(page, page2, &n->partial, lru) {
+   if (!slab_trylock(page))
+   /* Busy slab. Get out of the way */
+   continue;
+
+   if (page->inuse) {
+   list_move(&page->lru, &move_list);
+   /* Stop page being considered for allocations */
+   n->nr_partial--;
+   page->frozen = 1;
+
+   slab_unlock(page);
+   } else {/* Empty slab page */
+   list_del(&page->lru);
+   n->nr_partial--;
+   slab_unlock(page);
+   discard_slab(s, page);
+   }
+   }
+   list_for_each_entry_safe(page, page2, &n->full, lru) {
+   if (!slab_trylock(page))
+   continue;
+
+   list_move(&page->lru, &move_list);
+   page->frozen = 1;
+   slab_unlock(page);
+   }
+
+   spin_unlock_irqrestore(&n->list_lock, flags);
+
+   list_for_each_entry(page, &move_list, lru) {
+   if (page->inuse)
+   move_slab_page(page, scratch, target_node);
+   }
+   kfree(scratch);
+
+   /* Bail here to save taking the list_lock */
+   if (list_empty(&move_list))
+   goto out;
+
+   /* Inspect results and dispose of pages */
+   spin_lock_irqsave(&n->list_lock, flags);
+   list_for_each_entry_safe(page, page2, &move_list, lru) {
+   list_del(&page->lru);
+   slab_lock(page);
+   page->frozen = 0;
+
+   if (page->inuse) {
+   if (page->inuse == page->objects) {
+   list_add(&page->lru, &n->full);
+   slab_unlock(page);
+   } else {
+   n->nr_partial++;
+   list_add_tail(&page->lru, &n->partial);
+   slab_

[RFC PATCH v4 12/15] slub: Enable balancing slabs across nodes

2019-04-29 Thread Tobin C. Harding

We have just implemented Slab Movable Objects (SMO).  On NUMA systems
slabs can become unbalanced i.e. many slabs on one node while other
nodes have few slabs.  Using SMO we can balance the slabs across all
the nodes.

The algorithm used is as follows:

 1. Move all objects to node 0 (this has the effect of defragmenting the
cache).

 2. Calculate the desired number of slabs for each node (this is done
using the approximation nr_slabs / nr_nodes).

 3. Loop over the nodes moving the desired number of slabs from node 0
to the node.
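
The three steps above can be walked through on plain numbers. The following
Python sketch is a simplified model, not the kernel code: it assumes every
move succeeds and that defragmenting onto node 0 does not change the slab
count, and balance() is a hypothetical name.

```python
def balance(slabs_per_node):
    """Return the slab count per node after the three steps."""
    nr_nodes = len(slabs_per_node)

    # Step 1: move everything to node 0.
    result = [0] * nr_nodes
    result[0] = sum(slabs_per_node)

    # Step 2: desired slabs per node, using the nr_slabs / nr_nodes
    # approximation (integer division, so node 0 keeps any remainder).
    desired = result[0] // nr_nodes

    # Step 3: hand the desired number of slabs to every other node.
    for nid in range(1, nr_nodes):
        moved = min(desired, result[0])
        result[0] -= moved
        result[nid] += moved
    return result
```

For example, balance([10, 2, 0, 0]) gives three slabs per node, while 13 slabs
over four nodes leaves the extra slab on node 0.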

Feature is conditionally built in with CONFIG_SMO_NODE, this is because
we need the full list (we enable SLUB_DEBUG to get this).  A future
version may separate the full list out of SLUB_DEBUG.

Expose this functionality to userspace via a sysfs entry.  Add sysfs
entry:

   /sysfs/kernel/slab//balance

Write of '1' to this file triggers balance, no other value accepted.

This feature relies on SMO being enabled for the cache; this is done with
the following call, after the isolate/migrate functions have been defined:

kmem_cache_setup_mobility(s, isolate, migrate)

Signed-off-by: Tobin C. Harding 
---
 mm/slub.c | 120 ++
 1 file changed, 120 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index e4f3dde443f5..a5c48c41d72b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4583,6 +4583,109 @@ static unsigned long kmem_cache_move_to_node(struct kmem_cache *s, int node)
 
return left;
 }
+
+/*
+ * kmem_cache_move_slabs() - Attempt to move @num slabs to target_node,
+ * @s: The cache we are working on.
+ * @node: The node to move objects from.
+ * @target_node: The node to move objects to.
+ * @num: The number of slabs to move.
+ *
+ * Attempts to move @num slabs from @node to @target_node.  This is done
+ * by migrating objects from slabs on the full_list.
+ *
+ * Return: The number of slabs moved or error code.
+ */
+static long kmem_cache_move_slabs(struct kmem_cache *s,
+ int node, int target_node, long num)
+{
+   struct kmem_cache_node *n = get_node(s, node);
+   LIST_HEAD(move_list);
+   struct page *page, *page2;
+   unsigned long flags;
+   void **scratch;
+   long done = 0;
+
+   if (node == target_node)
+   return -EINVAL;
+
+   scratch = alloc_scratch(s);
+   if (!scratch)
+   return -ENOMEM;
+
+   spin_lock_irqsave(&n->list_lock, flags);
+   list_for_each_entry_safe(page, page2, &n->full, lru) {
+   if (!slab_trylock(page))
+   /* Busy slab. Get out of the way */
+   continue;
+
+   list_move(&page->lru, &move_list);
+   page->frozen = 1;
+   slab_unlock(page);
+
+   if (++done >= num)
+   break;
+   }
+   spin_unlock_irqrestore(&n->list_lock, flags);
+
+   list_for_each_entry(page, &move_list, lru) {
+   if (page->inuse)
+   move_slab_page(page, scratch, target_node);
+   }
+   kfree(scratch);
+
+   /* Inspect results and dispose of pages */
+   spin_lock_irqsave(&n->list_lock, flags);
+   list_for_each_entry_safe(page, page2, &move_list, lru) {
+   list_del(&page->lru);
+   slab_lock(page);
+   page->frozen = 0;
+
+   if (page->inuse) {
+   /*
+* This is best effort only, if slab still has
+* objects just put it back on the partial list.
+*/
+   n->nr_partial++;
+   list_add_tail(&page->lru, &n->partial);
+   slab_unlock(page);
+   } else {
+   slab_unlock(page);
+   discard_slab(s, page);
+   }
+   }
+   spin_unlock_irqrestore(&n->list_lock, flags);
+
+   return done;
+}
+
+/*
+ * kmem_cache_balance_nodes() - Balance slabs across nodes.
+ * @s: The cache we are working on.
+ */
+static void kmem_cache_balance_nodes(struct kmem_cache *s)
+{
+   struct kmem_cache_node *n = get_node(s, 0);
+   unsigned long desired_nr_slabs_per_node;
+   unsigned long nr_slabs;
+   int nr_nodes = 0;
+   int nid;
+
+   (void)kmem_cache_move_to_node(s, 0);
+
+   for_each_node_state(nid, N_NORMAL_MEMORY)
+   nr_nodes++;
+
+   nr_slabs = atomic_long_read(&n->nr_slabs);
+   desired_nr_slabs_per_node = nr_slabs / nr_nodes;
+
+   for_each_node_state(nid, N_NORMAL_MEMORY) {
+   if (nid == 0)
+   continue;
+
+   kmem_cache_move_slabs(s, 0, nid, desired_nr_slabs_per_node);
+   }
+}
 #endif
 
 /**
@@ -5847,6 +5950,22 @@ static ssize_t move_store(struct kmem_cache *s, const char *buf, size_t length)
return length;
 }
 SLAB_ATTR(move);
+
+static ssize_t balance_show(struct kmem_cac

[RFC PATCH v4 14/15] dcache: Implement partial shrink via Slab Movable Objects

2019-04-29 Thread Tobin C. Harding

The dentry slab cache is susceptible to internal fragmentation.  Now
that we have Slab Movable Objects we can attempt to defragment the
dcache.  Dentry objects are inherently _not_ relocatable however under
some conditions they can be free'd.  This is the same as shrinking the
dcache but instead of shrinking the whole cache we only attempt to free
those objects that are located in partially full slab pages.  There is
no guarantee that this will reduce the memory usage of the system, it is
a compromise between fragmented memory and total cache shrinkage with
the hope that some memory pressure can be alleviated.

This is implemented using the newly added Slab Movable Objects
infrastructure.  The dcache 'migration' function is intentionally _not_
called 'd_migrate' because we only free, we do not migrate.  Call it
'd_partial_shrink' to make explicit that no reallocation is done.

Implement isolate and 'migrate' functions for the dentry slab cache.
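
The split between the two callbacks can be pictured with a toy model. This
Python sketch only mirrors the control flow of d_isolate() and
d_partial_shrink() from the diff in this mail; ToyDentry and its fields are
hypothetical simplifications, and locking is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class ToyDentry:
    name: str
    refcount: int = 0
    on_shrink_list: bool = False
    on_lru: bool = True
    freed: bool = False


def isolate(objects):
    """Collect the objects that are safe to free onto a dispose list."""
    dispose = []
    for dentry in objects:
        # Skip dentries still in use or already queued for shrinking.
        if dentry.refcount > 0 or dentry.on_shrink_list:
            continue
        dentry.on_lru = False
        dentry.on_shrink_list = True
        dispose.append(dentry)
    return dispose


def partial_shrink(dispose):
    """The 'migrate' step: free what was isolated, relocate nothing."""
    for dentry in dispose:
        dentry.freed = True
    freed = len(dispose)
    dispose.clear()
    return freed
```

Objects that were skipped stay where they are, which is why the slab page is
only released if every object in it could be freed.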

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 76 +
 1 file changed, 76 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3d6cc06eca56..3f9daba1cc78 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "mount.h"
 
@@ -3067,6 +3068,79 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+/*
+ * d_isolate() - Dentry isolation callback function.
+ * @s: The dentry cache.
+ * @v: Vector of pointers to the objects to isolate.
+ * @nr: Number of objects in @v.
+ *
+ * The slab allocator is holding off frees. We can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static void *d_isolate(struct kmem_cache *s, void **v, int nr)
+{
+   struct list_head *dispose;
+   struct dentry *dentry;
+   int i;
+
+   dispose = kmalloc(sizeof(*dispose), GFP_KERNEL);
+   if (!dispose)
+   return NULL;
+
+   INIT_LIST_HEAD(dispose);
+
+   for (i = 0; i < nr; i++) {
+   dentry = v[i];
+   spin_lock(&dentry->d_lock);
+
+   if (dentry->d_lockref.count > 0 ||
+   dentry->d_flags & DCACHE_SHRINK_LIST) {
+   spin_unlock(&dentry->d_lock);
+   continue;
+   }
+
+   if (dentry->d_flags & DCACHE_LRU_LIST)
+   d_lru_del(dentry);
+
+   d_shrink_add(dentry, dispose);
+   spin_unlock(&dentry->d_lock);
+   }
+
+   return dispose;
+}
+
+/*
+ * d_partial_shrink() - Dentry migration callback function.
+ * @s: The dentry cache.
+ * @_unused: We do not access the vector.
+ * @__unused: No need for length of vector.
+ * @___unused: We do not do any allocation.
+ * @private: list_head pointer representing the shrink list.
+ *
+ * Dispose of the shrink list created during isolation function.
+ *
+ * Dentry objects can _not_ be relocated and shrinking the whole dcache
+ * can be expensive.  This is an effort to free dentry objects that are
+ * stopping slab pages from being free'd without clearing the whole dcache.
+ *
+ * This callback is called from the SLUB allocator object migration
+ * infrastructure in attempt to free up slab pages by freeing dentry
+ * objects from partially full slabs.
+ */
+static void d_partial_shrink(struct kmem_cache *s, void **_unused, int __unused,
+int ___unused, void *private)
+{
+   struct list_head *dispose = private;
+
+   if (!private)   /* kmalloc error during isolate. */
+   return;
+
+   if (!list_empty(dispose))
+   shrink_dentry_list(dispose);
+
+   kfree(private);
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3112,6 +3186,8 @@ static void __init dcache_init(void)
   sizeof_field(struct dentry, d_iname),
   dcache_ctor);
 
+   kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);
+
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
return;
-- 
2.21.0

[RFC PATCH v4 10/15] tools/testing/slab: Add XArray movable objects tests

2019-04-29 Thread Tobin C. Harding

We just implemented movable objects for the XArray.  Let's test it
in-tree.

Add test module for the XArray's movable objects implementation.

Functionality of the XArray Slab Movable Object implementation can
usually be seen simply by using `slabinfo` on a running machine, since
the radix tree is typically in use on a running machine and will have
partial slabs.  For repeated testing we can use the test module to
simulate a workload on the XArray, then use `slabinfo` to test that
object migration is functioning.

If testing on a freshly spun up VM (low radix tree workload) it may be
necessary to load/unload the module a number of times to create partial
slabs.

Example test session


Relevant /proc/slabinfo column headers:

  name   

Prior to testing slabinfo report for radix_tree_node:

  # slabinfo radix_tree_node --report

  Slabcache: radix_tree_node  Aliases:  0 Order :  2 Objects: 8352
  ** Reclaim accounting active
  ** Defragmentation at 30%

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :     576  Total  :     497   Sanity Checks : On   Total: 8142848
  SlabObj:     912  Full   :     473   Redzoning     : On   Used : 4810752
  SlabSiz:   16384  Partial:      24   Poisoning     : On   Loss : 3332096
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2806272
  Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  437360

Here you can see the kernel was built with Slab Movable Objects enabled
for the XArray (XArray uses the radix tree below the surface).

After inserting the test module (note we have triggered allocation of a
number of radix tree nodes increasing the object count but decreasing the
number of partial slabs):

  # slabinfo radix_tree_node --report

  Slabcache: radix_tree_node  Aliases:  0 Order :  2 Objects: 8442
  ** Reclaim accounting active
  ** Defragmentation at 30%

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :     576  Total  :     499   Sanity Checks : On   Total: 8175616
  SlabObj:     912  Full   :     484   Redzoning     : On   Used : 4862592
  SlabSiz:   16384  Partial:      15   Poisoning     : On   Loss : 3313024
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2836512
  Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  439120

Now we can shrink the radix_tree_node cache:

  # slabinfo radix_tree_node --shrink
  # slabinfo radix_tree_node --report

  Slabcache: radix_tree_node  Aliases:  0 Order :  2 Objects: 8515
  ** Reclaim accounting active
  ** Defragmentation at 30%

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :     576  Total  :     501   Sanity Checks : On   Total: 8208384
  SlabObj:     912  Full   :     500   Redzoning     : On   Used : 4904640
  SlabSiz:   16384  Partial:       1   Poisoning     : On   Loss : 3303744
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2861040
  Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  440880

Note the single remaining partial slab.
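
The three reports above are internally consistent, which can be checked with
simple arithmetic. The relations used in this Python sketch (total bytes =
slabs x slab size, used bytes = objects x object size, loss = total - used) are
inferred from the numbers shown, not taken from slabinfo documentation, and
the names below are made up for illustration.

```python
SLAB_SIZE = 16384     # "SlabSiz"
OBJECT_SIZE = 576     # "Object"
SLAB_OBJ_SIZE = 912   # "SlabObj"

REPORTS = [
    # (label, objects, total_slabs, partial, total_bytes, used_bytes, loss_bytes)
    ("before", 8352, 497, 24, 8142848, 4810752, 3332096),
    ("after insmod", 8442, 499, 15, 8175616, 4862592, 3313024),
    ("after shrink", 8515, 501, 1, 8208384, 4904640, 3303744),
]


def consistent(objects, total_slabs, total_bytes, used_bytes, loss_bytes):
    """Check one report against the three inferred relations."""
    return (total_slabs * SLAB_SIZE == total_bytes
            and objects * OBJECT_SIZE == used_bytes
            and total_bytes - used_bytes == loss_bytes)


def check_reports():
    return {label: consistent(objs, slabs, total, used, loss)
            for label, objs, slabs, _partial, total, used, loss in REPORTS}
```

The per-slab object count of 17 also follows from 16384 // 912, and the
partial slab count drops from 24 to 15 to 1 across the three reports.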

Signed-off-by: Tobin C. Harding 
---
 tools/testing/slab/Makefile |   2 +-
 tools/testing/slab/slub_defrag_xarray.c | 211 
 2 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/slab/slub_defrag_xarray.c

diff --git a/tools/testing/slab/Makefile b/tools/testing/slab/Makefile
index 440c2e3e356f..44c18d9a4d52 100644
--- a/tools/testing/slab/Makefile
+++ b/tools/testing/slab/Makefile
@@ -1,4 +1,4 @@
-obj-m += slub_defrag.o
+obj-m += slub_defrag.o slub_defrag_xarray.o
 
 KTREE=../../..
 
diff --git a/tools/testing/slab/slub_defrag_xarray.c b/tools/testing/slab/slub_defrag_xarray.c
new file mode 100644
index ..41143f73256c
--- /dev/null
+++ b/tools/testing/slab/slub_defrag_xarray.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SMOX_CACHE_NAME "smox_test"
+static struct kmem_cache *cachep;
+
+/*
+ * Declare XArrays globally so we can clean them up on module unload.
+ */
+
+/* Used by test_smo_xarray()*/
+DEFINE_XARRAY(things);
+
+/* Thing to store pointers to in the XArray */
+struct smox_thing {
+   long id;
+};
+
+/* It's up to the caller to ensure id is unique */
+static struct smox_thing *alloc_thing(int id)
+{
+   struct smox_thing *thing;
+
+   thing = kmem_cache_alloc(cachep, GFP_KERNEL);
+   if (!thing)
+   return ERR_PTR(-ENOMEM);
+
+   thing->id = id;
+   return thing;
+}
+
+/**
+ * smox_object_ctor() - SMO object constructor function.
+ * @ptr: Pointer to memory where the object should be constructed.
+ */
+void smox_object_

[RFC PATCH v4 09/15] xarray: Implement migration function for objects

2019-04-29 Thread Tobin C. Harding

Implement functions to migrate objects. This is based on initial code by
Matthew Wilcox and was modified to work with slab object migration.

This patch can not be merged until all radix tree & IDR users are
converted to the XArray because xa_nodes and radix tree nodes share the
same slab cache (thanks Matthew).
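
The core of the migration step in the diff below (copy the node, repoint the
children, then swing the parent's slot or the tree head) can be pictured with
a toy tree. This Python sketch leaves out locking, RCU and the private list
handling entirely; Node, Tree and migrate() are hypothetical names and not the
XArray API.

```python
class Node:
    def __init__(self, parent=None, offset=0, nslots=4):
        self.parent = parent      # None for the root node
        self.offset = offset      # index of this node in parent.slots
        self.slots = [None] * nslots


class Tree:
    def __init__(self):
        self.head = None


def migrate(tree, old):
    """Replace 'old' with a fresh copy and return the copy."""
    new = Node(old.parent, old.offset, len(old.slots))
    new.slots = list(old.slots)

    # Child nodes must point at the copy, not at the old node.
    for child in new.slots:
        if isinstance(child, Node):
            child.parent = new

    # Finally publish the copy in the slot that referenced the old node.
    if new.parent is None:
        tree.head = new
    else:
        new.parent.slots[new.offset] = new
    return new
```

In the real function the new node is allocated on the requested NUMA node and
the old one is freed after the switch; the toy version only shows the pointer
updates.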

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 lib/radix-tree.c | 13 +
 lib/xarray.c | 49 
 2 files changed, 62 insertions(+)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 14d51548bea6..9412c2853726 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1613,6 +1613,17 @@ static int radix_tree_cpu_dead(unsigned int cpu)
return 0;
 }
 
+extern void xa_object_migrate(void *tree_node, int numa_node);
+
+static void radix_tree_migrate(struct kmem_cache *s, void **objects, int nr,
+  int node, void *private)
+{
+   int i;
+
+   for (i = 0; i < nr; i++)
+   xa_object_migrate(objects[i], node);
+}
+
 void __init radix_tree_init(void)
 {
int ret;
@@ -1627,4 +1638,6 @@ void __init radix_tree_init(void)
ret = cpuhp_setup_state_nocalls(CPUHP_RADIX_DEAD, "lib/radix:dead",
NULL, radix_tree_cpu_dead);
WARN_ON(ret < 0);
+   kmem_cache_setup_mobility(radix_tree_node_cachep, NULL,
+ radix_tree_migrate);
 }
diff --git a/lib/xarray.c b/lib/xarray.c
index 6be3acbb861f..731dd3d8ddb8 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1971,6 +1971,55 @@ void xa_destroy(struct xarray *xa)
 }
 EXPORT_SYMBOL(xa_destroy);
 
+void xa_object_migrate(struct xa_node *node, int numa_node)
+{
+   struct xarray *xa = READ_ONCE(node->array);
+   void __rcu **slot;
+   struct xa_node *new_node;
+   int i;
+
+   /* Freed or not yet in tree then skip */
+   if (!xa || xa == XA_RCU_FREE)
+   return;
+
+   new_node = kmem_cache_alloc_node(radix_tree_node_cachep,
+GFP_KERNEL, numa_node);
+   if (!new_node)
+   return;
+
+   xa_lock_irq(xa);
+
+   /* Check again. */
+   if (xa != node->array) {
+   node = new_node;
+   goto unlock;
+   }
+
+   memcpy(new_node, node, sizeof(struct xa_node));
+
+   if (list_empty(&node->private_list))
+   INIT_LIST_HEAD(&new_node->private_list);
+   else
+   list_replace(&node->private_list, &new_node->private_list);
+
+   for (i = 0; i < XA_CHUNK_SIZE; i++) {
+   void *x = xa_entry_locked(xa, new_node, i);
+
+   if (xa_is_node(x))
+   rcu_assign_pointer(xa_to_node(x)->parent, new_node);
+   }
+   if (!new_node->parent)
+   slot = &xa->xa_head;
+   else
+   slot = &xa_parent_locked(xa, new_node)->slots[new_node->offset];
+   rcu_assign_pointer(*slot, xa_mk_node(new_node));
+
+unlock:
+   xa_unlock_irq(xa);
+   xa_node_free(node);
+   rcu_barrier();
+}
+
 #ifdef XA_DEBUG
 void xa_dump_node(const struct xa_node *node)
 {
-- 
2.21.0

[RFC PATCH v4 08/15] tools/testing/slab: Add object migration test suite

2019-04-29 Thread Tobin C. Harding

We just added a module that enables testing the SLUB allocator's ability
to defrag/shrink caches via movable objects.  Tests are better when they
are automated.

Add automated testing via a python script for SLUB movable objects.

Example output:

  $ cd path/to/linux/tools/testing/slab
  $ ./slub_defrag.py
  Please run script as root

  $ sudo ./slub_defrag.py
  

  $ sudo ./slub_defrag.py --debug
  Loading module ...
  Slab cache smo_test created
  Objects per slab: 20
  Running sanity checks ...

  Running module stress test (see dmesg for additional test output) ...
  Removing module slub_defrag ...
  Loading module ...
  Slab cache smo_test created

  Running test non-movable ...
  testing slab 'smo_test' prior to enabling movable objects ...
  verified non-movable slabs are NOT shrinkable

  Running test movable ...
  testing slab 'smo_test' after enabling movable objects ...
  verified movable slabs are shrinkable

  Removing module slub_defrag ...

Signed-off-by: Tobin C. Harding 
---
 tools/testing/slab/slub_defrag.c  |   1 +
 tools/testing/slab/slub_defrag.py | 451 ++
 2 files changed, 452 insertions(+)
 create mode 100755 tools/testing/slab/slub_defrag.py

diff --git a/tools/testing/slab/slub_defrag.c b/tools/testing/slab/slub_defrag.c
index 4a5c24394b96..8332e69ee868 100644
--- a/tools/testing/slab/slub_defrag.c
+++ b/tools/testing/slab/slub_defrag.c
@@ -337,6 +337,7 @@ static int smo_run_module_tests(int nr_objs, int keep)
 
 /*
  * struct functions() - Map command to a function pointer.
+ * If you update this please update the documentation in slub_defrag.py
  */
 struct functions {
char *fn_name;
diff --git a/tools/testing/slab/slub_defrag.py b/tools/testing/slab/slub_defrag.py
new file mode 100755
index ..41747c0db39b
--- /dev/null
+++ b/tools/testing/slab/slub_defrag.py
@@ -0,0 +1,451 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import subprocess
+import sys
+from os import path
+
+# SLUB Movable Objects test suite.
+#
+# Requirements:
+#  - CONFIG_SLUB=y
+#  - CONFIG_SLUB_DEBUG=y
+#  - The slub_defrag module in this directory.
+
+# Test SMO using a kernel module that enables triggering arbitrary
+# kernel code from userspace via a debugfs file.
+#
+# Module code is in ./slub_defrag.c, basically the functionality is as
+# follows:
+#
+#  - Creates debugfs file /sys/kernel/debugfs/smo/callfn
+#  - Writes to 'callfn' are parsed as a command string and the function
+#associated with command is called.
+#  - Defines 4 commands (all commands operate on smo_test cache):
+# - 'test': Runs module stress tests.
+# - 'alloc N': Allocates N slub objects
+# - 'free N POS': Frees N objects starting at POS (see below)
+# - 'enable': Enables SLUB Movable Objects
+#
+# The module maintains a list of allocated objects.  Allocation adds
+# objects to the tail of the list.  Free'ing frees from the head of the
+# list.  This has the effect of creating free slots in the slab.  For
+# finer grained control over where in the cache slots are free'd POS
+# (position) argument may be used.
+
+# The main() function is reasonably readable; the test suite does the
+# following:
+#
+# 1. Runs the module stress tests.
+# 2. Tests the cache without movable objects enabled.
+#- Creates multiple partial slabs as explained above.
+#- Verifies that partial slabs are _not_ removed by shrink (see below).
+# 3. Tests the cache with movable objects enabled.
+#- Creates multiple partial slabs as explained above.
+#- Verifies that partial slabs _are_ removed by shrink (see below).
+
+# The sysfs file /sys/kernel/slab//shrink enables calling the
+# function kmem_cache_shrink() (see mm/slab_common.c and mm/slub.c).
+# Shrinking a cache attempts to consolidate all partial slabs by moving
+# objects if object migration is enabled for the cache; otherwise
+# shrinking a cache simply re-orders the partial list so that the most
+# densely populated slabs are at the head of the list.
+
+# Enable/disable debugging output (also enabled via -d | --debug).
+debug = False
+
+# Used in debug messages and when running `insmod`.
+MODULE_NAME = "slub_defrag"
+
+# Slab cache created by the test module.
+CACHE_NAME = "smo_test"
+
+# Set by get_slab_config()
+objects_per_slab = 0
+pages_per_slab = 0
+debugfs_mounted = False # Set to true if we mount debugfs.
+
+
+def eprint(*args, **kwargs):
+    print(*args, file=sys.stderr, **kwargs)
+
+
+def dprint(*args, **kwargs):
+    if debug:
+        print(*args, file=sys.stderr, **kwargs)
+
+
+def run_shell(cmd):
+    return subprocess.call([cmd], shell=True)
+
+
+def run_shell_get_stdout(cmd):
+    return subprocess.check_output([cmd], shell=True)
+
+
+def assert_root():
+    user = run_shell_get_stdout('whoami')
+    if user != b'root\n':
+        eprint("Please run script as root")
+        sys.exit(1)
+
+
+def mount_debugfs():
+    mounted = False
+
+    # Check if debugfs is mounted at a known mount

Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Al Viro

On Tue, Apr 16, 2019 at 11:01:16AM -0700, Linus Torvalds wrote:
> On Tue, Apr 16, 2019 at 10:49 AM Al Viro  wrote:
> >
> >  83 files changed, 241 insertions(+), 516 deletions(-)
> 
> I think this single line is pretty convincing on its own. Ignoring
> docs and fs/inode.c, we have
> 
>  80 files changed, 190 insertions(+), 494 deletions(-)
> 
> IOW, just over 300 lines of boiler plate code removed.
> 
> The additions are
> 
>  - Ten more lines of actual code in fs/inode.c (and that's not
> actually added complexity, it looks simpler if anything - most of it
> is the new "i_callback()" helper function)
> 
>  - 19 lines of doc updates.
> 
> So it absolutely looks fine to me.
> 
> I only skimmed through the actual filesystem (and one networking)
> patches, but they looked like trivial conversions to a better
> interface.

... except that this callback can (and always could) get executed after
freeing struct super_block.  So we can't just dereference ->i_sb->s_op
and expect to survive; the table ->s_op pointed to will still be there,
but ->i_sb might very well have been freed, with all its contents overwritten.
We need to copy the callback into struct inode itself, unfortunately.
The following incremental fixes it; I'm going to fold it into the first
commit in there.

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 9d80f9e0855e..b8d3ddd8b8db 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -655,3 +655,11 @@ in your dentry operations instead.
* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
* combination of NULL ->destroy_inode and NULL ->free_inode is
  treated as NULL/free_inode_nonrcu, to preserve the compatibility.
+
+   Note that the callback (be it via ->free_inode() or explicit call_rcu()
+   in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
+   as a matter of fact, the superblock and all associated structures
+   might be already gone.  The filesystem driver is guaranteed to be still
+   there, but that's it.  Freeing memory in the callback is fine; doing
+   more than that is possible, but requires a lot of care and is best
+   avoided.
diff --git a/fs/inode.c b/fs/inode.c
index fb45590d284e..855dad43b11d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -164,6 +164,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
inode->i_wb_frn_avg_time = 0;
inode->i_wb_frn_history = 0;
 #endif
+   inode->free_inode = sb->s_op->free_inode;
 
if (security_inode_alloc(inode))
goto out;
@@ -211,8 +212,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
 static void i_callback(struct rcu_head *head)
 {
struct inode *inode = container_of(head, struct inode, i_rcu);
-   if (inode->i_sb->s_op->free_inode)
-   inode->i_sb->s_op->free_inode(inode);
+   if (inode->free_inode)
+   inode->free_inode(inode);
else
free_inode_nonrcu(inode);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2e9b9f87caca..5ed6b39e588e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -718,6 +718,7 @@ struct inode {
 #endif
 
void*i_private; /* fs or device private pointer */
+   void (*free_inode)(struct inode *);
 } __randomize_layout;
 
 static inline unsigned int i_blocksize(const struct inode *node)
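
For readers less familiar with the lifetime problem described above, here
is a rough userspace sketch in Python.  It is only an analogy: the class
names, destroy_super() and the "freed" log are made up for illustration,
and setting s_op to None merely stands in for the superblock memory being
freed and overwritten.

```python
class SuperOperations:
    def __init__(self, free_inode=None):
        self.free_inode = free_inode


class SuperBlock:
    def __init__(self, s_op):
        self.s_op = s_op


class Inode:
    def __init__(self, sb):
        self.i_sb = sb
        # Copy the callback while the superblock is known to be alive.
        self.free_inode = sb.s_op.free_inode


freed = []  # hypothetical log of which path freed an inode


def free_inode_nonrcu(inode):
    freed.append("nonrcu")


def fs_free_inode(inode):
    freed.append("fs")


def i_callback_old(inode):
    # Dereferences the superblock, which may already be gone.
    if inode.i_sb.s_op.free_inode:
        inode.i_sb.s_op.free_inode(inode)
    else:
        free_inode_nonrcu(inode)


def i_callback(inode):
    # Uses only the pointer stored in the inode itself.
    if inode.free_inode:
        inode.free_inode(inode)
    else:
        free_inode_nonrcu(inode)


def destroy_super(sb):
    # Stand-in for the superblock being freed and its contents overwritten.
    sb.s_op = None
```

If destroy_super() runs before the callback, i_callback() still reaches
the filesystem's function, while i_callback_old() trips over the dead
superblock.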

[RFC PATCH v4 07/15] tools/testing/slab: Add object migration test module

2019-04-29 Thread Tobin C. Harding

We just implemented slab movable objects for the SLUB allocator.  We
should test that code.  In order to do so we need to be able to do a
number of things

 - Create a cache
 - Enable Slab Movable Objects for the cache
 - Allocate objects to the cache
 - Free objects from within specific slabs of the cache

We can do all this via a loadable module.

Add a module that defines functions that can be triggered from userspace
via a debugfs entry. From the source:

  /*
   * SLUB defragmentation a.k.a. Slab Movable Objects (SMO).
   *
   * This module is used for testing the SLUB allocator.  Enables
   * userspace to run kernel functions via a debugfs file.
   *
   *   debugfs: /sys/kernel/debugfs/smo/callfn (write only)
   *
   * String written to `callfn` is parsed by the module and associated
   * function is called.  See fn_tab for mapping of strings to functions.
   */

References to allocated objects are kept by the module in a linked list
so that userspace can control which object to free.

We introduce the following four functions via the function table

  "enable": Enables object migration for the test cache.
  "alloc X": Allocates X objects
  "free X [Y]": Frees X objects starting at list position Y (default Y==0)
  "test": Runs [stress] tests from within the module (see below).

   {"enable", smo_enable_cache_mobility},
   {"alloc", smo_alloc_objects},
   {"free", smo_free_object},
   {"test", smo_run_module_tests},
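
To make the command handling concrete, here is a minimal userspace sketch
in Python of how such a function table might dispatch command strings.
The class SmoTestCache and its methods are hypothetical stand-ins, not the
module's code; the "test" command is left out, and integers stand in for
object references.

```python
class SmoTestCache:
    """Userspace stand-in for the module's list of allocated objects."""

    def __init__(self):
        self.objects = []      # object references, head of the list first
        self.movable = False   # set by the 'enable' command
        self._next_id = 0

    def enable(self, args):
        self.movable = True

    def alloc(self, args):
        count = int(args[0])
        for _ in range(count):
            self.objects.append(self._next_id)  # allocation adds to the tail
            self._next_id += 1

    def free(self, args):
        count = int(args[0])
        pos = int(args[1]) if len(args) > 1 else 0  # Y defaults to 0
        del self.objects[pos:pos + count]

    def call(self, command):
        """Parse a command string and call the associated function."""
        name, *args = command.split()
        table = {"enable": self.enable, "alloc": self.alloc, "free": self.free}
        if name not in table:
            raise ValueError("unknown command: " + name)
        table[name](args)
```

For example, call("alloc 21") followed by call("free 1") leaves 20 objects
on the list, with the hole at what used to be the head.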

Freeing from the start of the list creates a hole in the slab being
freed from (i.e. creates a partial slab).  The results of running these
commands can be seen using `slabinfo` (available in tools/vm/):

gcc -o slabinfo tools/vm/slabinfo.c

Stress tests can be run from within the module.  These tests are
internal to the module because we verify that object references are
still good after object migration.  These are called 'stress' tests
because it is intended that they create/free a lot of objects.
Userspace can control the number of objects to create, default is 1000.

Example test session


Relevant /proc/slabinfo column headers:

  name   <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>

  # mount -t debugfs none /sys/kernel/debug/
  $ cd path/to/linux/tools/testing/slab; make
  ...

  # insmod slub_defrag.ko
  # cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
  smo_test   0   0   392   20   2

From this we can see that the module created cache 'smo_test' with 20
objects per slab and 2 pages per slab (and cache is currently empty).

We can play with the slab allocator manually:

  # insmod slub_defrag.ko
  # echo 'alloc 21' > callfn
  # cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
  smo_test  21  40   392   20   2

We see here that 21 active objects have been allocated creating 2
slabs (40 total objects).
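
The arithmetic behind those numbers can be sketched in a few lines of
Python.  The helper names are made up for illustration, and the sketch
assumes objects are packed into slabs with no holes, which is the best
case rather than something the allocator guarantees.

```python
import math


def slabs_needed(active_objects, objects_per_slab):
    """Minimum number of slabs that can hold the given objects."""
    return math.ceil(active_objects / objects_per_slab)


def object_slots(active_objects, objects_per_slab):
    """Total object slots provided by that many slabs."""
    return slabs_needed(active_objects, objects_per_slab) * objects_per_slab
```

With 20 objects per slab, 21 objects need 2 slabs and therefore occupy 40
slots.  Note that 20 objects would fit in a single slab, so a cache that
still holds 2 slabs for 20 objects is fragmented.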

  # slabinfo smo_test --report

  Slabcache: smo_test   Aliases: 0   Order: 1   Objects: 21

  Sizes (bytes)     Slabs            Debug                 Memory

  Object :    56    Total  :    2    Sanity Checks : On    Total:   16384
  SlabObj:   392    Full   :    1    Redzoning     : On    Used :    1176
  SlabSiz:  8192    Partial:    1    Poisoning     : On    Loss :   15208
  Loss   :   336    CpuSlab:    0    Tracking      : On    Lalig:    7056
  Align  :     8    Objects:   20    Tracing       : Off   Lpadd:     704

Now free an object from the first slot of the first slab

  # echo 'free 1' > callfn
  # cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
  smo_test  20  40   392   20   2

  # slabinfo smo_test --report

  Slabcache: smo_test   Aliases: 0   Order: 1   Objects: 20

  Sizes (bytes)     Slabs            Debug                 Memory

  Object :    56    Total  :    2    Sanity Checks : On    Total:   16384
  SlabObj:   392    Full   :    0    Redzoning     : On    Used :    1120
  SlabSiz:  8192    Partial:    2    Poisoning     : On    Loss :   15264
  Loss   :   336    CpuSlab:    0    Tracking      : On    Lalig:    6720
  Align  :     8    Objects:   20    Tracing       : Off   Lpadd:     704

Calling shrink now on the cache does nothing because object migration is
not enabled (output omitted).  If we enable object migration then shrink
the cache we expect the object from the second slab to be moved to the
first slot in the first slab and the second slab to be removed from the
partial list.
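
The expected effect of shrink can be modelled in userspace.  The Python
sketch below is only a toy model under stated assumptions: a slab is a
list with None marking a free slot, shrink() and _has_objects() are
hypothetical names, and the real allocator's locking and per-node lists
are ignored.

```python
def _has_objects(slab):
    return any(obj is not None for obj in slab)


def shrink(slabs, movable):
    """Return the slabs left after a shrink; empty slabs are discarded."""
    # Work on copies, most densely populated slab first.
    ordered = sorted((list(s) for s in slabs), key=lambda s: s.count(None))
    if movable:
        lo, hi = 0, len(ordered) - 1
        while lo < hi:
            dst, src = ordered[lo], ordered[hi]
            if None not in dst:
                lo += 1      # destination is full, try the next one
            elif not _has_objects(src):
                hi -= 1      # source is drained, try the next one
            else:
                # Move one object into the first free slot of dst.
                i = next(i for i, obj in enumerate(src) if obj is not None)
                dst[dst.index(None)] = src[i]
                src[i] = None
    return [s for s in ordered if _has_objects(s)]
```

For the session above (first slab with a hole in slot 0, second slab
holding one object) the model keeps two slabs when objects are not
movable and ends up with one full slab when they are.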

  # echo 'enable' > callfn
  # slabinfo smo_test --shrink
  # slabinfo smo_test --report

  Slabcache: smo_test   Aliases: 0   Order: 1   Objects: 20
  ** Defragmentation at 30%

  Sizes (bytes)     Slabs            Debug                 Memory

  Object :    56    Total  :    1    Sanity Checks : On    Total:    8192
  SlabObj:   392    Full   :    1    Redzonin

[RFC PATCH v4 06/15] tools/vm/slabinfo: Add defrag_used_ratio output

2019-04-29 Thread Tobin C. Harding

Add output for the newly added defrag_used_ratio sysfs knob.

Signed-off-by: Tobin C. Harding 
---
 tools/vm/slabinfo.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index d2c22f9ee2d8..ef4ff93df4cc 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -34,6 +34,7 @@ struct slabinfo {
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
int movable, ctor;
+   int defrag_used_ratio;
int remote_node_defrag_ratio;
unsigned long partial, objects, slabs, objects_partial, objects_total;
unsigned long alloc_fastpath, alloc_slowpath;
@@ -549,6 +550,8 @@ static void report(struct slabinfo *s)
printf("** Slabs are destroyed via RCU\n");
if (s->reclaim_account)
printf("** Reclaim accounting active\n");
+   if (s->movable)
+   printf("** Defragmentation at %d%%\n", s->defrag_used_ratio);
 
printf("\nSizes (bytes) Slabs  Debug
Memory\n");

printf("\n");
@@ -1279,6 +1282,7 @@ static void read_slab_dir(void)
slab->deactivate_bypass = get_obj("deactivate_bypass");
slab->remote_node_defrag_ratio =
get_obj("remote_node_defrag_ratio");
+   slab->defrag_used_ratio = get_obj("defrag_used_ratio");
chdir("..");
if (read_slab_obj(slab, "ops")) {
if (strstr(buffer, "ctor :"))
-- 
2.21.0

[RFC PATCH v4 05/15] tools/vm/slabinfo: Add remote node defrag ratio output

2019-04-29 Thread Tobin C. Harding

Add output line for NUMA remote node defrag ratio.

Signed-off-by: Tobin C. Harding 
---
 tools/vm/slabinfo.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index cbfc56c44c2f..d2c22f9ee2d8 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -34,6 +34,7 @@ struct slabinfo {
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
int movable, ctor;
+   int remote_node_defrag_ratio;
unsigned long partial, objects, slabs, objects_partial, objects_total;
unsigned long alloc_fastpath, alloc_slowpath;
unsigned long free_fastpath, free_slowpath;
@@ -377,6 +378,10 @@ static void slab_numa(struct slabinfo *s, int mode)
if (skip_zero && !s->slabs)
return;
 
+   if (mode) {
+   printf("\nNUMA remote node defrag ratio: %3d\n",
+  s->remote_node_defrag_ratio);
+   }
if (!line) {
printf("\n%-21s:", mode ? "NUMA nodes" : "Slab");
for(node = 0; node <= highest_node; node++)
@@ -1272,6 +1277,8 @@ static void read_slab_dir(void)
slab->cpu_partial_free = get_obj("cpu_partial_free");
slab->alloc_node_mismatch = get_obj("alloc_node_mismatch");
slab->deactivate_bypass = get_obj("deactivate_bypass");
+   slab->remote_node_defrag_ratio =
+   get_obj("remote_node_defrag_ratio");
chdir("..");
if (read_slab_obj(slab, "ops")) {
if (strstr(buffer, "ctor :"))
-- 
2.21.0

[RFC PATCH v4 04/15] slub: Slab defrag core

2019-04-29 Thread Tobin C. Harding

Internal fragmentation can occur within pages used by the slub
allocator.  Under some workloads large numbers of pages can be used by
partial slab pages.  This under-utilisation is bad not only because it
wastes memory but also because, if the system is under memory pressure,
higher order allocations may become difficult to satisfy.  If we can
defrag slab caches we can alleviate these problems.

Implement Slab Movable Objects in order to defragment slab caches.

Slab defragmentation may occur:

1. Unconditionally when __kmem_cache_shrink() is called on a slab cache
   by the kernel calling kmem_cache_shrink().

2. Unconditionally through the use of the slabinfo command.

slabinfo  -s

3. Conditionally via the use of kmem_cache_defrag()

- Use Slab Movable Objects when shrinking cache.

Currently when the kernel calls kmem_cache_shrink() we curate the
partial slabs list.  We still do this if object migration is not enabled
for the cache.  If, however, SMO is enabled, we attempt to move objects in
partially full slabs in order to defragment the cache.  Shrink attempts
to move all objects in order to reduce the cache to a single partial
slab for each node.

- Add conditional per node defrag via new function:

kmem_defrag_slabs(int node).

kmem_defrag_slabs() attempts to defragment all slab caches for node.
 Defragmentation is done conditionally dependent on MAX_PARTIAL _AND_
 defrag_used_ratio.

   Caches are only considered for defragmentation if the number of
   partial slabs exceeds MAX_PARTIAL (per node).

   Also, defragmentation only occurs if the usage ratio of the slab is
   lower than the configured percentage (sysfs field added in this
   patch).  Fragmentation ratios are measured by calculating the
   percentage of objects in use compared to the total number of objects
   that the slab page can accommodate.

   The scanning of slab caches is optimized because the defragmentable
   slabs come first on the list. Thus we can terminate scans on the
   first slab encountered that does not support defragmentation.

   kmem_defrag_slabs() takes a node parameter. This can either be -1 if
   defragmentation should be performed on all nodes, or a node number.

   Defragmentation may be disabled by setting defrag ratio to 0

echo 0 > /sys/kernel/slab//defrag_used_ratio

- Add a defrag ratio sysfs field and set it to 30% by default. A limit
of 30% specifies that more than 3 out of 10 available slots for objects
need to be in use; otherwise slab defragmentation will be attempted on
the remaining objects.
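
As a rough illustration of the two conditions described above, here is a
small Python sketch.  The function names are hypothetical, max_partial is
passed in rather than taken from the kernel, and the strict "less than"
comparison at exactly 30% is an assumption based on the wording "lower
than the configured percentage"; it is not taken from mm/slub.c.

```python
def cache_considered(nr_partial, max_partial):
    """A cache is looked at only if it has more partial slabs than allowed."""
    return nr_partial > max_partial


def slab_wants_defrag(objects_in_use, objects_per_slab, defrag_used_ratio=30):
    """True if the slab's usage is below the configured percentage."""
    if defrag_used_ratio == 0:
        return False  # a ratio of 0 disables defragmentation
    # Integer form of: in_use / per_slab * 100 < ratio
    return objects_in_use * 100 < defrag_used_ratio * objects_per_slab
```

With 20 objects per slab and the default ratio, a slab holding 5 objects
(25%) qualifies while a slab holding 7 objects (35%) does not.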

In order for a cache to be defragmentable the cache must support object
migration (SMO).  Enabling SMO for a cache is done via a call to the
recently added function:

void kmem_cache_setup_mobility(struct kmem_cache *,
   kmem_cache_isolate_func,
   kmem_cache_migrate_func);

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 Documentation/ABI/testing/sysfs-kernel-slab |  14 +
 include/linux/slab.h|   1 +
 include/linux/slub_def.h|   7 +
 mm/slub.c   | 385 
 4 files changed, 334 insertions(+), 73 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
index 29601d93a1c2..7770c03be6b4 100644
--- a/Documentation/ABI/testing/sysfs-kernel-slab
+++ b/Documentation/ABI/testing/sysfs-kernel-slab
@@ -180,6 +180,20 @@ Description:
list.  It can be written to clear the current count.
Available when CONFIG_SLUB_STATS is enabled.
 
+What:  /sys/kernel/slab/cache/defrag_used_ratio
+Date:  February 2019
+KernelVersion: 5.0
+Contact:   Christoph Lameter 
+   Pekka Enberg ,
+Description:
+   The defrag_used_ratio file allows the control of how aggressive
+   slab fragmentation reduction works at reclaiming objects from
+   sparsely populated slabs. This is a percentage. If a slab has
+   less than this percentage of objects allocated then reclaim will
+   attempt to reclaim objects so that the whole slab page can be
+   freed. 0% specifies no reclaim attempt (defrag disabled), 100%
+   specifies attempt to reclaim all pages.  The default is 30%.
+
 What:  /sys/kernel/slab/cache/deactivate_to_tail
 Date:  February 2008
 KernelVersion: 2.6.25
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 886fc130334d..4bf381b34829 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -149,6 +149,7 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
+unsigned long kmem_defrag_slabs(int node);
 
 void memcg_create_kmem_cache(struct mem

[RFC PATCH v4 02/15] tools/vm/slabinfo: Add support for -C and -M options

2019-04-29 Thread Tobin C. Harding

-C lists caches that use a ctor.

-M lists caches that support object migration.

Add command line options to show caches with a constructor and caches
that are movable (i.e. have migrate function).

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 tools/vm/slabinfo.c | 40 
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index 73818f1b2ef8..cbfc56c44c2f 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -33,6 +33,7 @@ struct slabinfo {
unsigned int hwcache_align, object_size, objs_per_slab;
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
+   int movable, ctor;
unsigned long partial, objects, slabs, objects_partial, objects_total;
unsigned long alloc_fastpath, alloc_slowpath;
unsigned long free_fastpath, free_slowpath;
@@ -67,6 +68,8 @@ int show_report;
 int show_alias;
 int show_slab;
 int skip_zero = 1;
+int show_movable;
+int show_ctor;
 int show_numa;
 int show_track;
 int show_first_alias;
@@ -109,11 +112,13 @@ static void fatal(const char *x, ...)
 
 static void usage(void)
 {
-   printf("slabinfo 4/15/2011. (c) 2007 sgi/(c) 2011 Linux Foundation.\n\n"
-   "slabinfo [-aADefhilnosrStTvz1LXBU] [N=K] [-dafzput] 
[slab-regexp]\n"
+   printf("slabinfo 4/15/2017. (c) 2007 sgi/(c) 2011 Linux Foundation/(c) 
2017 Jump Trading LLC.\n\n"
+  "slabinfo [-aACDefhilMnosrStTvz1LXBU] [N=K] [-dafzput] 
[slab-regexp]\n"
+
"-a|--aliases   Show aliases\n"
"-A|--activity  Most active slabs first\n"
"-B|--Bytes Show size in bytes\n"
+   "-C|--ctor  Show slabs with ctors\n"
"-D|--display-activeSwitch line format to activity\n"
"-e|--empty Show empty slabs\n"
"-f|--first-alias   Show first alias\n"
@@ -121,6 +126,7 @@ static void usage(void)
"-i|--inverted  Inverted list\n"
"-l|--slabs Show slabs\n"
"-L|--Loss  Sort by loss\n"
+   "-M|--movable   Show caches that support movable 
objects\n"
"-n|--numa  Show NUMA information\n"
"-N|--lines=K   Show the first K slabs\n"
"-o|--ops   Show kmem_cache_ops\n"
@@ -588,6 +594,12 @@ static void slabcache(struct slabinfo *s)
if (show_empty && s->slabs)
return;
 
+   if (show_ctor && !s->ctor)
+   return;
+
+   if (show_movable && !s->movable)
+   return;
+
if (sort_loss == 0)
store_size(size_str, slab_size(s));
else
@@ -602,6 +614,10 @@ static void slabcache(struct slabinfo *s)
*p++ = '*';
if (s->cache_dma)
*p++ = 'd';
+   if (s->ctor)
+   *p++ = 'C';
+   if (s->movable)
+   *p++ = 'M';
if (s->hwcache_align)
*p++ = 'A';
if (s->poison)
@@ -636,7 +652,8 @@ static void slabcache(struct slabinfo *s)
printf("%-21s %8ld %7d %15s %14s %4d %1d %3ld %3ld %s\n",
s->name, s->objects, s->object_size, size_str, dist_str,
s->objs_per_slab, s->order,
-   s->slabs ? (s->partial * 100) / s->slabs : 100,
+   s->slabs ? (s->partial * 100) /
+   (s->slabs * s->objs_per_slab) : 100,
s->slabs ? (s->objects * s->object_size * 100) /
(s->slabs * (page_size << s->order)) : 100,
flags);
@@ -1256,6 +1273,13 @@ static void read_slab_dir(void)
slab->alloc_node_mismatch = get_obj("alloc_node_mismatch");
slab->deactivate_bypass = get_obj("deactivate_bypass");
chdir("..");
+   if (read_slab_obj(slab, "ops")) {
+   if (strstr(buffer, "ctor :"))
+   slab->ctor = 1;
+   if (strstr(buffer, "migrate :"))
+   slab->movable = 1;
+   }
+
if (slab->name[0] == ':')
alias_targets++;
slab++;
@@ -1332,6 +1356,8 @@ static void xtotals(void)
 }
 
 struct option opts[] = {
+   { "ctor", no_argument, NULL, 'C' },
+   { "movable", no_argument, NULL, 'M' },
{ "aliases", no_argument, NULL, 'a' },
{ "activity", no_argument, NULL, 'A' },
{ "debug", optional_argument, NULL, 'd' },
@@ -1367,7 +1393,7 @@ int main(int argc, char *argv[])
 
page_size = getpagesize();
 
-

[RFC PATCH v4 03/15] slub: Sort slab cache list

2019-04-29 Thread Tobin C. Harding

It is advantageous to have all defragmentable slabs together at the
beginning of the list of slabs so that there is no need to scan the
complete list. Put defragmentable caches first when adding a slab cache
and others last.
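
The ordering idea can be sketched in Python as follows.  SlabCaches and
its methods are hypothetical names used only for illustration; create()
plays the role of list_add_tail() and setup_mobility() the role of
list_move() to the head of the list.

```python
class SlabCaches:
    """Toy model of the global cache list: movable caches stay in front."""

    def __init__(self):
        self._caches = []  # (name, movable) tuples, head of the list first

    def create(self, name):
        self._caches.append((name, False))       # new caches go to the tail

    def setup_mobility(self, name):
        index = next(i for i, (n, _) in enumerate(self._caches) if n == name)
        self._caches.pop(index)
        self._caches.insert(0, (name, True))     # movable caches go to the head

    def names(self):
        return [name for name, _ in self._caches]

    def defragmentable(self):
        found = []
        for name, movable in self._caches:
            if not movable:
                break   # everything after this point is not movable either
            found.append(name)
        return found
```

Because every movable cache sits ahead of every non-movable one, the scan
in defragmentable() can stop at the first non-movable entry.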

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 mm/slab_common.c | 2 +-
 mm/slub.c| 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 58251ba63e4a..db5e9a0b1535 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -393,7 +393,7 @@ static struct kmem_cache *create_cache(const char *name,
goto out_free_cache;
 
s->refcount = 1;
-   list_add(&s->list, &slab_caches);
+   list_add_tail(&s->list, &slab_caches);
memcg_link_cache(s);
 out:
if (err)
diff --git a/mm/slub.c b/mm/slub.c
index ae44d640b8c1..f6b0e4a395ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4342,6 +4342,8 @@ void kmem_cache_setup_mobility(struct kmem_cache *s,
return;
}
 
+   mutex_lock(&slab_mutex);
+
s->isolate = isolate;
s->migrate = migrate;
 
@@ -4350,6 +4352,10 @@ void kmem_cache_setup_mobility(struct kmem_cache *s,
 * to disable fast cmpxchg based processing.
 */
s->flags &= ~__CMPXCHG_DOUBLE;
+
+   list_move(&s->list, &slab_caches);  /* Move to top */
+
+   mutex_unlock(&slab_mutex);
 }
 EXPORT_SYMBOL(kmem_cache_setup_mobility);
 
-- 
2.21.0

[RFC PATCH v4 01/15] slub: Add isolate() and migrate() methods

2019-04-29 Thread Tobin C. Harding

Add the two methods needed for moving objects and enable the display of
the callbacks via the /sys/kernel/slab interface.

Add documentation explaining the use of these methods and the prototypes
for slab.h. Add functions to set up the callback methods for a slab
cache.

Add empty functions for SLAB/SLOB. The API is generic so it could be
theoretically implemented for these allocators as well.

Change sysfs 'ctor' field to be 'ops' to contain all the callback
operations defined for a slab cache.  Display the existing 'ctor'
callback in the ops fields contents along with 'isolate' and 'migrate'
callbacks.

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 include/linux/slab.h | 70 
 include/linux/slub_def.h |  3 ++
 mm/slub.c| 59 +
 3 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9449b19c5f10..886fc130334d 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -154,6 +154,76 @@ void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
 void memcg_deactivate_kmem_caches(struct mem_cgroup *);
 void memcg_destroy_kmem_caches(struct mem_cgroup *);
 
+/*
+ * Function prototypes passed to kmem_cache_setup_mobility() to enable
+ * mobile objects and targeted reclaim in slab caches.
+ */
+
+/**
+ * typedef kmem_cache_isolate_func - Object migration callback function.
+ * @s: The cache we are working on.
+ * @ptr: Pointer to an array of pointers to the objects to isolate.
+ * @nr: Number of objects in @ptr array.
+ *
+ * The purpose of kmem_cache_isolate_func() is to pin each object so that
+ * they cannot be freed until kmem_cache_migrate_func() has processed
+ * them. This may be accomplished by increasing the refcount or setting
+ * a flag.
+ *
+ * The object pointer array passed is also passed to
+ * kmem_cache_migrate_func().  The function may remove objects from the
+ * array by setting pointers to %NULL. This is useful if we can
+ * determine that an object is being freed because
+ * kmem_cache_isolate_func() was called when the subsystem was calling
+ * kmem_cache_free().  In that case it is not necessary to increase the
+ * refcount or specially mark the object because the release of the slab
+ * lock will lead to the immediate freeing of the object.
+ *
+ * Context: Called with locks held so that the slab objects cannot be
+ *  freed.  We are in an atomic context and no slab operations
+ *  may be performed.
+ * Return: A pointer that is passed to the migrate function. If any
+ * objects cannot be touched at this point then the pointer may
+ * indicate a failure and then the migration function can simply
+ * remove the references that were already obtained. The private
+ * data could be used to track the objects that were already pinned.
+ */
+typedef void *kmem_cache_isolate_func(struct kmem_cache *s, void **ptr, int nr);
+
+/**
+ * typedef kmem_cache_migrate_func - Object migration callback function.
+ * @s: The cache we are working on.
+ * @ptr: Pointer to an array of pointers to the objects to migrate.
+ * @nr: Number of objects in @ptr array.
+ * @node: The NUMA node where the object should be allocated.
+ * @private: The pointer returned by kmem_cache_isolate_func().
+ *
+ * This function is responsible for migrating objects.  Typically, for
+ * each object in the input array you will want to allocate a new
+ * object, copy the original object, update any pointers, and free the
+ * old object.
+ *
+ * After this function returns all pointers to the old object should now
+ * point to the new object.
+ *
+ * Context: Called with no locks held and interrupts enabled.  Sleeping
+ *  is possible.  Any operation may be performed.
+ */
+typedef void kmem_cache_migrate_func(struct kmem_cache *s, void **ptr,
+int nr, int node, void *private);
+
+/*
+ * kmem_cache_setup_mobility() is used to setup callbacks for a slab cache.
+ */
+#ifdef CONFIG_SLUB
+void kmem_cache_setup_mobility(struct kmem_cache *, kmem_cache_isolate_func,
+  kmem_cache_migrate_func);
+#else
+static inline void
+kmem_cache_setup_mobility(struct kmem_cache *s, kmem_cache_isolate_func isolate,
+ kmem_cache_migrate_func migrate) {}
+#endif
+
 /*
  * Please use this macro to create slab caches. Simply specify the
  * name of the structure and maybe some flags that are listed above.
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d2153789bd9f..2879a2f5f8eb 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -99,6 +99,9 @@ struct kmem_cache {
gfp_t allocflags;   /* gfp flags to use on each alloc */
int refcount;   /* Refcount for slab cache destroy */
void (*ctor)(void *);
+   kmem_cache_isolate_func *isolate;
+

[RFC PATCH v4 00/15] Slab Movable Objects (SMO)

2019-04-29 Thread Tobin C. Harding

Hi,

Another iteration of the SMO patch set, updates to this version are
restricted to the dcache patch #14.

Applies on top of Linus' tree (tag: v5.1-rc6).

This is a patch set implementing movable objects within the SLUB
allocator.  This is work based on Christoph Lameter's patch set:

 https://lore.kernel.org/patchwork/project/lkml/list/?series=377335

The original code logic is from that set and implemented by Christoph.
Clean up, refactoring, documentation, and additional features by myself.
Responsibility for any bugs remaining falls solely with myself.

Changes to this version:

Re-write the dcache Slab Movable Objects isolate/migrate functions.
Based on review/suggestions by Alexander on the last version.

In this version the isolate function loops over the object vector and
builds a shrink list for all objects that have refcount==0 AND are NOT
on anyone else's shrink list.  A pointer to this list is returned from
the isolate function and passed to the migrate function (by the SMO
infrastructure).  The dentry migration function d_partial_shrink()
simply calls shrink_dentry_list() on the received shrink list pointer
and frees the memory associated with the list_head.
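
The two-phase flow described above can be sketched in Python.  This is a
userspace illustration only: the Dentry dataclass and both functions are
hypothetical stand-ins, and "freed" is just a flag, not what
shrink_dentry_list() actually does.

```python
from dataclasses import dataclass


@dataclass
class Dentry:
    name: str
    refcount: int = 0
    on_shrink_list: bool = False
    freed: bool = False


def isolate(objects):
    """Build a shrink list of unused objects not already on another list."""
    shrink_list = []
    for dentry in objects:
        if dentry is None:
            continue
        if dentry.refcount == 0 and not dentry.on_shrink_list:
            dentry.on_shrink_list = True
            shrink_list.append(dentry)
    return shrink_list  # handed to migrate() as the private pointer


def migrate(objects, private):
    """Dispose of everything on the shrink list, then drop the list."""
    for dentry in private:
        dentry.freed = True
    private.clear()
```

Objects that are in use, or already claimed by someone else's shrink
list, are simply skipped by isolate() and are left alone by migrate().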

Hopefully if this is all ok I can move on to violating the inode
slab cache :)

FWIW testing on a VM in Qemu brings this mild benefit to the dentry slab
cache with no _apparent_ negatives.

CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_SMO_NODE=y
CONFIG_DCACHE_SMO=y

[root@vm ~]# slabinfo  dentry -r | head -n 13

Slabcache: dentry   Aliases: 0   Order: 1   Objects: 38585
** Reclaim accounting active
** Defragmentation at 30%

Sizes (bytes)     Slabs            Debug                 Memory

Object :   192    Total  : 2582    Sanity Checks : On    Total: 21151744
SlabObj:   528    Full   : 2547    Redzoning     : On    Used :  7408320
SlabSiz:  8192    Partial:   35    Poisoning     : On    Loss : 13743424
Loss   :   336    CpuSlab:    0    Tracking      : On    Lalig: 12964560
Align  :     8    Objects:   15    Tracing       : Off   Lpadd:   702304

[root@vm ~]# slabinfo  dentry --shrink
[root@vm ~]# slabinfo  dentry -r | head -n 13

Slabcache: dentry   Aliases: 0   Order: 1   Objects: 38426
** Reclaim accounting active
** Defragmentation at 30%

Sizes (bytes)     Slabs            Debug                 Memory

Object :   192    Total  : 2578    Sanity Checks : On    Total: 21118976
SlabObj:   528    Full   : 2547    Redzoning     : On    Used :  7377792
SlabSiz:  8192    Partial:   31    Poisoning     : On    Loss : 13741184
Loss   :   336    CpuSlab:    0    Tracking      : On    Lalig: 12911136
Align  :     8    Objects:   15    Tracing       : Off   Lpadd:   701216


Please note, this dentry shrink implementation is 'best effort', results
vary.  This is as expected.  We are trying to unobtrusively shrink
the dentry cache.

thanks,
Tobin.


Tobin C. Harding (15):
  slub: Add isolate() and migrate() methods
  tools/vm/slabinfo: Add support for -C and -M options
  slub: Sort slab cache list
  slub: Slab defrag core
  tools/vm/slabinfo: Add remote node defrag ratio output
  tools/vm/slabinfo: Add defrag_used_ratio output
  tools/testing/slab: Add object migration test module
  tools/testing/slab: Add object migration test suite
  xarray: Implement migration function for objects
  tools/testing/slab: Add XArray movable objects tests
  slub: Enable moving objects to/from specific nodes
  slub: Enable balancing slabs across nodes
  dcache: Provide a dentry constructor
  dcache: Implement partial shrink via Slab Movable Objects
  dcache: Add CONFIG_DCACHE_SMO

 Documentation/ABI/testing/sysfs-kernel-slab |  14 +
 fs/dcache.c | 110 ++-
 include/linux/slab.h|  71 ++
 include/linux/slub_def.h|  10 +
 lib/radix-tree.c|  13 +
 lib/xarray.c|  49 ++
 mm/Kconfig  |  14 +
 mm/slab_common.c|   2 +-
 mm/slub.c   | 819 ++--
 tools/testing/slab/Makefile |  10 +
 tools/testing/slab/slub_defrag.c| 567 ++
 tools/testing/slab/slub_defrag.py   | 451 +++
 tools/testing/slab/slub_defrag_xarray.c | 211 +
 tools/vm/slabinfo.c |  51 +-
 14 files changed, 2299 insertions(+), 93 deletions(-)
 create mode 100644 tools/testing/slab/Makefile
 create mode 100644 tools/testing/slab/slub_defrag.c
 create mode 100755 tools/testing/slab/slub_defrag.py
 create mode 100644 tools/testing/slab/slub_defrag_xarray.c

-- 
2.21.0

Re: [PATCH -next] ASoC: sprd: Fix to use list_for_each_entry_safe() when delete items

2019-04-29 Thread Baolin Wang

Hi,

On Mon, 29 Apr 2019 at 20:27, Wei Yongjun  wrote:
>
> Since we will remove items off the list using list_del() we need
> to use a safe version of the list_for_each_entry() macro aptly named
> list_for_each_entry_safe().
>
> Fixes: d7bff893e04f ("ASoC: sprd: Add Spreadtrum multi-channel data transfer support")
> Signed-off-by: Wei Yongjun 

Yes, thanks for your fixes.
Reviewed-by: Baolin Wang 
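
For anyone wondering why the _safe variant matters, here is a small
userspace model in Python.  It is a sketch, not kernel code: ListHead,
list_add_tail(), list_del() and the two iterators only mimic the shape of
the kernel list helpers, and POISON is a stand-in for the poison values
that list_del() writes into a removed entry.

```python
POISON = object()  # stand-in for the kernel's list poison pointers


class ListHead:
    def __init__(self):
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    new.prev = head.prev
    new.next = head
    head.prev.next = new
    head.prev = new


def list_del(entry):
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry.prev = POISON


def for_each(head):
    pos = head.next
    while pos is not head:
        yield pos
        pos = pos.next   # read after the loop body ran: poisoned if deleted


def for_each_safe(head):
    pos = head.next
    while pos is not head:
        nxt = pos.next   # saved before the loop body can delete pos
        yield pos
        pos = nxt
```

Deleting entries inside for_each() walks into POISON on the second step,
whereas for_each_safe() empties the list cleanly.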

> ---
>  sound/soc/sprd/sprd-mcdt.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/sound/soc/sprd/sprd-mcdt.c b/sound/soc/sprd/sprd-mcdt.c
> index 28f5e649733d..df250f7f2b6f 100644
> --- a/sound/soc/sprd/sprd-mcdt.c
> +++ b/sound/soc/sprd/sprd-mcdt.c
> @@ -978,12 +978,12 @@ static int sprd_mcdt_probe(struct platform_device *pdev)
>
>  static int sprd_mcdt_remove(struct platform_device *pdev)
>  {
> -   struct sprd_mcdt_chan *temp;
> +   struct sprd_mcdt_chan *chan, *temp;
>
> mutex_lock(&sprd_mcdt_list_mutex);
>
> -   list_for_each_entry(temp, &sprd_mcdt_chan_list, list)
> -   list_del(&temp->list);
> +   list_for_each_entry_safe(chan, temp, &sprd_mcdt_chan_list, list)
> +   list_del(&chan->list);
>
> mutex_unlock(&sprd_mcdt_list_mutex);
>
>
>


-- 
Baolin Wang
Best Regards
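
For readers following the thread, here is a minimal userspace sketch of why the _safe iterator matters when entries are unlinked during the walk. The struct and helper names below are hypothetical stand-ins, not the kernel's list.h API; the point is only that the next pointer is saved before the current entry is deleted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an entry embedding a list_head. */
struct node {
	struct node *prev, *next;
	int id;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the entry and clear its links, so it must not be used to advance. */
static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* Remove every entry; 'next' is fetched before 'cur' is unlinked and freed. */
static int remove_all(struct node *head)
{
	struct node *cur, *next;
	int removed = 0;

	for (cur = head->next, next = cur->next; cur != head;
	     cur = next, next = cur->next) {
		list_del(cur);
		free(cur);
		removed++;
	}
	return removed;
}

/* Build a list of n heap-allocated entries, empty it, return the count removed. */
static int build_and_remove(int n)
{
	struct node head;
	int i;

	list_init(&head);
	for (i = 0; i < n; i++) {
		struct node *e = malloc(sizeof(*e));

		if (!e)
			break;
		e->id = i;
		list_add_tail(&head, e);
	}
	i = remove_all(&head);
	assert(head.next == &head && head.prev == &head);
	return i;
}
```

A plain iterator would read cur->next after list_del() has already cleared it, which is the bug the patch above addresses.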

Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Al Viro

On Tue, Apr 30, 2019 at 04:22:38AM +0200, Nicholas Mc Guire wrote:
> On Mon, Apr 29, 2019 at 10:03:36AM -0400, Sven Van Asbroeck wrote:
> > On Mon, Apr 29, 2019 at 2:11 AM Nicholas Mc Guire  wrote:
> > >
> > > V2: As requested by Sven Van Asbroeck  make the
> > > impact of the patch clear in the commit message.
> > 
> > Thank you, but did you miss my comment about creating a local variable
> > instead? See:
> > https://lkml.org/lkml/2019/4/28/97
> 
> Did not miss it - I just don't think that makes it any more
> understandable - the __force __be16 makes it clear I believe
> that this is correct, sparse does not like this though - so tell
> sparse.

... to STFU, 'cause you know better.  The trouble is, how do we
(or yourself a year or two later) know *why* it is correct?
Worse, how do we (or yourself, etc.) know if a change about to be
done to the code won't invalidate the proof of yours?

> The local variable would need to be explained as it is
> functionally not necessary - therefore I find it more confusing
> than using __force here.

What's confusing is mixing host- and fixed-endian values in the
same variable at different times.  Treat those as unrelated
types that happen to have the same sizeof.

Quite a few of __force instances in the tree should be taken out
and shot.  Don't add to their number.
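
A rough userspace sketch of the "unrelated types" idea follows. The be16_t wrapper and the *_sketch helpers are hypothetical and are not the kernel's __be16 or cpu_to_be16(); the point is only that a host-endian value and a fixed-endian value get separate variables of separate types, so the compiler rejects mixing them without any cast.

```c
#include <stdint.h>

/* Hypothetical wrapper: a 16-bit value known to be in big-endian byte order. */
typedef struct {
	uint8_t bytes[2];
} be16_t;

/* Convert a host-endian value to its big-endian representation. */
static be16_t cpu_to_be16_sketch(uint16_t host)
{
	be16_t wire = { { (uint8_t)(host >> 8), (uint8_t)(host & 0xff) } };

	return wire;
}

/* Convert a big-endian representation back to a host-endian value. */
static uint16_t be16_to_cpu_sketch(be16_t wire)
{
	return (uint16_t)((wire.bytes[0] << 8) | wire.bytes[1]);
}

/*
 * The host-endian input and the fixed-endian output never share a
 * variable: assigning one to the other directly would not compile.
 */
static be16_t prepare_wire_value(uint16_t host_value)
{
	be16_t wire_value = cpu_to_be16_sketch(host_value);

	return wire_value;
}
```

With a dedicated local of the fixed-endian type, the conversion documents itself and no forced cast is needed to quiet the checker.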

RE: [PATCH] clk: imx: pllv3: Fix fall through build warning

2019-04-29 Thread Aisheng Dong

> From: Anson Huang
> Sent: Tuesday, April 30, 2019 9:55 AM
> Subject: [PATCH] clk: imx: pllv3: Fix fall through build warning
> 
> Fix below fall through build warning:
> 
> drivers/clk/imx/clk-pllv3.c:453:21: warning:
> this statement may fall through [-Wimplicit-fallthrough=]
> 
>pll->denom_offset = PLL_IMX7_DENOM_OFFSET;
>  ^
> drivers/clk/imx/clk-pllv3.c:454:2: note: here
>   case IMX_PLLV3_AV:
>   ^~~~
> 
> Signed-off-by: Anson Huang 

Reviewed-by: Dong Aisheng 

Regards
Dong Aisheng
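
The patch body is not quoted in this reply, so the following is only a sketch of the general pattern the warning asks for, not the actual change. The enum, struct and offsets are hypothetical; the intentional fall-through is marked with a comment, which GCC accepts at its default -Wimplicit-fallthrough level.

```c
/* Hypothetical PLL kinds and configuration, for illustration only. */
enum pll_kind_sketch { PLL_AV_SKETCH, PLL_AV_VARIANT_SKETCH, PLL_OTHER_SKETCH };

struct pll_cfg_sketch {
	int denom_offset;
	int is_av;
};

static struct pll_cfg_sketch configure_sketch(enum pll_kind_sketch kind)
{
	/* Start from a default denominator offset (made-up value). */
	struct pll_cfg_sketch cfg = { 0x20, 0 };

	switch (kind) {
	case PLL_AV_VARIANT_SKETCH:
		/* The variant overrides the offset, then shares the AV setup. */
		cfg.denom_offset = 0x10;
		/* fall through */
	case PLL_AV_SKETCH:
		cfg.is_av = 1;
		break;
	default:
		break;
	}
	return cfg;
}
```

The comment changes no behaviour; it only tells the compiler and the reader that the missing break is deliberate.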

Re: [PATCH -next] ASoC: sprd: Fix return value check in sprd_mcdt_probe()

2019-04-29 Thread Baolin Wang

On Mon, 29 Apr 2019 at 20:15, Wei Yongjun  wrote:
>
> In case of error, the function devm_ioremap_resource() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
>
> Fixes: d7bff893e04f ("ASoC: sprd: Add Spreadtrum multi-channel data transfer support")
> Signed-off-by: Wei Yongjun 

Thanks for fixing my mistake.
Reviewed-by: Baolin Wang 

> ---
>  sound/soc/sprd/sprd-mcdt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sound/soc/sprd/sprd-mcdt.c b/sound/soc/sprd/sprd-mcdt.c
> index 28f5e649733d..e9318d7a4810 100644
> --- a/sound/soc/sprd/sprd-mcdt.c
> +++ b/sound/soc/sprd/sprd-mcdt.c
> @@ -951,8 +951,8 @@ static int sprd_mcdt_probe(struct platform_device *pdev)
>
> res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> mcdt->base = devm_ioremap_resource(&pdev->dev, res);
> -   if (!mcdt->base)
> -   return -ENOMEM;
> +   if (IS_ERR(mcdt->base))
> +   return PTR_ERR(mcdt->base);
>
> mcdt->dev = &pdev->dev;
> spin_lock_init(&mcdt->lock);
>
>
>


-- 
Baolin Wang
Best Regards
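
To illustrate why the NULL test could never fire, here is a small userspace emulation of the error-pointer convention. The lowercase helpers and map_resource_sketch() are hypothetical stand-ins for ERR_PTR(), PTR_ERR(), IS_ERR() and devm_ioremap_resource(); only the encoding idea (a negative errno stored in the pointer value) is taken from the patch description.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno in a pointer value. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno from an error pointer. */
static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Error pointers occupy the top MAX_ERRNO_SKETCH values of the address space. */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Hypothetical mapping helper: returns an error pointer, never NULL, on failure. */
static void *map_resource_sketch(int fail)
{
	static int fake_registers[4];

	return fail ? err_ptr(-EBUSY) : (void *)fake_registers;
}

/* Probe-style caller that propagates the encoded error code. */
static long probe_sketch(int fail)
{
	void *base = map_resource_sketch(fail);

	if (is_err(base))
		return ptr_err(base);
	return 0;
}
```

Since the failing return value is non-NULL, a check such as "if (!base)" silently treats the error as success, and the real error code is lost as well.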

Re: INFO: task hung in __get_super

2019-04-29 Thread Jan Kara

On Sun 28-04-19 19:51:09, Al Viro wrote:
> On Sun, Apr 28, 2019 at 11:14:06AM -0700, syzbot wrote:
> >  down_read+0x49/0x90 kernel/locking/rwsem.c:26
> >  __get_super.part.0+0x203/0x2e0 fs/super.c:788
> >  __get_super include/linux/spinlock.h:329 [inline]
> >  get_super+0x2e/0x50 fs/super.c:817
> >  fsync_bdev+0x19/0xd0 fs/block_dev.c:525
> >  invalidate_partition+0x36/0x60 block/genhd.c:1581
> >  drop_partitions block/partition-generic.c:443 [inline]
> >  rescan_partitions+0xef/0xa20 block/partition-generic.c:516
> >  __blkdev_reread_part+0x1a2/0x230 block/ioctl.c:173
> >  blkdev_reread_part+0x27/0x40 block/ioctl.c:193
> >  loop_reread_partitions+0x1c/0x40 drivers/block/loop.c:633
> >  loop_set_status+0xe57/0x1380 drivers/block/loop.c:1296
> >  loop_set_status64+0xc2/0x120 drivers/block/loop.c:1416
> >  lo_ioctl+0x8fc/0x2150 drivers/block/loop.c:1559
> >  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
> >  blkdev_ioctl+0x6f2/0x1d10 block/ioctl.c:605
> >  block_ioctl+0xee/0x130 fs/block_dev.c:1933
> >  vfs_ioctl fs/ioctl.c:46 [inline]
> >  file_ioctl fs/ioctl.c:509 [inline]
> >  do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
> >  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
> >  __do_sys_ioctl fs/ioctl.c:720 [inline]
> >  __se_sys_ioctl fs/ioctl.c:718 [inline]
> >  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
> >  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
> The trouble is, the only things holding ->s_umount appear to be
> these:
> 
> > 2 locks held by syz-executor274/11716:
> >  #0: a19e2025 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
> 
> > 2 locks held by syz-executor274/11717:
> >  #0: e185c083 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
> 
> ... and that's bollocks.  ->s_umount held there is that on freshly allocated
> superblock.  It *MUST* be in mount(2); no other syscall should be able to
> call alloc_super() in the first place.  So what the hell is that doing
> trying to call lo_ioctl() inside mount(2)?  Something like isofs attempting
> cdrom ioctls on the underlying device?

Actually UDF also calls the CDROMMULTISESSION ioctl during mount. So I could
see how we get to lo_simple_ioctl(), and indeed that would acquire
loop_ctl_mutex under s_umount, which is the opposite order to the one in
the BLKRRPART ioctl.

> Why do we have loop_func_table->ioctl(), BTW?  All in-tree instances are
> either NULL or return -EINVAL unconditionally.  Considering that the
> caller is
> err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
> we could bloody well just get rid of cryptoloop_ioctl() (the only
> non-NULL instance) and get rid of calling lo_simple_ioctl() in
> lo_ioctl() switch's default.

Yeah, you're right. And if we push the patch a bit further to not take
loop_ctl_mutex for an invalid ioctl number, that would fix the problem. I
can send a fix.

Honza
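
As a sketch of the idea only (this is not the fix that was eventually sent), rejecting unknown commands before taking the lock could look roughly like this. The command numbers are hypothetical and a counter stands in for loop_ctl_mutex so the behaviour can be observed in userspace.

```c
#include <errno.h>

/* Hypothetical stand-in for the mutex: counts how often it is taken. */
static int ctl_lock_count;

static void ctl_lock(void)
{
	ctl_lock_count++;
}

static void ctl_unlock(void)
{
}

/* Made-up command numbers for the commands the handler understands. */
enum { SKETCH_SET_CAPACITY = 1, SKETCH_SET_DIRECT_IO = 2 };

static int simple_ioctl_sketch(unsigned int cmd)
{
	int err = 0;

	/* Validate the command first, so unknown ones never touch the lock. */
	switch (cmd) {
	case SKETCH_SET_CAPACITY:
	case SKETCH_SET_DIRECT_IO:
		break;
	default:
		return -EINVAL;
	}

	ctl_lock();
	/* ... the per-command work would happen here ... */
	ctl_unlock();
	return err;
}
```

With this ordering, an unrelated ioctl issued during mount (while s_umount is held) returns -EINVAL without ever acquiring the lock, so the inverted lock order cannot occur on that path.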

> 
> Something like this:
> 
> diff --git a/drivers/block/cryptoloop.c b/drivers/block/cryptoloop.c
> index 254ee7d54e91..f16468a562f5 100644
> --- a/drivers/block/cryptoloop.c
> +++ b/drivers/block/cryptoloop.c
> @@ -167,12 +167,6 @@ cryptoloop_transfer(struct loop_device *lo, int cmd,
>  }
>  
>  static int
> -cryptoloop_ioctl(struct loop_device *lo, int cmd, unsigned long arg)
> -{
> - return -EINVAL;
> -}
> -
> -static int
>  cryptoloop_release(struct loop_device *lo)
>  {
>   struct crypto_sync_skcipher *tfm = lo->key_data;
> @@ -188,7 +182,6 @@ cryptoloop_release(struct loop_device *lo)
>  static struct loop_func_table cryptoloop_funcs = {
>   .number = LO_CRYPT_CRYPTOAPI,
>   .init = cryptoloop_init,
> - .ioctl = cryptoloop_ioctl,
>   .transfer = cryptoloop_transfer,
>   .release = cryptoloop_release,
>   .owner = THIS_MODULE
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index bf1c61cab8eb..2ec162b80562 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -955,7 +955,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
>   lo->lo_flags = lo_flags;
>   lo->lo_backing_file = file;
>   lo->transfer = NULL;
> - lo->ioctl = NULL;
>   lo->lo_sizelimit = 0;
>   lo->old_gfp_mask = mapping_gfp_mask(mapping);
>   mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> @@ -1064,7 +1063,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)

Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation

2019-04-29 Thread Linus Torvalds

On Mon, Apr 29, 2019 at 5:45 PM Sean Christopherson
 wrote:
>
> On Mon, Apr 29, 2019 at 05:08:46PM -0700, Sean Christopherson wrote:
> >
> > It's 486 based, but either way I suspect the answer is "yes".  IIRC,
> > Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
> > was based on P54C, though I'm struggling to recall exactly what the
> > Larrabee weirdness was.
>
> Aha!  Found an ancient comment that explicitly states P5 does not block
> NMI/SMI in the STI shadow, while P6 does block NMI/SMI.

Ok, so the STI shadow really wouldn't be reliable on those machines. Scary.

Of course, the good news is that hopefully nobody has them any more,
and if they do, they presumably don't use fancy NMI profiling etc, so
any actual NMI's are probably relegated purely to largely rare and
effectively fatal errors anyway (ie memory parity errors).

 Linus

Re: [PATCH] quota: set init_needed flag only when successfully getting dquot

2019-04-29 Thread cgxu519


On 4/30/19 5:49 AM, Jan Kara wrote:
> On Sun 28-04-19 13:39:21, Chengguang Xu wrote:
> > Set init_needed flag only when successfully getting dquot,
> > so that we can skip unnecessary subsequent operation.
> >
> > Signed-off-by: Chengguang Xu 
>
> Thanks for the patch but I don't think it's really useful. It will be very
> rare that we race with quotaoff or dqget() fails due to error. So the
> additional overhead of iterating over dquots doesn't really matter in that
> case.


Hi Jan,

Thanks for the comment, I got it.

Chengguang.

Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Nicholas Mc Guire

On Mon, Apr 29, 2019 at 10:03:36AM -0400, Sven Van Asbroeck wrote:
> On Mon, Apr 29, 2019 at 2:11 AM Nicholas Mc Guire  wrote:
> >
> > V2: As requested by Sven Van Asbroeck  make the
> > impact of the patch clear in the commit message.
> 
> Thank you, but did you miss my comment about creating a local variable
> instead? See:
> https://lkml.org/lkml/2019/4/28/97

Did not miss it - I just don't think that makes it any more
understandable - the __force __be16 makes it clear I believe
that this is correct, sparse does not like this though - so tell
sparse. The local variable would need to be explained as it is
functionally not necessary - therefore I find it more confusing
than using __force here.

If that rationale is wrong let me know.

thx!
hofrat

[PATCH] treewide: fix awk regexp over-escaping

2019-04-29 Thread Alex Xu (Hello71)

Fix "warning: regexp escape sequence is not a known regexp operator" on
gawk 5.0.0.

Results found by:

- grepping '\\[^\[\\^$.|?*+()a-z]' on *.awk
- grepping 'awk.*\\[^\[\\^$.|?*+()a-z]'
- running awk --lint -f /dev/null on *.awk

Signed-off-by: Alex Xu (Hello71) 
---
 Documentation/arm/Samsung/clksrc-change-registers.awk  | 2 +-
 arch/x86/tools/gen-insn-attr-x86.awk   | 4 ++--
 lib/raid6/unroll.awk   | 2 +-
 tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk | 4 ++--
 tools/perf/arch/x86/tests/gen-insn-x86-dat.awk | 2 +-
 tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk | 4 ++--
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/arm/Samsung/clksrc-change-registers.awk b/Documentation/arm/Samsung/clksrc-change-registers.awk
index 7be1b8aa7cd9..d853f750c861 100755
--- a/Documentation/arm/Samsung/clksrc-change-registers.awk
+++ b/Documentation/arm/Samsung/clksrc-change-registers.awk
@@ -67,7 +67,7 @@ BEGIN {
 # to replace and create an associative array of values
 
 while (getline line < ARGV[1] > 0) {
-   if (line ~ /\#define.*_MASK/ &&
+   if (line ~ /#define.*_MASK/ &&
!(line ~ /USB_SIG_MASK/)) {
splitdefine(line, fields)
name = fields[0]
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index b02a36b2c14f..a42015b305f4 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -69,7 +69,7 @@ BEGIN {
 
lprefix1_expr = "\\((66|!F3)\\)"
lprefix2_expr = "\\(F3\\)"
-   lprefix3_expr = "\\((F2|!F3|66\\&F2)\\)"
+   lprefix3_expr = "\\((F2|!F3|66&F2)\\)"
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
@@ -257,7 +257,7 @@ function convert_operands(count,opnd,   i,j,imm,mod)
return add_flags(imm, mod)
 }
 
-/^[0-9a-f]+\:/ {
+/^[0-9a-f]+:/ {
if (NR == 1)
next
# get index
diff --git a/lib/raid6/unroll.awk b/lib/raid6/unroll.awk
index c6aa03631df8..0809805a7e23 100644
--- a/lib/raid6/unroll.awk
+++ b/lib/raid6/unroll.awk
@@ -13,7 +13,7 @@ BEGIN {
for (i = 0; i < rep; ++i) {
tmp = $0
gsub(/\$\$/, i, tmp)
-   gsub(/\$\#/, n, tmp)
+   gsub(/\$#/, n, tmp)
gsub(/\$\*/, "$", tmp)
print tmp
}
diff --git a/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk b/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk
index b02a36b2c14f..a42015b305f4 100644
--- a/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk
@@ -69,7 +69,7 @@ BEGIN {
 
lprefix1_expr = "\\((66|!F3)\\)"
lprefix2_expr = "\\(F3\\)"
-   lprefix3_expr = "\\((F2|!F3|66\\&F2)\\)"
+   lprefix3_expr = "\\((F2|!F3|66&F2)\\)"
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
@@ -257,7 +257,7 @@ function convert_operands(count,opnd,   i,j,imm,mod)
return add_flags(imm, mod)
 }
 
-/^[0-9a-f]+\:/ {
+/^[0-9a-f]+:/ {
if (NR == 1)
next
# get index
diff --git a/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk b/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk
index a21454835cd4..27585d032ee6 100644
--- a/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk
+++ b/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk
@@ -31,7 +31,7 @@ BEGIN {
going = 0
 }
 
-/^\s*[0-9a-fA-F]+\:/ {
+/^\s*[0-9a-fA-F]+:/ {
if (going) {
colon_pos = index($0, ":")
useful_line = substr($0, colon_pos + 1)
diff --git a/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
index ddd5c4c21129..606ccd154392 100644
--- a/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
+++ b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
@@ -69,7 +69,7 @@ BEGIN {
 
lprefix1_expr = "\\((66|!F3)\\)"
lprefix2_expr = "\\(F3\\)"
-   lprefix3_expr = "\\((F2|!F3|66\\&F2)\\)"
+   lprefix3_expr = "\\((F2|!F3|66&F2)\\)"
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
@@ -257,7 +257,7 @@ function convert_operands(count,opnd,   i,j,imm,mod)
return add_flags(imm, mod)
 }
 
-/^[0-9a-f]+\:/ {
+/^[0-9a-f]+:/ {
if (NR == 1)
next
# get index
-- 
2.21.0
