date:20190425

Re: [PATCH 19/28] locking/lockdep: Optimize irq usage check when marking lock usage bit

2019-04-25 Thread Yuyang Du

Thanks for the review.

On Fri, 26 Apr 2019 at 03:32, Peter Zijlstra  wrote:
>
> On Wed, Apr 24, 2019 at 06:19:25PM +0800, Yuyang Du wrote:
>
> After only a quick read of these next patches; this is the one that
> worries me most.
>
> You did mention Frederic's patches, but I'm not entirely sure you're
> aware why he's doing them. He's preparing to split the softirq state
> into one state per softirq vector.
>
> See here:
>
>   https://lkml.kernel.org/r/20190228171242.32144-14-frede...@kernel.org
>   https://lkml.kernel.org/r/20190228171242.32144-15-frede...@kernel.org
>
> IOW he's going to massively explode this storage.

If I understand correctly, he is not going to.

First of all, we can divide the whole usage thing into tracking and checking.

Frederic's fine-grained soft vector state is applied to usage
tracking, i.e., in which specific vectors a lock is used or enabled.

But for usage checking, which vectors they are does not really matter.
So the current sizes of the arrays and bitmaps are good enough. Right?

[PATCH v4] cpufreq: qoriq: add support for lx2160a

2019-04-25 Thread Vabhav Sharma

Enable support for the NXP lx2160a SoC in the qoriq cpufreq
driver.

Signed-off-by: Tang Yuantian 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Vabhav Sharma 
Acked-by: Scott Wood 
Acked-by: Stephen Boyd 
Acked-by: Viresh Kumar 
---
Changes for v4:
- Incorporated review comments from Stephen Boyd

Changes for v3:
- Incorporated review comments of Rafael J. Wysocki
- Updated commit message

Changes for v2:
- Subject line updated

 drivers/cpufreq/qoriq-cpufreq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index 4295e54..81f0288 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -284,6 +284,7 @@ static const struct of_device_id node_matches[] __initconst = {
{ .compatible = "fsl,ls1046a-clockgen", },
{ .compatible = "fsl,ls1088a-clockgen", },
{ .compatible = "fsl,ls2080a-clockgen", },
+   { .compatible = "fsl,lx2160a-clockgen", },
{ .compatible = "fsl,p4080-clockgen", },
{ .compatible = "fsl,qoriq-clockgen-1.0", },
{ .compatible = "fsl,qoriq-clockgen-2.0", },
-- 
2.7.4

[PATCH v4] clk: qoriq: add support for lx2160a

2019-04-25 Thread Vabhav Sharma

Add clockgen support and configuration for NXP SoC lx2160a
with compatible property as "fsl,lx2160a-clockgen".

Signed-off-by: Tang Yuantian 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Vabhav Sharma 
Acked-by: Scott Wood 
Acked-by: Stephen Boyd 
Acked-by: Viresh Kumar 
---
Changes for v4:
- Incorporated review comments from Stephen Boyd
 
Changes for v3:
- Incorporated review comments of Rafael J. Wysocki
- Updated commit message

Changes for v2:
- Subject line updated

 drivers/clk/clk-qoriq.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index 3d51d7c..1a15201 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = {
.flags = CG_VER3 | CG_LITTLE_ENDIAN,
},
{
+   .compat = "fsl,lx2160a-clockgen",
+   .cmux_groups = {
+   &clockgen2_cmux_cga12, &clockgen2_cmux_cgb
+   },
+   .cmux_to_group = {
+   0, 0, 0, 0, 1, 1, 1, 1, -1
+   },
+   .pll_mask = 0x37,
+   .flags = CG_VER3 | CG_LITTLE_ENDIAN,
+   },
+   {
.compat = "fsl,p2041-clockgen",
.guts_compat = "fsl,qoriq-device-config-1.0",
.init_periph = p2041_init_periph,
@@ -1427,6 +1438,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, "fsl,ls1043a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init);
+CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_p2041, "fsl,p2041-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_p3041, "fsl,p3041-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_p4080, "fsl,p4080-clockgen", clockgen_init);
-- 
2.7.4

[PATCH] clk: imx: correct pfdv2 gate_bit/vld_bit operations

2019-04-25 Thread Anson Huang

The operations on the pfdv2 gate_bit/vld_bit are incorrect.
Both are defined as u8 bit offsets, but gate_bit is actually
assigned a mask, which can be up to 32 bits long and therefore
overflows. vld_bit is then derived as a bit offset from the
incorrect gate_bit value, which results in an incorrect pfd
clock gate status in the clock tree. Fix the issue by
assigning both as correct bit offsets.

Fixes: 9fcb6be3b6c9 ("clk: imx: add pfdv2 support")
Signed-off-by: Anson Huang 
---
 drivers/clk/imx/clk-pfdv2.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/imx/clk-pfdv2.c b/drivers/clk/imx/clk-pfdv2.c
index 7e9134b..fb567dc 100644
--- a/drivers/clk/imx/clk-pfdv2.c
+++ b/drivers/clk/imx/clk-pfdv2.c
@@ -43,7 +43,7 @@ static int clk_pfdv2_wait(struct clk_pfdv2 *pfd)
 {
u32 val;
 
-   return readl_poll_timeout(pfd->reg, val, val & pfd->vld_bit,
+   return readl_poll_timeout(pfd->reg, val, val & (1 << pfd->vld_bit),
  0, LOCK_TIMEOUT_US);
 }
 
@@ -55,7 +55,7 @@ static int clk_pfdv2_enable(struct clk_hw *hw)
 
spin_lock_irqsave(&pfd_lock, flags);
val = readl_relaxed(pfd->reg);
-   val &= ~pfd->gate_bit;
+   val &= ~(1 << pfd->gate_bit);
writel_relaxed(val, pfd->reg);
spin_unlock_irqrestore(&pfd_lock, flags);
 
@@ -70,7 +70,7 @@ static void clk_pfdv2_disable(struct clk_hw *hw)
 
spin_lock_irqsave(&pfd_lock, flags);
val = readl_relaxed(pfd->reg);
-   val |= pfd->gate_bit;
+   val |= (1 << pfd->gate_bit);
writel_relaxed(val, pfd->reg);
spin_unlock_irqrestore(&pfd_lock, flags);
 }
@@ -123,7 +123,7 @@ static int clk_pfdv2_is_enabled(struct clk_hw *hw)
 {
struct clk_pfdv2 *pfd = to_clk_pfdv2(hw);
 
-   if (readl_relaxed(pfd->reg) & pfd->gate_bit)
+   if (readl_relaxed(pfd->reg) & (1 << pfd->gate_bit))
return 0;
 
return 1;
@@ -180,7 +180,7 @@ struct clk_hw *imx_clk_pfdv2(const char *name, const char *parent_name,
return ERR_PTR(-ENOMEM);
 
pfd->reg = reg;
-   pfd->gate_bit = 1 << ((idx + 1) * 8 - 1);
+   pfd->gate_bit = (idx + 1) * 8 - 1;
pfd->vld_bit = pfd->gate_bit - 1;
pfd->frac_off = idx * 8;
 
-- 
2.7.4
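
The truncation described in the commit message above can be reproduced in user space. This is only a sketch, not the driver code: struct pfd_bits and the helper names are hypothetical, and uint8_t stands in for the kernel's u8.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two u8 fields of struct clk_pfdv2. */
struct pfd_bits {
	uint8_t gate_bit;
	uint8_t vld_bit;
};

/* Old scheme: a 32-bit mask is stored in a u8 field and gets truncated. */
static struct pfd_bits pfd_bits_old(unsigned int idx)
{
	struct pfd_bits b;

	b.gate_bit = (uint8_t)(1u << ((idx + 1) * 8 - 1));
	b.vld_bit = (uint8_t)(b.gate_bit - 1);
	return b;
}

/* New scheme: only the bit offset is stored; it always fits in a u8. */
static struct pfd_bits pfd_bits_new(unsigned int idx)
{
	struct pfd_bits b;

	b.gate_bit = (uint8_t)((idx + 1) * 8 - 1);
	b.vld_bit = (uint8_t)(b.gate_bit - 1);
	return b;
}

/* With the new scheme the mask is built at the point of use. */
static uint32_t pfd_gate_mask(unsigned int idx)
{
	return UINT32_C(1) << pfd_bits_new(idx).gate_bit;
}
```

For idx = 1 the old scheme stores 0 in gate_bit (0x8000 does not fit in eight bits) and 0xff in vld_bit, while the new scheme stores offsets 15 and 14 and pfd_gate_mask(1) yields 0x8000.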

Re: [PATCH] dt-bindings: Add silabs,si5341

2019-04-25 Thread Mike Looijmans

On 26-04-19 01:04, Stephen Boyd wrote:
> Quoting Mike Looijmans (2019-04-24 02:02:16)
>> Adds the devicetree bindings for the si5341 driver that supports the
>> Si5341 and Si5340 chips.
>>
>> Signed-off-by: Mike Looijmans 
>> ---
>>   .../bindings/clock/silabs,si5341.txt  | 141 ++
>>   1 file changed, 141 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/clock/silabs,si5341.txt
>>
>> diff --git a/Documentation/devicetree/bindings/clock/silabs,si5341.txt b/Documentation/devicetree/bindings/clock/silabs,si5341.txt
>> new file mode 100644
>> index ..1a00dd83100f
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/clock/silabs,si5341.txt
>> @@ -0,0 +1,141 @@
>> +Binding for Silicon Labs Si5341 and Si5340 programmable i2c clock generator.
>> +
>> +Reference
>> +[1] Si5341 Data Sheet
>> +
>> https://www.silabs.com/documents/public/reference-manuals/Si5341-40-D-RM.pdf
> 
> Thanks! I also had to look up the pinout in the datasheet, not the
> reference manual above.

Now you mention it, this is the "reference manual", not the datasheet. I'll 
add a reference to that as well.

>> +
>> +The Si5341 and Si5340 are programmable i2c clock generators with up to 10 output
>> +clocks. The chip contains a PLL that sources 5 (or 4) multisynth clocks, which
>> +in turn can be directed to any of the 10 (or 4) outputs through a divider.
>> +The internal structure of the clock generators can be found in [1].
>> +
>> +The driver can be used in "as is" mode, reading the current settings from the
>> +chip at boot, in case you have a (pre-)programmed device. If the PLL is not
>> +configured when the driver probes, it assumes the driver must fully initialize
>> +it.
>> +
>> +The device type, speed grade and revision are determined runtime by probing.
>> +
>> +The driver currently only supports XTAL input mode, and does not support any
>> +fancy input configurations. They can still be programmed into the chip and
>> +the driver will leave them "as is".
>> +
>> +==I2C device node==
>> +
>> +Required properties:
>> +- compatible: shall be one of the following: "silabs,si5341", "silabs,si5340"
>> +- reg: i2c device address, usually 0x74
>> +- #clock-cells: from common clock binding; shall be set to 1.
>> +- clocks: from common clock binding; list of parent clock
>> +  handles, shall be xtal reference clock. Usually a fixed clock.
> 
> Is there only one possible clk parent? Looks like there's an optional
> xtal on the XA/XB pins and then up to three more input clks on IN0/1/2.
> So shouldn't this list all of those and then indicate that at least one
> should be specified at all times?
> 
>> +- clock-names: Shall be "xtal".
> 
> This should include the other clk inputs?

Some day maybe. That's what I meant when I wrote "does not support any fancy 
input configurations".

The input config is horrendously complex. We have never used anything but just 
the xtal input, and I think that goes for 99.9% of the use cases for this chip.

I already went way over budget with this one, my first intention was to write 
a driver that takes a firmware blob from the "clockbuilder" software, but 
while writing it I discovered that the whole damn thing could easily be 
controlled completely without it.

> 
>> +- #address-cells: shall be set to 1.
>> +- #size-cells: shall be set to 0.
> 
> I'd expect to see all the input voltage supplies here too.
> 
>   vdd-supply
>   vdda-supply
>   vdds-supply
>   vdd0-supply
>   vdd1-supply
>   vdd2-supply
>   vdd3-supply
>   vdd4-supply
>   vdd5-supply
>   vdd6-supply
>   vdd7-supply
>   vdd8-supply
>   vdd9-supply

I'll look into it. Might be useful for some register settings.

>> +
>> +Optional properties:
>> +- silabs,pll-m-num, silabs,pll-m-den: Numerator and denominator for PLL
>> +  feedback divider. Must be such that the PLL output is in the valid range. For
>> +  example, to create 14GHz from a 48MHz xtal, use m-num=14000 and m-den=48. Only
>> +  the fraction matters, using 3500 and 12 will deliver the exact same result.
>> +  If these are not specified, and the PLL is not yet programmed when the driver
>> +  probes, the PLL will be set to 14GHz.
> 
> Can this be done via assigned-clock-rates? Possibly with a table in the
> clk driver to tell us how to generate those rates.

The PLL frequency choice determines who'll get jitter and who won't. It's 
ridiculously accurate too.

For example, if you need a 26 MHz and a 100 MHz output, there is no PLL
frequency that yields both clocks through an integer divider (Silicon Labs is
vague about it, but apparently integer dividers have less jitter on output).
Only the end user can say which clock should get the better quality.

> 
>> +- silabs,reprogram: When present, the driver will always assume the device must
>> +  be initialized, and always performs the soft-reset routine. Since this will
>> +  temporarily stop

RE: [EXT] Re: [PATCH v3] clk: qoriq: add support for lx2160a

2019-04-25 Thread Vabhav Sharma



> -Original Message-
> From: Stephen Boyd 
> Sent: Thursday, April 25, 2019 11:52 PM
> To: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; Vabhav Sharma 
> Cc: mturque...@baylibre.com; r...@rjwysocki.net; viresh.ku...@linaro.org;
> Yogesh Narayan Gaur ; Andy Tang
> ; Vabhav Sharma 
> Subject: [EXT] Re: [PATCH v3] clk: qoriq: add support for lx2160a
> 
> Quoting Vabhav Sharma (2019-04-25 06:57:05)
> > From: Yogesh Gaur 
> >
> > Add clockgen support and configuration for NXP SoC lx2160a in qoriq
> > clock driver with compatible property as "fsl,lx2160a-clockgen".
> >
> > qoriq-cpufreq driver is based on qoriq clock driver, enable support of
> > NXP SoC lx2160a in qoriq cpufreq driver to handle the lx2160a SoC.
> >
> > Signed-off-by: Tang Yuantian 
> > Signed-off-by: Yogesh Gaur 
> > Signed-off-by: Vabhav Sharma 
> > Acked-by: Scott Wood 
> > Acked-by: Stephen Boyd 
> > Acked-by: Viresh Kumar 
> > ---
> > Changes for v3:
> > - Incorporated review comments of Rafael J. Wysocki
> > - Updated commit message
> 
> If you can split it into clk and cpufreq that would be preferred. Then I can
> take the clk part and PM tree can take the cpufreq part. Otherwise, you have
> sent other patches to drivers/clk/clk-qoriq.c and I'm worried there will be
> cross tree conflicts if I take those other patches this cycle.
Agreed, sure.
I will split the patch and send the parts to the clk and PM trees.

Re: [PATCH 20/28] locking/lockdep: Refactorize check_noncircular and check_redundant

2019-04-25 Thread Yuyang Du

Thanks for the review.

On Fri, 26 Apr 2019 at 03:48, Peter Zijlstra  wrote:
>
> On Wed, Apr 24, 2019 at 06:19:26PM +0800, Yuyang Du wrote:
> > These two functions now handle different check results themselves. A new
> > check_path function is added to check whether there is a path in the
> > dependency graph. No functional change.
>
> This looks good, however I completely forgot we still had the redundant
> thing.
>
> It was added for cross-release (which has since been reverted) which
> would generate a lot of redundant links (IIRC) but having it makes the
> reports more convoluted -- basically, if we had an A-B-C relation, then
> A-C will not be added to the graph because it is already covered. This
> then means any report will include B, even though a shorter cycle might
> have been possible.
>
> Maybe we should make the whole redundant check depend on LOCKDEP_SMALL
> for now.

Sure. I can do that.
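
To illustrate the A-B-C point from the review above, here is a small user-space sketch. It is not lockdep code: the adjacency-matrix graph, path_len() and add_dep() are hypothetical stand-ins that only mimic the idea of skipping a link whose endpoints are already connected.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical dependency graph as an adjacency matrix. */
struct dep_graph {
	bool edge[MAX_NODES][MAX_NODES];
};

/* Breadth-first search: number of links on the shortest src->dst path, or -1. */
static int path_len(const struct dep_graph *g, int src, int dst)
{
	int dist[MAX_NODES];
	int queue[MAX_NODES];
	int head = 0, tail = 0;

	for (int i = 0; i < MAX_NODES; i++)
		dist[i] = -1;

	dist[src] = 0;
	queue[tail++] = src;
	while (head < tail) {
		int cur = queue[head++];

		if (cur == dst)
			return dist[cur];
		for (int next = 0; next < MAX_NODES; next++) {
			if (g->edge[cur][next] && dist[next] < 0) {
				dist[next] = dist[cur] + 1;
				queue[tail++] = next;
			}
		}
	}
	return -1;
}

/*
 * Add the link from->to. With skip_redundant set, the link is dropped when
 * a path from->...->to already exists. Returns true if the link was added.
 */
static bool add_dep(struct dep_graph *g, int from, int to, bool skip_redundant)
{
	if (skip_redundant && path_len(g, from, to) >= 0)
		return false;
	g->edge[from][to] = true;
	return true;
}
```

With nodes 0, 1 and 2 standing for A, B and C: after adding A-B and B-C, add_dep() with skip_redundant set refuses A-C, so the shortest A-to-C path stays at two links and runs through B. Without the check the direct link is stored and the path is one link long.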

[PATCH v4 4/6] usb: roles: add API to get usb_role_switch by node

2019-04-25 Thread Chunfeng Yun

Add fwnode_usb_role_switch_get() to make it easier to get a
usb_role_switch by the fwnode which registered it.
It's useful when there is no device_connection registered
between two drivers and only the fwnode which registered the
usb_role_switch is known.

Signed-off-by: Chunfeng Yun 
---
v4 changes:
  1. use switch_fwnode_match() to find fwnode suggested by Heikki
  2. this patch now depends on [1]

 [1] [v6,08/13] usb: roles: Introduce stubs for the exiting functions in role.h
https://patchwork.kernel.org/patch/10909971/

v3 changes:
  1. use fwnodes instead of node suggested by Andy
  2. rebuild the API suggested by Heikki

v2 no changes
---
 drivers/usb/roles/class.c | 25 +++++++++++++++++++++++++
 include/linux/usb/role.h  |  8 ++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
index f45d8df5cfb8..994fcb979795 100644
--- a/drivers/usb/roles/class.c
+++ b/drivers/usb/roles/class.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct class *role_class;
@@ -135,6 +136,30 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(usb_role_switch_get);
 
+/**
+ * fwnode_usb_role_switch_get - Find USB role switch by its parent fwnode
+ * @fwnode: The fwnode that registers the USB role switch
+ *
+ * Finds and returns role switch registered by @fwnode. The reference count
+ * for the found switch is incremented.
+ */
+struct usb_role_switch *
+fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
+{
+   struct usb_role_switch *sw;
+   struct device *dev;
+
+   dev = class_find_device(role_class, NULL, fwnode, switch_fwnode_match);
+   if (!dev)
+   return ERR_PTR(-EPROBE_DEFER);
+
+   sw = to_role_switch(dev);
+   WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+
+   return sw;
+}
+EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
+
 /**
  * usb_role_switch_put - Release handle to a switch
  * @sw: USB Role Switch
diff --git a/include/linux/usb/role.h b/include/linux/usb/role.h
index da2b9641b877..35d460f9ec40 100644
--- a/include/linux/usb/role.h
+++ b/include/linux/usb/role.h
@@ -48,6 +48,8 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role);
 enum usb_role usb_role_switch_get_role(struct usb_role_switch *sw);
 struct usb_role_switch *usb_role_switch_get(struct device *dev);
 void usb_role_switch_put(struct usb_role_switch *sw);
+struct usb_role_switch *
+fwnode_usb_role_switch_get(struct fwnode_handle *fwnode);
 
 struct usb_role_switch *
 usb_role_switch_register(struct device *parent,
@@ -72,6 +74,12 @@ static inline struct usb_role_switch *usb_role_switch_get(struct device *dev)
 
 static inline void usb_role_switch_put(struct usb_role_switch *sw) { }
 
+static inline struct usb_role_switch *
+fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
+{
+   return ERR_PTR(-ENODEV);
+}
+
 static inline struct usb_role_switch *
 usb_role_switch_register(struct device *parent,
 const struct usb_role_switch_desc *desc)
-- 
2.21.0

[PATCH v4 5/6] usb: roles: add USB Type-B GPIO connector driver

2019-04-25 Thread Chunfeng Yun

Due to the requirements of the usb-connector.txt binding, the old way
of using extcon to support a USB Dual-Role switch is now deprecated
when a Type-B connector is used.
This patch introduces a driver for the Type-B connector, which typically
uses an input GPIO to detect the USB ID pin, and tries to replace the
function provided by the extcon-usb-gpio driver.

Signed-off-by: Chunfeng Yun 
---
v4 changes:
  1. remove linux/gpio.h suggested by Linus
  2. put node when error happens

v3 changes:
  1. treat type-B connector as a virtual device;
  2. change file name again

v2 changes:
  1. file name is changed
  2. use new compatible
---
 drivers/usb/roles/Kconfig   |  11 +
 drivers/usb/roles/Makefile  |   1 +
 drivers/usb/roles/typeb-conn-gpio.c | 305 
 3 files changed, 317 insertions(+)
 create mode 100644 drivers/usb/roles/typeb-conn-gpio.c

diff --git a/drivers/usb/roles/Kconfig b/drivers/usb/roles/Kconfig
index f8b31aa67526..d1156e18a81a 100644
--- a/drivers/usb/roles/Kconfig
+++ b/drivers/usb/roles/Kconfig
@@ -26,4 +26,15 @@ config USB_ROLES_INTEL_XHCI
  To compile the driver as a module, choose M here: the module will
  be called intel-xhci-usb-role-switch.
 
+config TYPEB_CONN_GPIO
+   tristate "USB Type-B GPIO Connector"
+   depends on GPIOLIB
+   help
+ The driver supports USB role switch between host and device via GPIO
+ based USB cable detection, used typically if an input GPIO is used
+ to detect USB ID pin.
+
+ To compile the driver as a module, choose M here: the module will
+ be called typeb-conn-gpio.ko
+
 endif # USB_ROLE_SWITCH
diff --git a/drivers/usb/roles/Makefile b/drivers/usb/roles/Makefile
index 757a7d2797eb..5d5620d9d113 100644
--- a/drivers/usb/roles/Makefile
+++ b/drivers/usb/roles/Makefile
@@ -3,3 +3,4 @@
 obj-$(CONFIG_USB_ROLE_SWITCH)  += roles.o
 roles-y:= class.o
 obj-$(CONFIG_USB_ROLES_INTEL_XHCI) += intel-xhci-usb-role-switch.o
+obj-$(CONFIG_TYPEB_CONN_GPIO)  += typeb-conn-gpio.o
diff --git a/drivers/usb/roles/typeb-conn-gpio.c b/drivers/usb/roles/typeb-conn-gpio.c
new file mode 100644
index ..097d2ca12a12
--- /dev/null
+++ b/drivers/usb/roles/typeb-conn-gpio.c
@@ -0,0 +1,305 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * USB Type-B GPIO Connector Driver
+ *
+ * Copyright (C) 2019 MediaTek Inc.
+ *
+ * Author: Chunfeng Yun 
+ *
+ * Some code borrowed from drivers/extcon/extcon-usb-gpio.c
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define USB_GPIO_DEB_MS		20	/* ms */
+#define USB_GPIO_DEB_US		((USB_GPIO_DEB_MS) * 1000)	/* us */
+
+#define USB_CONN_IRQF  \
+   (IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING | IRQF_ONESHOT)
+
+struct usb_conn_info {
+   struct device *dev;
+   struct usb_role_switch *role_sw;
+   enum usb_role last_role;
+   struct regulator *vbus;
+   struct delayed_work dw_det;
+   unsigned long debounce_jiffies;
+
+   struct gpio_desc *id_gpiod;
+   struct gpio_desc *vbus_gpiod;
+   int id_irq;
+   int vbus_irq;
+};
+
+/**
+ * "DEVICE" = VBUS and "HOST" = !ID, so we have:
+ * Both "DEVICE" and "HOST" can't be set as active at the same time
+ * so if "HOST" is active (i.e. ID is 0)  we keep "DEVICE" inactive
+ * even if VBUS is on.
+ *
+ *  Role          |   ID  |  VBUS
+ * ------------------------------------
+ *  [1] DEVICE    |   H   |   H
+ *  [2] NONE      |   H   |   L
+ *  [3] HOST      |   L   |   H
+ *  [4] HOST      |   L   |   L
+ *
+ * In case we have only one of these signals:
+ * - VBUS only - we want to distinguish between [1] and [2], so ID is always 1
+ * - ID only - we want to distinguish between [1] and [4], so VBUS = ID
+ */
+static void usb_conn_detect_cable(struct work_struct *work)
+{
+   struct usb_conn_info *info;
+   enum usb_role role;
+   int id, vbus, ret;
+
+   info = container_of(to_delayed_work(work),
+   struct usb_conn_info, dw_det);
+
+   /* check ID and VBUS */
+   id = info->id_gpiod ?
+   gpiod_get_value_cansleep(info->id_gpiod) : 1;
+   vbus = info->vbus_gpiod ?
+   gpiod_get_value_cansleep(info->vbus_gpiod) : id;
+
+   if (!id)
+   role = USB_ROLE_HOST;
+   else if (vbus)
+   role = USB_ROLE_DEVICE;
+   else
+   role = USB_ROLE_NONE;
+
+   dev_dbg(info->dev, "role %d/%d, gpios: id %d, vbus %d\n",
+   info->last_role, role, id, vbus);
+
+   if (info->last_role == role) {
+   dev_warn(info->dev, "repeated role: %d\n", role);
+   return;
+   }
+
+   if (info->last_role == USB_ROLE_HOST)
+   regulator_disable(info->vbus);
+
+   ret = usb_role_switch_set_role(info->role_sw, role);
+   if (ret)
+   dev_err(in
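
The role table in the comment above, together with the fallbacks for a missing GPIO, can be exercised in user space. The sketch below only mirrors the decision logic visible in the patch; enum conn_role, conn_detect_role() and the -1 convention for an absent GPIO are hypothetical and not part of the driver.

```c
/* Hypothetical user-space stand-in for enum usb_role. */
enum conn_role {
	CONN_ROLE_NONE,
	CONN_ROLE_HOST,
	CONN_ROLE_DEVICE,
};

/*
 * id_level and vbus_level are 0 or 1, or -1 when that GPIO is not wired up.
 * A missing ID pin reads as 1; a missing VBUS pin follows the ID value.
 */
static enum conn_role conn_detect_role(int id_level, int vbus_level)
{
	int id = id_level >= 0 ? id_level : 1;
	int vbus = vbus_level >= 0 ? vbus_level : id;

	if (!id)
		return CONN_ROLE_HOST;	/* rows [3] and [4]: ID low wins */
	if (vbus)
		return CONN_ROLE_DEVICE;	/* row [1] */
	return CONN_ROLE_NONE;		/* row [2] */
}
```

With only a VBUS GPIO the function can return DEVICE or NONE, and with only an ID GPIO it can return DEVICE or HOST, which matches the two single-signal cases listed in the comment.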

Re: [PATCH 22/28] locking/lockdep: Adjust new bit cases in mark_lock

2019-04-25 Thread Yuyang Du

Thanks for the review.

On Fri, 26 Apr 2019 at 03:52, Peter Zijlstra  wrote:
> > + if (new_bit >= LOCK_USAGE_STATES) {
> > + WARN_ON(1);
>
> Does that want to be DEBUG_LOCKS_WARN_ON() ?

Indeed, it was.

[v4 PATCH 0/6] add USB Type-B GPIO connector driver

2019-04-25 Thread Chunfeng Yun

With the introduction of the USB connector and the requirements of
the usb-connector.txt binding, the old way of using extcon to support
a USB Dual-Role switch is now deprecated. Meanwhile, there is no
common driver available for a Type-B connector, which typically
uses an input GPIO to detect the USB ID pin.
This patch series introduces a Type-B GPIO connector driver and tries
to replace the function provided by the extcon-usb-gpio driver.

v4 changes:
  1. use switch_fwnode_match() to find fwnode suggested by Heikki
  2. assign fwnode member of usb_role_switch struct suggested by Heikki
  3. make [4/6] depend on [2]
  4. remove linux/gpio.h suggested by Linus
  5. put node when error happens

  [4/6] usb: roles: add API to get usb_role_switch by node
  [2] [v6,08/13] usb: roles: Introduce stubs for the exiting functions in role.h
https://patchwork.kernel.org/patch/10909971/

v3 changes:
  1. add GPIO direction, and use fixed-regulator for GPIO controlled
VBUS regulator suggested by Rob;
  2. rebuild fwnode_usb_role_switch_get() suggested by Andy and Heikki
  3. treat the type-B connector as a virtual device;
  4. change file name of driver again
  5. select USB_ROLE_SWITCH in mtu3/Kconfig suggested by Heikki
  6. rename ssusb_mode_manual_switch() to ssusb_mode_switch()

v2 changes:
 1. make binding clear, and add an extra compatible suggested by Hans

Chunfeng Yun (6):
  dt-bindings: connector: add optional properties for Type-B
  dt-bindings: usb: add binding for Type-B GPIO connector driver
  dt-bindings: usb: mtu3: add properties about USB Role Switch
  usb: roles: add API to get usb_role_switch by node
  usb: roles: add USB Type-B GPIO connector driver
  usb: mtu3: register a USB Role Switch for dual role mode

 .../bindings/connector/usb-connector.txt  |  14 +
 .../devicetree/bindings/usb/mediatek,mtu3.txt |  10 +-
 .../bindings/usb/typeb-conn-gpio.txt  |  49 +++
 drivers/usb/mtu3/Kconfig  |   1 +
 drivers/usb/mtu3/mtu3.h   |   5 +
 drivers/usb/mtu3/mtu3_debugfs.c   |   4 +-
 drivers/usb/mtu3/mtu3_dr.c|  48 ++-
 drivers/usb/mtu3/mtu3_dr.h|   6 +-
 drivers/usb/mtu3/mtu3_plat.c  |   3 +-
 drivers/usb/roles/Kconfig |  11 +
 drivers/usb/roles/Makefile|   1 +
 drivers/usb/roles/class.c |  25 ++
 drivers/usb/roles/typeb-conn-gpio.c   | 305 ++
 include/linux/usb/role.h  |   8 +
 14 files changed, 481 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/usb/typeb-conn-gpio.txt
 create mode 100644 drivers/usb/roles/typeb-conn-gpio.c

-- 
2.21.0

Re: [PATCH 23/28] locking/lockdep: Update irqsafe lock bitmaps

2019-04-25 Thread Yuyang Du

Thanks for the review.

On Fri, 26 Apr 2019 at 03:55, Peter Zijlstra  wrote:
> > + if (!dir) {
> > + unsigned long *bitmaps[4] = {
> > + lock_classes_hardirq_safe,
> > + lock_classes_hardirq_safe_read,
> > + lock_classes_softirq_safe,
> > + lock_classes_softirq_safe_read
>
> That again should be something CPP magic using lockdep_states.h.

Yes.

> Also, that array can be static const, right? It's just an index into the
> static bitmaps.

Sure.

[...]
> > +static inline void remove_irqsafe_lock_bitmap(struct lock_class *class)
> > +{
> > +#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING)
> > + unsigned long usage = class->usage_mask;
> > +
> > + if (usage & LOCKF_USED_IN_HARDIRQ)
> > + __clear_bit(class - lock_classes, lock_classes_hardirq_safe);
> > + if (usage & LOCKF_USED_IN_HARDIRQ_READ)
> > + __clear_bit(class - lock_classes, lock_classes_hardirq_safe_read);
> > + if (usage & LOCKF_USED_IN_SOFTIRQ)
> > + __clear_bit(class - lock_classes, lock_classes_softirq_safe);
> > + if (usage & LOCKF_USED_IN_SOFTIRQ_READ)
> > + __clear_bit(class - lock_classes, lock_classes_softirq_safe_read);
>
> More CPP foo required here.

Definitely.

> Also, do we really need to test, we could
> just unconditionally clear the bits.

Actually, these tests are used later for another purpose: we want to
know which safe usages may be changed by zapping this lock.
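
One possible shape of the "CPP magic" requested above is an X-macro that generates the usage bits, the bitmaps and a static const lookup table from a single state list. This is a user-space sketch under stated assumptions: DEMO_STATES and every demo_* name are hypothetical and merely stand in for what lockdep_states.h would provide in the kernel.

```c
#include <limits.h>

#define DEMO_MAX_CLASSES	64
#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITMAP_LONGS \
	((DEMO_MAX_CLASSES + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Hypothetical state list; the kernel would use lockdep_states.h instead. */
#define DEMO_STATES(X)		\
	X(HARDIRQ, hardirq)	\
	X(SOFTIRQ, softirq)

/* One usage bit per state, plus one for its read variant. */
enum {
#define X(U, l) DEMO_USED_IN_##U##_BIT, DEMO_USED_IN_##U##_READ_BIT,
	DEMO_STATES(X)
#undef X
	DEMO_NR_SAFE_BITS
};

/* One static bitmap per usage bit. */
#define X(U, l)							\
	static unsigned long demo_##l##_safe[DEMO_BITMAP_LONGS];	\
	static unsigned long demo_##l##_safe_read[DEMO_BITMAP_LONGS];
DEMO_STATES(X)
#undef X

/* Lookup table in the same order as the enum; it is static const. */
static unsigned long *const demo_safe_bitmaps[DEMO_NR_SAFE_BITS] = {
#define X(U, l) demo_##l##_safe, demo_##l##_safe_read,
	DEMO_STATES(X)
#undef X
};

static void demo_mark_safe(unsigned int class_idx, unsigned int bit)
{
	demo_safe_bitmaps[bit][class_idx / DEMO_BITS_PER_LONG] |=
		1UL << (class_idx % DEMO_BITS_PER_LONG);
}

static int demo_is_safe(unsigned int class_idx, unsigned int bit)
{
	return !!(demo_safe_bitmaps[bit][class_idx / DEMO_BITS_PER_LONG] &
		  (1UL << (class_idx % DEMO_BITS_PER_LONG)));
}

/*
 * Clear the class from every bitmap named in usage_mask and return the
 * mask of usages that were touched, so a caller can act on them later.
 */
static unsigned int demo_remove_class(unsigned int class_idx,
				      unsigned int usage_mask)
{
	unsigned int changed = 0;

	for (unsigned int bit = 0; bit < DEMO_NR_SAFE_BITS; bit++) {
		if (!(usage_mask & (1u << bit)))
			continue;
		demo_safe_bitmaps[bit][class_idx / DEMO_BITS_PER_LONG] &=
			~(1UL << (class_idx % DEMO_BITS_PER_LONG));
		changed |= 1u << bit;
	}
	return changed;
}
```

Keeping the per-usage test in demo_remove_class() reflects the point made in the reply: the return value records which safe usages were affected by removing the class, which an unconditional clear would not provide.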

Re: [PATCH v2 05/11] powerpc/mm: get rid of mm_ctx_slice_mask_xxx()

2019-04-25 Thread Aneesh Kumar K.V

Christophe Leroy  writes:

> Now that slice_mask_for_size() is in mmu.h, the mm_ctx_slice_mask_xxx()
> are not needed anymore, so drop them. Note that the 8xx ones were
> not used anyway.
>

Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h     | 32 ++++----------------------------
>  arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 17 -----------------
>  2 files changed, 4 insertions(+), 45 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index ad00355f874f..e3d7f1404e20 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -179,45 +179,21 @@ static inline void mm_ctx_set_slb_addr_limit(mm_context_t *ctx, unsigned long li
>   ctx->hash_context->slb_addr_limit = limit;
>  }
>  
> -#ifdef CONFIG_PPC_64K_PAGES
> -static inline struct slice_mask *mm_ctx_slice_mask_64k(mm_context_t *ctx)
> -{
> - return &ctx->hash_context->mask_64k;
> -}
> -#endif
> -
> -static inline struct slice_mask *mm_ctx_slice_mask_4k(mm_context_t *ctx)
> -{
> - return &ctx->hash_context->mask_4k;
> -}
> -
> -#ifdef CONFIG_HUGETLB_PAGE
> -static inline struct slice_mask *mm_ctx_slice_mask_16m(mm_context_t *ctx)
> -{
> - return &ctx->hash_context->mask_16m;
> -}
> -
> -static inline struct slice_mask *mm_ctx_slice_mask_16g(mm_context_t *ctx)
> -{
> - return &ctx->hash_context->mask_16g;
> -}
> -#endif
> -
>  static inline struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
>  {
>  #ifdef CONFIG_PPC_64K_PAGES
>   if (psize == MMU_PAGE_64K)
> - return mm_ctx_slice_mask_64k(&ctx);
> + return &ctx->hash_context->mask_64k;
>  #endif
>  #ifdef CONFIG_HUGETLB_PAGE
>   if (psize == MMU_PAGE_16M)
> - return mm_ctx_slice_mask_16m(&ctx);
> + return &ctx->hash_context->mask_16m;
>   if (psize == MMU_PAGE_16G)
> - return mm_ctx_slice_mask_16g(&ctx);
> + return &ctx->hash_context->mask_16g;
>  #endif
>   VM_BUG_ON(psize != MMU_PAGE_4K);
>  
> - return mm_ctx_slice_mask_4k(&ctx);
> + return &ctx->hash_context->mask_4k;
>  }
>  
>  #ifdef CONFIG_PPC_SUBPAGE_PROT
> diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
> index a0f6844a1498..beded4df1f50 100644
> --- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
> +++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
> @@ -255,23 +255,6 @@ static inline void mm_ctx_set_slb_addr_limit(mm_context_t *ctx, unsigned long li
>   ctx->slb_addr_limit = limit;
>  }
>  
> -static inline struct slice_mask *mm_ctx_slice_mask_base(mm_context_t *ctx)
> -{
> - return &ctx->mask_base_psize;
> -}
> -
> -#ifdef CONFIG_HUGETLB_PAGE
> -static inline struct slice_mask *mm_ctx_slice_mask_512k(mm_context_t *ctx)
> -{
> - return &ctx->mask_512k;
> -}
> -
> -static inline struct slice_mask *mm_ctx_slice_mask_8m(mm_context_t *ctx)
> -{
> - return &ctx->mask_8m;
> -}
> -#endif
> -
>  static inline struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int psize)
>  {
>  #ifdef CONFIG_HUGETLB_PAGE
> -- 
> 2.13.3

Re: [PATCH 2/2] HID: input: add mapping for KEY_KBD_LAYOUT_NEXT

2019-04-25 Thread Benjamin Tissoires

On Thu, Apr 25, 2019 at 6:38 PM Dmitry Torokhov
 wrote:
>
> HUTRR56 defined a new usage code on consumer page to cycle through
> set of keyboard layouts, let's add this mapping.
>
> Signed-off-by: Dmitry Torokhov 
> ---

Acked-by: Benjamin Tissoires 

I don't think this will collide with the HID tree, so IMO, you can
take this through yours if you want.

Cheers,
Benjamin

>  drivers/hid/hid-input.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c
> index b607286a0bc8..0579b8d3f912 100644
> --- a/drivers/hid/hid-input.c
> +++ b/drivers/hid/hid-input.c
> @@ -1051,6 +1051,8 @@ static void hidinput_configure_usage(struct hid_input *hidinput, struct hid_fiel
> case 0x28b: map_key_clear(KEY_FORWARDMAIL); break;
> case 0x28c: map_key_clear(KEY_SEND);break;
>
> +   case 0x29d: map_key_clear(KEY_KBD_LAYOUT_NEXT); break;
> +
> case 0x2c7: map_key_clear(KEY_KBDINPUTASSIST_PREV); break;
> case 0x2c8: map_key_clear(KEY_KBDINPUTASSIST_NEXT); break;
> case 0x2c9: map_key_clear(KEY_KBDINPUTASSIST_PREVGROUP); break;
> --
> 2.21.0.593.g511ec345e18-goog
>

Re: [PATCH 26/28] locking/lockdep: Remove __bfs

2019-04-25 Thread Yuyang Du

Thanks for the review.

On Fri, 26 Apr 2019 at 04:07, Peter Zijlstra  wrote:
>
> On Wed, Apr 24, 2019 at 06:19:32PM +0800, Yuyang Du wrote:
> > Since there is no need for backward dependency searching, remove this
> > extra function layer.
>
> OK, so $subject confused the heck out of me, I thought you were going to
> remove the whole bfs machinery. May I suggest retaining
> __bfs_backwards() in the previous patch (which I'm _waay_ too tired
> to look at now) and calling this patch: "Remove __bfs_backwards()".

Sure thing.

[RFC][PATCH] panic: make panic start/end messages consistent

2019-04-25 Thread Sergey Senozhatsky

We don't have consistency:
- we always print panic header
pr_emerg("Kernel panic - not syncing:")

- but we don't always print panic footer
pr_emerg("---[ end Kernel panic - not syncing:")

For instance, no panic footer (end panic) message will be
printed when panic_timeout is set - the kernel will either
reboot immediately after console_flush_on_panic() (emergency
restart) or after panic_timeout seconds. Additionally,
panic_print_sys_info() goes before the panic footer line, which
doesn't look right: panic_print_sys_info() is just
additional debugging info.

Let's make it consistent:

pr_emerg("Kernel panic - not syncing:")
dump_stack();
console_flush_on_panic();
pr_emerg("---[ end Kernel panic - not syncing:")

panic_print_sys_info();
/* the rest */
/* panic_timeout handling */

Signed-off-by: Sergey Senozhatsky 
---
 kernel/panic.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 40882dad9f70..6482e4b54f0b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -282,6 +282,7 @@ void panic(const char *fmt, ...)
 */
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+   pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 
panic_print_sys_info();
 
@@ -331,8 +332,6 @@ void panic(const char *fmt, ...)
disabled_wait(caller);
}
 #endif
-   pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
-
/* Do not scroll important messages printed above */
suppress_printk = 1;
local_irq_enable();
-- 
2.21.0

Re: [PATCH 1/3] mfd: apple-ibridge: Add Apple iBridge MFD driver.

2019-04-25 Thread Benjamin Tissoires

On Fri, Apr 26, 2019 at 7:56 AM Life is hard, and then you die
 wrote:
>
>
>   Hi Benjamin,
>
> On Thu, Apr 25, 2019 at 11:39:12AM +0200, Benjamin Tissoires wrote:
> > On Thu, Apr 25, 2019 at 10:19 AM Life is hard, and then you die
> >  wrote:
> > >
> > >   Hi Benjamin,
> > >
> > > Thank you for looking at this.
> > >
> > > On Wed, Apr 24, 2019 at 04:18:23PM +0200, Benjamin Tissoires wrote:
> > > > On Mon, Apr 22, 2019 at 5:13 AM Ronald Tschalär  
> > > > wrote:
> > > > >
> > > > > The iBridge device provides access to several devices, including:
> > > > > - the Touch Bar
> > > > > - the iSight webcam
> > > > > - the light sensor
> > > > > - the fingerprint sensor
> > > > >
> > > > > This driver provides the core support for managing the iBridge device
> > > > > and the access to the underlying devices. In particular, since the
> > > > > functionality for the touch bar and light sensor is exposed via USB 
> > > > > HID
> > > > > interfaces, and the same HID device is used for multiple functions, 
> > > > > this
> > > > > driver provides a multiplexing layer that allows multiple HID drivers 
> > > > > to
> > > > > be registered for a given HID device. This allows the touch bar and 
> > > > > ALS
> > > > > driver to be separated out into their own modules.
> > > >
> > > > Sorry for coming late to the party, but IMO this series is far too
> > > > complex for what you need.
> > > >
> > > > As I read this and the first comment of drivers/mfd/apple-ibridge.c,
> > > > you need to have a HID driver that multiplexes 2 other sub drivers
> > > > through one USB communication.
> > > > For that, you are using MFD, platform driver and your own sauce instead
> > > > of creating a bus.
> > >
> > > Basically correct. To be a bit more precise, there are currently two
> > > hid-devices and two drivers (touchbar and als) involved, with
> > > connections as follows (pardon the ugly ascii art):
> > >
> > >   hdev1  ---  tb-drv
> > >/
> > >   /
> > >  /
> > >   hdev2  ---  als-drv
> > >
> > > i.e. the touchbar driver talks to both hdev's, and hdev2's events
> > > (reports) are processed by both drivers (though each handles different
> > > reports).
> > >
> > > > So, how about we reuse entirely the HID subsystem which already
> > > > provides the capability you need (assuming I am correct above).
> > > > hid-logitech-dj already does the same kind of stuff and you could:
> > > > - create drivers/hid/hid-ibridge.c that handles USB_ID_PRODUCT_IBRIDGE
> > > > - hid-ibridge will then register itself to the hid subsystem with a
> > > > call to hid_hw_start(hdev, HID_CONNECT_HIDRAW) and
> > > > hid_device_io_start(hdev) to enable the events (so you don't create
> > > > useless input nodes for it)
> > > > - then you add your 2 new devices by calling hid_allocate_device() and
> > > > then hid_add_device(). You can even create a new HID group
> > > > APPLE_IBRIDGE and allocate 2 new PIDs for them to distinguish them
> > > > from the actual USB device.
> > > > - then you have 2 brand new HID devices and you can write their drivers as
> > > > regular ones.
> > > >
> > > > hid-ibridge.c would just need to behave like any other hid transport
> > > > driver (see logi_dj_ll_driver in drivers/hid/hid-logitech-dj.c) and
> > > > you can get rid of at least the MFD and the platform part of your
> > > > drivers.
> > > >
> > > > Does it make sense or am I missing something obvious in the middle?
> > >
> > > Yes, I think I understand, and I think this can work. Basically,
> > > instead of demux'ing at the hid-driver level as I am doing now (i.e.
> > > the iBridge hid-driver forwarding calls to the sub-hid-drivers), we
> > > demux at the hid-device level (events forwarded from iBridge hdev to
> > > all "virtual" sub-hdev's, and requests from sub-hdev's forwarded to
> > > the original hdev via an iBridge ll_driver attached to the
> > > sub-hdev's).
> > >
> > > So I would need to create 3 new "virtual" hid-devices (instances) as
> > > follows:
> > >
> > >   hdev1  ---  vhdev1  ---  tb-drv
> > > /
> > >   --  vhdev2  --
> > >  /
> > >   hdev2  ---  vhdev3  ---  als-drv
> > >
> > > (vhdev1 is probably not strictly necessary, but makes things more
> > > consistent).
> >
> > Oh, ok.
> >
> > How about the following:
> >
> > hdev1 and hdev2 are merged together in hid-apple-ibridge.c, and then
> > this driver creates 2 virtual hid drivers that are consistent
> >
> > like
> >
> > hdev1---ibridge-drv---vhdev1---tb-drv
> > hdev2--/   \--vhdev2---als-drv
>
> I don't think this will work. The problem is when the sub-drivers need
> to send a report or usb-command: how do they specify which hdev the
> report/command is destined for? While we could store the original hdev
> in each report (the hid_report's device field), that only works for
> hid_hw_request(), but not for things like hid_hw_raw_request() or
> hid_hw_output_report(). Now, currently I don't use the latter two; but
> I do need to send

Re: linux-next: build warning after merge of the char-misc tree

2019-04-25 Thread Greg KH

On Fri, Apr 26, 2019 at 03:56:53PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the char-misc tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
> 
> drivers/misc/aspeed-p2a-ctrl.c: In function 'aspeed_p2a_mmap':
> drivers/misc/aspeed-p2a-ctrl.c:110:2: warning: ISO C90 forbids mixed 
> declarations and code [-Wdeclaration-after-statement]
>   pgprot_t prot = vma->vm_page_prot;
>   ^~~~
> 
> Introduced by commit
> 
>   01c60dcea9f7 ("drivers/misc: Add Aspeed P2A control driver")

Patrick, I thought you fixed all of these already?  Can you send a patch
again?

Can you also make the driver so it can build with CONFIG_COMPILE_TEST
enabled so that others can find your problems earlier in the review
process?

thanks,

greg k-h
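
For reference, a minimal sketch of the usual way to silence -Wdeclaration-after-statement: hoist the declaration to the top of the block and do the assignment where the initializer used to be. The struct and function names below are hypothetical stand-ins for illustration; this is not the aspeed-p2a-ctrl code and not necessarily the fix that will be sent.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA; not the kernel's struct vm_area_struct. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_page_prot;
};

/*
 * C90 style: every declaration precedes the first statement of the block,
 * so a build with -Wdeclaration-after-statement stays quiet.
 */
long demo_mmap_size(const struct demo_vma *vma, unsigned long *prot_out)
{
	unsigned long vsize;
	unsigned long prot;

	if (vma == NULL || vma->vm_end < vma->vm_start)
		return -1;

	/* The warning would fire if "unsigned long prot = ..." were declared here. */
	prot = vma->vm_page_prot;
	vsize = vma->vm_end - vma->vm_start;

	if (prot_out != NULL)
		*prot_out = prot;

	return (long)vsize;
}
```

The behaviour is unchanged; only the position of the declaration moves.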

Re: [PATCH] tty: Don't force RISCV SBI console as preferred console

2019-04-25 Thread Anup Patel

On Fri, Apr 26, 2019 at 10:11 AM Atish Patra  wrote:
>
> On 4/25/19 6:35 AM, Anup Patel wrote:
> > The Linux kernel auto-disables all boot consoles whenever it
> > gets a preferred real console.
> >
> > Currently on RISC-V systems, if we have a real console which is not
> > RISCV SBI console then boot consoles (such as earlycon=sbi) are not
> > auto-disabled when a real console (ttyS0 or ttySIF0) is available.
> > This results in duplicate prints at boot-time after kernel starts
> > using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel
> > parameter was passed by bootloader.
> >
> > The reason for above issue is that RISCV SBI console always adds
> > itself as preferred console which is causing other real consoles
> > to be not used as preferred console.
> >
>
> Do we even need HVC_SBI console to be enabled by default? Disabling
> CONFIG_HVC_RISCV_SBI seems to be fine while running in QEMU.

Actually, HVC_SBI console is useful on boards (such as SiFive Unleashed)
lacking upstream serial driver. It allows us to boot upstream kernel to prompt
on such boards with just timer driver (and probably irqchip driver).

Also, we should be able to use same kernel image on QEMU and SiFive
Unleashed board so disabling CONFIG_HVC_RISCV_SBI for QEMU is
a temporary solution.

>
> If we don't need it, I suggest we should remove the config option from
> defconfig in addition to this patch.

As mentioned above, HVC_SBI is useful for newer SoCs and boards
where serial driver is not yet up-streamed.

Regards,
Anup

>
> Regards,
> Atish
> > Ideally "console=" kernel parameter passed by bootloaders should
> > be the one selecting a preferred real console.
> >
> > This patch fixes above issue by not forcing RISCV SBI console as
> > preferred console.
> >
> > Fixes: afa6b1ccfad5 ("tty: New RISC-V SBI console driver")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Anup Patel 
> > ---
> >   drivers/tty/hvc/hvc_riscv_sbi.c | 1 -
> >   1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/tty/hvc/hvc_riscv_sbi.c 
> > b/drivers/tty/hvc/hvc_riscv_sbi.c
> > index 75155bde2b88..31f53fa77e4a 100644
> > --- a/drivers/tty/hvc/hvc_riscv_sbi.c
> > +++ b/drivers/tty/hvc/hvc_riscv_sbi.c
> > @@ -53,7 +53,6 @@ device_initcall(hvc_sbi_init);
> >   static int __init hvc_sbi_console_init(void)
> >   {
> >   hvc_instantiate(0, 0, &hvc_sbi_ops);
> > - add_preferred_console("hvc", 0, NULL);
> >
> >   return 0;
> >   }
> >
>

Re: [PATCH] tty: Don't force RISCV SBI console as preferred console

2019-04-25 Thread Christoph Hellwig

On Thu, Apr 25, 2019 at 09:41:21PM -0700, Atish Patra wrote:
> Do we even need HVC_SBI console to be enabled by default? Disabling
> CONFIG_HVC_RISCV_SBI seems to be fine while running in QEMU.
> 
> If we don't need it, I suggest we should remove the config option from
> defconfig in addition to this patch.

I think the whole concept of the SBI console is a little dangerous.
It means that for one piece of physical hardware (usually the uart)
we have two entities (the M-mode firmware and the OS) in control,
which tends to rarely end well.

Re: [LKP] [btrfs] 302167c50b: fio.write_bw_MBps -12.4% regression

2019-04-25 Thread Huang, Ying

Hi, Josef,

kernel test robot  writes:

> Greeting,
>
> FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit:
>
>
> commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end the 
> transaction for delayed refs in throttle")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes
>
> in testcase: fio-basic
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 
> 64G memory
> with following parameters:
>
>   runtime: 300s
>   nr_task: 8t
>   disk: 1SSD
>   fs: btrfs
>   rw: randwrite
>   bs: 4k
>   ioengine: sync
>   test_size: 400g
>   cpufreq_governor: performance
>   ucode: 0xb2e
>
> test-description: Fio is a tool that will spawn a number of threads or 
> processes doing a particular type of I/O action as specified by the user.
> test-url: https://github.com/axboe/fio
>
>

Do you have time to take a look at this regression?

Best Regards,
Huang, Ying

linux-next: manual merge of the staging tree with the v4l-dvb tree

2019-04-25 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the staging tree got conflicts in:

  drivers/staging/media/zoran/Kconfig
  drivers/staging/media/zoran/videocodec.c
  drivers/staging/media/zoran/videocodec.h
  drivers/staging/media/zoran/zoran.h
  drivers/staging/media/zoran/zoran_card.c
  drivers/staging/media/zoran/zoran_card.h
  drivers/staging/media/zoran/zoran_device.c
  drivers/staging/media/zoran/zoran_device.h
  drivers/staging/media/zoran/zoran_driver.c
  drivers/staging/media/zoran/zoran_procfs.c
  drivers/staging/media/zoran/zoran_procfs.h
  drivers/staging/media/zoran/zr36016.c
  drivers/staging/media/zoran/zr36016.h
  drivers/staging/media/zoran/zr36050.c
  drivers/staging/media/zoran/zr36050.h
  drivers/staging/media/zoran/zr36057.h
  drivers/staging/media/zoran/zr36060.c
  drivers/staging/media/zoran/zr36060.h

between commit:

  8dce4b265a53 ("media: zoran: remove deprecated driver")

from the v4l-dvb tree and various commits from the staging tree.

I fixed it up (I just removed the files) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH 2/2] mmc: sdhci_am654: Fix SLOTTYPE write

2019-04-25 Thread Adrian Hunter

On 25/04/19 6:57 PM, Faiz Abbas wrote:
> In the call to regmap_update_bits() for SLOTTYPE, the mask and value
> fields are exchanged. Fix this.

Could you also comment on whether this has any known effect on the driver?

> 
> Signed-off-by: Faiz Abbas 
> ---
>  drivers/mmc/host/sdhci_am654.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c
> index 866a9082705f..613b151a73c5 100644
> --- a/drivers/mmc/host/sdhci_am654.c
> +++ b/drivers/mmc/host/sdhci_am654.c
> @@ -205,8 +205,8 @@ static int sdhci_am654_init(struct sdhci_host *host)
>   if (host->mmc->caps & MMC_CAP_NONREMOVABLE)
>   ctl_cfg_2 = SLOTTYPE_EMBEDDED;
>  
> - regmap_update_bits(sdhci_am654->base, CTL_CFG_2, ctl_cfg_2,
> -SLOTTYPE_MASK);
> + regmap_update_bits(sdhci_am654->base, CTL_CFG_2, SLOTTYPE_MASK,
> +ctl_cfg_2);
>  
>   return sdhci_add_host(host);
>  }
>
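
To make the effect of the swapped arguments concrete, here is a user-space sketch of read-modify-write semantics in the style of regmap_update_bits(), i.e. new = (old & ~mask) | (val & mask). The helper name and the bit values for the mask and the embedded slot type are assumptions picked for illustration; they are not taken from the CTL_CFG_2 register documentation.

```c
#include <stdint.h>

/* Hypothetical values, chosen only to make the arithmetic visible. */
#define DEMO_SLOTTYPE_MASK	0xC0000000u	/* assumed 2-bit field */
#define DEMO_SLOTTYPE_EMBEDDED	0x40000000u	/* assumed encoding */

/* Model of an update-bits helper: only bits set in mask may change. */
uint32_t demo_update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}
```

Under these assumed values, with the arguments swapped the slot type acts as the mask, so bits of the field that are not part of the new value are never cleared, and a slot type of 0 turns the call into a no-op. Whether that matters in practice depends on the reset value of the register, which this sketch does not model.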

[PATCH 1/3] tty: simserial: drop unused iflag macro

2019-04-25 Thread Johan Hovold

Drop the RELEVANT_IFLAG() macro which hasn't been used for over a
decade.

Cc: Tony Luck 
Cc: Fenghua Yu 
Signed-off-by: Johan Hovold 
---
 arch/ia64/hp/sim/simserial.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/ia64/hp/sim/simserial.c b/arch/ia64/hp/sim/simserial.c
index 7aeb48a18576..1a338e541334 100644
--- a/arch/ia64/hp/sim/simserial.c
+++ b/arch/ia64/hp/sim/simserial.c
@@ -324,8 +324,6 @@ static int rs_ioctl(struct tty_struct *tty, unsigned int 
cmd, unsigned long arg)
return -ENOIOCTLCMD;
 }
 
-#define RELEVANT_IFLAG(iflag) (iflag & (IGNBRK|BRKINT|IGNPAR|PARMRK|INPCK))
-
 /*
  * This routine will shutdown a serial port; interrupts are disabled, and
  * DTR is dropped if the hangup on close termio flag is on.
-- 
2.21.0

[PATCH 0/3] tty: drop unused iflag macro

2019-04-25 Thread Johan Hovold

I noticed that the RELEVANT_IFLAG() macro was unused in USB serial and
turns out there were a few more instances that could be dropped.

I have some pending changes that may conflict with the corresponding
change to USB serial so I'll take that one separately through my tree,
but perhaps the rest could go through Greg's tty tree.

Johan


Johan Hovold (3):
  tty: simserial: drop unused iflag macro
  tty: ipoctal: drop unused iflag macro
  tty: cpm_uart: drop unused iflag macro

 arch/ia64/hp/sim/simserial.c| 2 --
 drivers/ipack/devices/ipoctal.h | 1 -
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 2 --
 3 files changed, 5 deletions(-)

-- 
2.21.0

[PATCH v2 13/17] powerpc/mm: cleanup HPAGE_SHIFT setup

2019-04-25 Thread Christophe Leroy

Only book3s/64 may select default among several HPAGE_SHIFT at runtime.
8xx always defines 512K pages as default
FSL_BOOK3E always defines 4M pages as default

This patch limits HUGETLB_PAGE_SIZE_VARIABLE to book3s/64 and
moves the definitions into the subarch files.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/hugetlb.h   |  2 ++
 arch/powerpc/include/asm/page.h  | 11 ---
 arch/powerpc/mm/hugetlbpage-hash64.c | 16 
 arch/powerpc/mm/hugetlbpage.c| 23 +++
 5 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5d8e692d6470..7815eb0cc2a5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -390,7 +390,7 @@ source "kernel/Kconfig.hz"
 
 config HUGETLB_PAGE_SIZE_VARIABLE
bool
-   depends on HUGETLB_PAGE
+   depends on HUGETLB_PAGE && PPC_BOOK3S_64
default y
 
 config MATH_EMULATION
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 84598c6b0959..20a101046cff 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -15,6 +15,8 @@
 
 extern bool hugetlb_disabled;
 
+void hugetlbpage_init_default(void);
+
 void flush_dcache_icache_hugepage(struct page *page);
 
 int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 6b508420d92b..dbc8c0679480 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -28,10 +28,15 @@
 #define PAGE_SIZE  (ASM_CONST(1) << PAGE_SHIFT)
 
 #ifndef __ASSEMBLY__
-#ifdef CONFIG_HUGETLB_PAGE
-extern unsigned int HPAGE_SHIFT;
-#else
+#ifndef CONFIG_HUGETLB_PAGE
 #define HPAGE_SHIFT PAGE_SHIFT
+#elif defined(CONFIG_PPC_BOOK3S_64)
+extern unsigned int hpage_shift;
+#define HPAGE_SHIFT hpage_shift
+#elif defined(CONFIG_PPC_8xx)
+#define HPAGE_SHIFT 19  /* 512k pages */
+#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#define HPAGE_SHIFT 22  /* 4M pages */
 #endif
 #define HPAGE_SIZE ((1UL) << HPAGE_SHIFT)
 #define HPAGE_MASK (~(HPAGE_SIZE - 1))
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index b0d9209d9a86..7a58204c3688 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -15,6 +15,9 @@
 #include 
 #include 
 
+unsigned int hpage_shift;
+EXPORT_SYMBOL(hpage_shift);
+
 extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
  unsigned long pa, unsigned long rlags,
  unsigned long vflags, int psize, int ssize);
@@ -145,3 +148,16 @@ void huge_ptep_modify_prot_commit(struct vm_area_struct 
*vma, unsigned long addr
   old_pte, pte);
set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
 }
+
+void hugetlbpage_init_default(void)
+{
+   /* Set default large page size. Currently, we pick 16M or 1M
+* depending on what is available
+*/
+   if (mmu_psize_defs[MMU_PAGE_16M].shift)
+   hpage_shift = mmu_psize_defs[MMU_PAGE_16M].shift;
+   else if (mmu_psize_defs[MMU_PAGE_1M].shift)
+   hpage_shift = mmu_psize_defs[MMU_PAGE_1M].shift;
+   else if (mmu_psize_defs[MMU_PAGE_2M].shift)
+   hpage_shift = mmu_psize_defs[MMU_PAGE_2M].shift;
+}
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 828860a7492e..265bd6d04233 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -28,9 +28,6 @@
 
 bool hugetlb_disabled = false;
 
-unsigned int HPAGE_SHIFT;
-EXPORT_SYMBOL(HPAGE_SHIFT);
-
 #define hugepd_none(hpd)   (hpd_val(hpd) == 0)
 
 #define PTE_T_ORDER (__builtin_ffs(sizeof(pte_t)) - __builtin_ffs(sizeof(void *)))
@@ -647,23 +644,9 @@ static int __init hugetlbpage_init(void)
 #endif
}
 
-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
-   /* Default hpage size = 4M on FSL_BOOK3E and 512k on 8xx */
-   if (mmu_psize_defs[MMU_PAGE_4M].shift)
-   HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_4M].shift;
-   else if (mmu_psize_defs[MMU_PAGE_512K].shift)
-   HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_512K].shift;
-#else
-   /* Set default large page size. Currently, we pick 16M or 1M
-* depending on what is available
-*/
-   if (mmu_psize_defs[MMU_PAGE_16M].shift)
-   HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_16M].shift;
-   else if (mmu_psize_defs[MMU_PAGE_1M].shift)
-   HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_1M].shift;
-   else if (mmu_psize_defs[MMU_PAGE_2M].shift)
-   HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_2M].shift;
-#endif
+   if (IS_ENABLED(HUGETLB_PAGE_SIZE_VARIABLE))
+   huget

[PATCH v2 14/17] powerpc/mm: cleanup remaining ifdef mess in hugetlbpage.c

2019-04-25 Thread Christophe Leroy

Only 3 subarches support huge pages. So when the kernel is not built for
one of them, it is built for one of the other two.

And mmu_has_feature() is known by all subarches so IS_ENABLED() can
be used instead of #ifdef

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/hugetlbpage.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 265bd6d04233..1d5c6ec04351 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -226,7 +226,7 @@ int __init alloc_bootmem_huge_page(struct hstate *h)
return __alloc_bootmem_huge_page(h);
 }
 
-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
+#ifndef CONFIG_PPC_BOOK3S_64
 #define HUGEPD_FREELIST_SIZE \
((PAGE_SIZE - sizeof(struct hugepd_freelist)) / sizeof(pte_t))
 
@@ -597,10 +597,10 @@ static int __init hugetlbpage_init(void)
return 0;
}
 
-#if !defined(CONFIG_PPC_FSL_BOOK3E) && !defined(CONFIG_PPC_8xx)
-   if (!radix_enabled() && !mmu_has_feature(MMU_FTR_16M_PAGE))
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled() &&
+   !mmu_has_feature(MMU_FTR_16M_PAGE))
return -ENODEV;
-#endif
+
for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
unsigned shift;
unsigned pdshift;
@@ -638,10 +638,8 @@ static int __init hugetlbpage_init(void)
pgtable_cache_add(PTE_INDEX_SIZE);
else if (pdshift > shift)
pgtable_cache_add(pdshift - shift);
-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
-   else
+   else if (IS_ENABLED(CONFIG_PPC_FSL_BOOK3E) || 
IS_ENABLED(CONFIG_PPC_8xx))
pgtable_cache_add(PTE_T_ORDER);
-#endif
}
 
if (IS_ENABLED(HUGETLB_PAGE_SIZE_VARIABLE))
-- 
2.13.3
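
The #ifdef to IS_ENABLED() conversion used in this patch follows a common pattern: the condition becomes ordinary C, the branch stays visible to the compiler, and it is discarded as dead code when the option is off. Below is a simplified user-space sketch of that pattern. DEMO_IS_ENABLED() is a stand-in modelled on the kernel's macro trick (it ignores the =m case), and the DEMO_CONFIG_* options and helper functions are invented for the example.

```c
/* Pretend configuration: one option on, the other left undefined. */
#define DEMO_CONFIG_BOOK3S_64 1
/* DEMO_CONFIG_8XX is intentionally not defined. */

/*
 * Expands to 1 when the option is defined as 1 and to 0 when it is
 * undefined, so it can be used in ordinary C expressions.
 */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED2(val) DEMO_IS_DEFINED3(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED2(option)

/* Plain C condition instead of an #ifdef block around the check. */
int demo_needs_16m_check(int radix_enabled, int has_16m_pages)
{
	if (DEMO_IS_ENABLED(DEMO_CONFIG_BOOK3S_64) && !radix_enabled &&
	    !has_16m_pages)
		return 1;	/* a caller would bail out here */
	return 0;
}

/* An undefined option simply evaluates to 0. */
int demo_is_8xx(void)
{
	return DEMO_IS_ENABLED(DEMO_CONFIG_8XX);
}
```

One consequence of this style is that everything referenced inside the branch must be declared for all configurations, which is why the commit message points out that mmu_has_feature() is known by all subarches.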

[PATCH 2/3] tty: ipoctal: drop unused iflag macro

2019-04-25 Thread Johan Hovold

Drop the RELEVANT_IFLAG() macro which has never been used.

Cc: Samuel Iglesias Gonsalvez 
Cc: Jens Taprogge 
Signed-off-by: Johan Hovold 
---
 drivers/ipack/devices/ipoctal.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/ipack/devices/ipoctal.h b/drivers/ipack/devices/ipoctal.h
index 7fede0eb6a0c..78e4fc81fb03 100644
--- a/drivers/ipack/devices/ipoctal.h
+++ b/drivers/ipack/devices/ipoctal.h
@@ -18,7 +18,6 @@
 #define NR_CHANNELS 8
 #define IPOCTAL_MAX_BOARDS 16
 #define MAX_DEVICES (NR_CHANNELS * IPOCTAL_MAX_BOARDS)
-#define RELEVANT_IFLAG(iflag) ((iflag) & (IGNBRK|BRKINT|IGNPAR|PARMRK|INPCK))
 
 /**
  * struct ipoctal_stats -- Stats since last reset
-- 
2.21.0

[PATCH v2 09/17] powerpc/mm: split asm/hugetlb.h into dedicated subarch files

2019-04-25 Thread Christophe Leroy

Three subarches support hugepages:
- fsl book3e
- book3s/64
- 8xx

This patch splits asm/hugetlb.h to reduce the #ifdef mess.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/hugetlb.h | 40 +++
 arch/powerpc/include/asm/hugetlb.h   | 87 ++--
 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 31 +
 arch/powerpc/include/asm/nohash/hugetlb-book3e.h | 31 +
 4 files changed, 106 insertions(+), 83 deletions(-)
 create mode 100644 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
 create mode 100644 arch/powerpc/include/asm/nohash/hugetlb-book3e.h

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index ec2a55a553c7..7c99f018f7b5 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -62,4 +62,44 @@ extern pte_t huge_ptep_modify_prot_start(struct 
vm_area_struct *vma,
 extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t old_pte, pte_t new_pte);
+/*
+ * This should work for other subarchs too. But right now we use the
+ * new format only for 64bit book3s
+ */
+static inline pte_t *hugepd_page(hugepd_t hpd)
+{
+   VM_BUG_ON(!hugepd_ok(hpd));
+   /*
+* We have only four bits to encode, MMU page size
+*/
+   BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf);
+   return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK);
+}
+
+static inline unsigned int hugepd_mmu_psize(hugepd_t hpd)
+{
+   return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2;
+}
+
+static inline unsigned int hugepd_shift(hugepd_t hpd)
+{
+   return mmu_psize_to_shift(hugepd_mmu_psize(hpd));
+}
+static inline void flush_hugetlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+   if (radix_enabled())
+   return radix__flush_hugetlb_page(vma, vmaddr);
+}
+
+static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
+   unsigned int pdshift)
+{
+   unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> 
hugepd_shift(hpd);
+
+   return hugepd_page(hpd) + idx;
+}
+
+void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
+
 #endif
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 7f1867e428c0..fd5c0873a57d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -6,83 +6,13 @@
 #include 
 
 #ifdef CONFIG_PPC_BOOK3S_64
-
 #include 
-/*
- * This should work for other subarchs too. But right now we use the
- * new format only for 64bit book3s
- */
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
-   VM_BUG_ON(!hugepd_ok(hpd));
-   /*
-* We have only four bits to encode, MMU page size
-*/
-   BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf);
-   return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK);
-}
-
-static inline unsigned int hugepd_mmu_psize(hugepd_t hpd)
-{
-   return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2;
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
-   return mmu_psize_to_shift(hugepd_mmu_psize(hpd));
-}
-static inline void flush_hugetlb_page(struct vm_area_struct *vma,
- unsigned long vmaddr)
-{
-   if (radix_enabled())
-   return radix__flush_hugetlb_page(vma, vmaddr);
-}
-
-#else
-
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
-   VM_BUG_ON(!hugepd_ok(hpd));
-#ifdef CONFIG_PPC_8xx
-   return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK);
-#else
-   return (pte_t *)((hpd_val(hpd) &
- ~HUGEPD_SHIFT_MASK) | PD_HUGE);
-#endif
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
-#ifdef CONFIG_PPC_8xx
-   return ((hpd_val(hpd) & _PMD_PAGE_MASK) >> 1) + 17;
-#else
-   return hpd_val(hpd) & HUGEPD_SHIFT_MASK;
-#endif
-}
-
+#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#include 
+#elif defined(CONFIG_PPC_8xx)
+#include 
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-
-static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
-   unsigned pdshift)
-{
-   /*
-* On FSL BookE, we have multiple higher-level table entries that
-* point to the same hugepte.  Just use the first one since they're all
-* identical.  So for that case, idx=0.
-*/
-   unsigned long idx = 0;
-
-   pte_t *dir = hugepd_page(hpd);
-#ifdef CONFIG_PPC_8xx
-   idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
-#elif !defined(CONFIG_PPC_FSL_BOOK3E)
-   idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
-#endif
-
-   return dir + idx;
-}
-
 void flush_dcache_icache_hugepage(struct page *page);
 
 int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
@@ -99,15 +29,6 @@ static

[PATCH v2 15/17] powerpc/mm: flatten function __find_linux_pte() step 1

2019-04-25 Thread Christophe Leroy

__find_linux_pte() is full of if/else which is hard to
follow although the handling is pretty simple.

This patch flattens the function by getting rid of as much if/else
as possible. In order to ease the review, this is done in three steps.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9f4ccd15849f..d332abeedf0a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -339,12 +339,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
 */
if (pgd_none(pgd))
return NULL;
-   else if (pgd_huge(pgd)) {
-   ret_pte = (pte_t *) pgdp;
+
+   if (pgd_huge(pgd)) {
+   ret_pte = (pte_t *)pgdp;
goto out;
-   } else if (is_hugepd(__hugepd(pgd_val(pgd))))
+   }
+   if (is_hugepd(__hugepd(pgd_val(pgd)))) {
hpdp = (hugepd_t *)&pgd;
-   else {
+   goto out_huge;
+   }
+   {
/*
 * Even if we end up with an unmap, the pgtable will not
 * be freed, because we do an rcu free and here we are
@@ -356,12 +360,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
 
if (pud_none(pud))
return NULL;
-   else if (pud_huge(pud)) {
+
+   if (pud_huge(pud)) {
ret_pte = (pte_t *) pudp;
goto out;
-   } else if (is_hugepd(__hugepd(pud_val(pud))))
+   }
+   if (is_hugepd(__hugepd(pud_val(pud)))) {
hpdp = (hugepd_t *)&pud;
-   else {
+   goto out_huge;
+   }
+   {
pdshift = PMD_SHIFT;
pmdp = pmd_offset(&pud, ea);
pmd  = READ_ONCE(*pmdp);
@@ -386,12 +394,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
if (pmd_huge(pmd) || pmd_large(pmd)) {
ret_pte = (pte_t *) pmdp;
goto out;
-   } else if (is_hugepd(__hugepd(pmd_val(pmd))))
+   }
+   if (is_hugepd(__hugepd(pmd_val(pmd)))) {
hpdp = (hugepd_t *)&pmd;
-   else
-   return pte_offset_kernel(&pmd, ea);
+   goto out_huge;
+   }
+
+   return pte_offset_kernel(&pmd, ea);
}
}
+out_huge:
if (!hpdp)
return NULL;
 
-- 
2.13.3
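
The shape of the flattening done in this series can be sketched on a toy two-level walk. The enum, the function names and the return codes below are hypothetical and only mimic the control flow (none / huge leaf / hugepd / regular table); this is not the __find_linux_pte() logic itself.

```c
/* Hypothetical entry kinds standing in for none / huge / hugepd / table. */
enum demo_kind { DEMO_NONE, DEMO_HUGE, DEMO_HUGEPD, DEMO_TABLE };

/*
 * Nested form: each level adds another else branch.
 * Returns -1 for none, 1 or 2 for a huge leaf at that level,
 * 11 or 12 for a hugepd at that level, 0 for a regular table.
 */
int demo_walk_nested(enum demo_kind top, enum demo_kind next)
{
	int level = 0;

	if (top == DEMO_NONE)
		return -1;
	else if (top == DEMO_HUGE)
		return 1;
	else if (top == DEMO_HUGEPD)
		level = 1;
	else {
		if (next == DEMO_NONE)
			return -1;
		else if (next == DEMO_HUGE)
			return 2;
		else if (next == DEMO_HUGEPD)
			level = 2;
		else
			return 0;
	}
	/* Only the hugepd cases fall through to here. */
	return 10 + level;
}

/* Flattened form: every case leaves its level at once, no else needed. */
int demo_walk_flat(enum demo_kind top, enum demo_kind next)
{
	int level = 0;

	if (top == DEMO_NONE)
		return -1;
	if (top == DEMO_HUGE)
		return 1;
	if (top == DEMO_HUGEPD) {
		level = 1;
		goto out_huge;
	}

	if (next == DEMO_NONE)
		return -1;
	if (next == DEMO_HUGE)
		return 2;
	if (next == DEMO_HUGEPD) {
		level = 2;
		goto out_huge;
	}

	return 0;

out_huge:
	return 10 + level;
}
```

Both functions return the same value for every input; the flat one just makes the exit taken by each case explicit.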

[PATCH 3/3] tty: cpm_uart: drop unused iflag macro

2019-04-25 Thread Johan Hovold

Drop the RELEVANT_IFLAG() macro which hasn't been used at least since
the dawn of git.

Signed-off-by: Johan Hovold 
---
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c 
b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
index b929c7ae3a27..505262b1c6c2 100644
--- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
+++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
@@ -567,8 +567,6 @@ static void cpm_uart_set_termios(struct uart_port *port,
/*
 * Set up parity check flag
 */
-#define RELEVANT_IFLAG(iflag) (iflag & (IGNBRK|BRKINT|IGNPAR|PARMRK|INPCK))
-
port->read_status_mask = (BD_SC_EMPTY | BD_SC_OV);
if (termios->c_iflag & INPCK)
port->read_status_mask |= BD_SC_FR | BD_SC_PR;
-- 
2.21.0

[PATCH v2 17/17] powerpc/mm: flatten function __find_linux_pte() step 3

2019-04-25 Thread Christophe Leroy

__find_linux_pte() is full of if/else which is hard to
follow although the handling is pretty simple.

Previous patches left a { } block. This patch removes it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable.c | 98 +++
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index c1c6d0b79baa..db4a6253df92 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -348,59 +348,59 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
hpdp = (hugepd_t *)&pgd;
goto out_huge;
}
-   {
-   /*
-* Even if we end up with an unmap, the pgtable will not
-* be freed, because we do an rcu free and here we are
-* irq disabled
-*/
-   pdshift = PUD_SHIFT;
-   pudp = pud_offset(&pgd, ea);
-   pud  = READ_ONCE(*pudp);
 
-   if (pud_none(pud))
-   return NULL;
+   /*
+* Even if we end up with an unmap, the pgtable will not
+* be freed, because we do an rcu free and here we are
+* irq disabled
+*/
+   pdshift = PUD_SHIFT;
+   pudp = pud_offset(&pgd, ea);
+   pud  = READ_ONCE(*pudp);
 
-   if (pud_huge(pud)) {
-   ret_pte = (pte_t *) pudp;
-   goto out;
-   }
-   if (is_hugepd(__hugepd(pud_val(pud)))) {
-   hpdp = (hugepd_t *)&pud;
-   goto out_huge;
-   }
-   pdshift = PMD_SHIFT;
-   pmdp = pmd_offset(&pud, ea);
-   pmd  = READ_ONCE(*pmdp);
-   /*
-* A hugepage collapse is captured by pmd_none, because
-* it mark the pmd none and do a hpte invalidate.
-*/
-   if (pmd_none(pmd))
-   return NULL;
-
-   if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
-   if (is_thp)
-   *is_thp = true;
-   ret_pte = (pte_t *)pmdp;
-   goto out;
-   }
-   /*
-* pmd_large check below will handle the swap pmd pte
-* we need to do both the check because they are config
-* dependent.
-*/
-   if (pmd_huge(pmd) || pmd_large(pmd)) {
-   ret_pte = (pte_t *)pmdp;
-   goto out;
-   }
-   if (is_hugepd(__hugepd(pmd_val(pmd)))) {
-   hpdp = (hugepd_t *)&pmd;
-   goto out_huge;
-   }
+   if (pud_none(pud))
+   return NULL;
 
-   return pte_offset_kernel(&pmd, ea);
+   if (pud_huge(pud)) {
+   ret_pte = (pte_t *)pudp;
+   goto out;
}
+   if (is_hugepd(__hugepd(pud_val(pud)))) {
+   hpdp = (hugepd_t *)&pud;
+   goto out_huge;
+   }
+   pdshift = PMD_SHIFT;
+   pmdp = pmd_offset(&pud, ea);
+   pmd  = READ_ONCE(*pmdp);
+   /*
+* A hugepage collapse is captured by pmd_none, because
+* it mark the pmd none and do a hpte invalidate.
+*/
+   if (pmd_none(pmd))
+   return NULL;
+
+   if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
+   if (is_thp)
+   *is_thp = true;
+   ret_pte = (pte_t *)pmdp;
+   goto out;
+   }
+   /*
+* pmd_large check below will handle the swap pmd pte
+* we need to do both the check because they are config
+* dependent.
+*/
+   if (pmd_huge(pmd) || pmd_large(pmd)) {
+   ret_pte = (pte_t *)pmdp;
+   goto out;
+   }
+   if (is_hugepd(__hugepd(pmd_val(pmd)))) {
+   hpdp = (hugepd_t *)&pmd;
+   goto out_huge;
+   }
+
+   return pte_offset_kernel(&pmd, ea);
+
 out_huge:
if (!hpdp)
return NULL;
-- 
2.13.3

Re: [RFC PATCH v5 3/4] x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector

2019-04-25 Thread Ingo Molnar



* Zhao, Yakui  wrote:

> > > > Does the hypervisor model the APIC EOI command, i.e. does it require the
> > > > APIC to be acked? I.e. would not acking the APIC create an IRQ storm?
> > > 
> > > The hypervisor requires that the APIC EOI should be acked. If the EOI APIC
> > > is not acked, the APIC ISR bit for the HYPERVISOR_CALLBACK_VECTOR will not
> > > be cleared and then it will block the interrupt whose vector is lower than
> > > HYPERVISOR_CALLBACK_VECTOR.
> > 
> > Ok!
> > 
> 
> I will add some comments for calling entering_ack_irq in
> acrn_hv_callback_handler. Is this ok to you?

Yeah, thanks!

Ingo
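
The blocking behaviour described in the quoted answer can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `apic_model` type, the `model_*` helpers and the value given to `MODEL_CALLBACK_VECTOR` are made up for illustration, and the priority rule is simplified to a comparison of priority classes (the upper four bits of the vector). It is not the ACRN or kernel code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed value, used only by this model. */
#define MODEL_CALLBACK_VECTOR 0xf3

struct apic_model {
	uint8_t isr[32];	/* one in-service bit per vector, 256 in total */
};

void model_init(struct apic_model *a)
{
	memset(a, 0, sizeof(*a));
}

/* Return the highest vector whose in-service bit is set, or -1 if none. */
static int model_highest_in_service(const struct apic_model *a)
{
	int vec;

	for (vec = 255; vec >= 0; vec--)
		if (a->isr[vec / 8] & (1u << (vec % 8)))
			return vec;
	return -1;
}

/* Accepting an interrupt sets its in-service bit. */
void model_accept(struct apic_model *a, uint8_t vec)
{
	a->isr[vec / 8] |= (uint8_t)(1u << (vec % 8));
}

/* An EOI clears the highest in-service bit. */
void model_eoi(struct apic_model *a)
{
	int vec = model_highest_in_service(a);

	if (vec >= 0)
		a->isr[vec / 8] &= (uint8_t)~(1u << (vec % 8));
}

/* A vector is deliverable only if it outranks everything in service. */
bool model_deliverable(const struct apic_model *a, uint8_t vec)
{
	int top = model_highest_in_service(a);

	return top < 0 || (vec >> 4) > (top >> 4);
}
```

In this model, accepting `MODEL_CALLBACK_VECTOR` without a later `model_eoi()` leaves every lower-class vector undeliverable, which is the situation the ack in the handler avoids.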

linux-next: build warning after merge of the char-misc tree

2019-04-25 Thread Stephen Rothwell

Hi all,

After merging the char-misc tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

drivers/misc/aspeed-p2a-ctrl.c: In function 'aspeed_p2a_mmap':
drivers/misc/aspeed-p2a-ctrl.c:110:2: warning: ISO C90 forbids mixed 
declarations and code [-Wdeclaration-after-statement]
  pgprot_t prot = vma->vm_page_prot;
  ^~~~

Introduced by commit

  01c60dcea9f7 ("drivers/misc: Add Aspeed P2A control driver")

-- 
Cheers,
Stephen Rothwell
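
A warning of this kind is usually silenced by declaring the variable at the top of the enclosing block and assigning it later. A minimal sketch of that pattern follows; `range_ok()` is a made-up helper for illustration and is not the driver's code.

```c
#include <stddef.h>

/* Hypothetical range check, written so that no declaration follows a statement. */
int range_ok(size_t offset, size_t len, size_t region_size)
{
	/* Declared up front; assigned only after the early return below. */
	size_t end;

	if (len == 0)
		return 0;

	end = offset + len;
	if (end < offset)	/* the addition wrapped around */
		return 0;

	return end <= region_size;
}
```

Writing `size_t end = offset + len;` after the first `if` would compile as C99, but it is what `-Wdeclaration-after-statement` reports.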



Re: [PATCH 1/3] mfd: apple-ibridge: Add Apple iBridge MFD driver.

2019-04-25 Thread Life is hard, and then you die



  Hi Benjamin,

On Thu, Apr 25, 2019 at 11:39:12AM +0200, Benjamin Tissoires wrote:
> On Thu, Apr 25, 2019 at 10:19 AM Life is hard, and then you die
>  wrote:
> >
> >   Hi Benjamin,
> >
> > Thank you for looking at this.
> >
> > On Wed, Apr 24, 2019 at 04:18:23PM +0200, Benjamin Tissoires wrote:
> > > On Mon, Apr 22, 2019 at 5:13 AM Ronald Tschalär  
> > > wrote:
> > > >
> > > > The iBridge device provides access to several devices, including:
> > > > - the Touch Bar
> > > > - the iSight webcam
> > > > - the light sensor
> > > > - the fingerprint sensor
> > > >
> > > > This driver provides the core support for managing the iBridge device
> > > > and the access to the underlying devices. In particular, since the
> > > > functionality for the touch bar and light sensor is exposed via USB HID
> > > > interfaces, and the same HID device is used for multiple functions, this
> > > > driver provides a multiplexing layer that allows multiple HID drivers to
> > > > be registered for a given HID device. This allows the touch bar and ALS
> > > > driver to be separated out into their own modules.
> > >
> > > Sorry for coming late to the party, but IMO this series is far too
> > > complex for what you need.
> > >
> > > As I read this and the first comment of drivers/mfd/apple-ibridge.c,
> > > you need to have a HID driver that multiplex 2 other sub drivers
> > > through one USB communication.
> > > For that, you are using MFD, platform driver and you own sauce instead
> > > of creating a bus.
> >
> > Basically correct. To be a bit more precise, there are currently two
> > hid-devices and two drivers (touchbar and als) involved, with
> > connections as follows (pardon the ugly ascii art):
> >
> >   hdev1  ---  tb-drv
> >/
> >   /
> >  /
> >   hdev2  ---  als-drv
> >
> > i.e. the touchbar driver talks to both hdev's, and hdev2's events
> > (reports) are processed by both drivers (though each handles different
> > reports).
> >
> > > So, how about we reuse entirely the HID subsystem which already
> > > provides the capability you need (assuming I am correct above).
> > > hid-logitech-dj already does the same kind of stuff and you could:
> > > - create drivers/hid/hid-ibridge.c that handles USB_ID_PRODUCT_IBRIDGE
> > > - hid-ibridge will then register itself to the hid subsystem with a
> > > call to hid_hw_start(hdev, HID_CONNECT_HIDRAW) and
> > > hid_device_io_start(hdev) to enable the events (so you don't create
> > > useless input nodes for it)
> > > - then you add your 2 new devices by calling hid_allocate_device() and
> > > then hid_add_device(). You can even create a new HID group
> > > APPLE_IBRIDGE and allocate 2 new PIDs for them to distinguish them
> > > from the actual USB device.
> > > - then you have 2 brand new HID devices you can create their driver as
> > > a regular ones.
> > >
> > > hid-ibridge.c would just need to behave like any other hid transport
> > > driver (see logi_dj_ll_driver in drivers/hid/hid-logitech-dj.c) and
> > > you can get rid of at least the MFD and the platform part of your
> > > drivers.
> > >
> > > Does it make sense or am I missing something obvious in the middle?
> >
> > Yes, I think I understand, and I think this can work. Basically,
> > instead of demux'ing at the hid-driver level as I am doing now (i.e.
> > the iBridge hid-driver forwarding calls to the sub-hid-drivers), we
> > demux at the hid-device level (events forwarded from iBridge hdev to
> > all "virtual" sub-hdev's, and requests from sub-hdev's forwarded to
> > the original hdev via an iBridge ll_driver attached to the
> > sub-hdev's).
> >
> > So I would need to create 3 new "virtual" hid-devices (instances) as
> > follows:
> >
> >   hdev1  ---  vhdev1  ---  tb-drv
> > /
> >   --  vhdev2  --
> >  /
> >   hdev2  ---  vhdev3  ---  als-drv
> >
> > (vhdev1 is probably not strictly necessary, but makes things more
> > consistent).
> 
> Oh, ok.
> 
> How about the following:
> 
> hdev1 and hdev2 are merged together in hid-apple-ibridge.c, and then
> this driver creates 2 virtual hid drivers that are consistent
> 
> like
> 
> hdev1---ibridge-drv---vhdev1---tb-drv
> hdev2--/   \--vhdev2---als-drv

I don't think this will work. The problem is when the sub-drivers need
to send a report or usb-command: how do they specify which hdev the
report/command is destined for? While we could store the original hdev
in each report (the hid_report's device field), that only works for
hid_hw_request(), but not for things like hid_hw_raw_request() or
hid_hw_output_report(). Now, currently I don't use the latter two; but
I do need to send raw usb control messages in the touchbar driver
(some commands are not proper hid reports), so it definitely breaks
down there.

Or am I missing something?


  Cheers,

  Ronald
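
The routing concern can be sketched with a small user-space model. It assumes one virtual device per real device, as in the three-vhdev layout quoted earlier, so that a request without a report attached still has an unambiguous destination. The `model_hdev` type and the `model_*` functions are hypothetical and are not part of the HID API.

```c
#include <stddef.h>

struct model_hdev {
	const char *name;
	struct model_hdev *parent;	/* NULL for a real device */
	int raw_requests;		/* raw requests that reached this device */
};

/* Resolve the real device behind a (possibly virtual) device. */
struct model_hdev *model_real_dev(struct model_hdev *dev)
{
	while (dev->parent)
		dev = dev->parent;
	return dev;
}

/*
 * A raw request carries no report, so the only routing information is
 * the device it was issued on.
 */
struct model_hdev *model_raw_request(struct model_hdev *dev)
{
	struct model_hdev *real = model_real_dev(dev);

	real->raw_requests++;
	return real;
}
```

If a single virtual device stood in front of both real devices, `parent` could not name one destination, and that is where raw requests would break down.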

[PATCH v2] arm64: dts: ls1028a: Add USB dt nodes

2019-04-25 Thread Ran Wang

This patch adds USB dt nodes for LS1028A.

Signed-off-by: Ran Wang 
---
Changes in v2:
  - Rename node from usb3@... to usb@... to meet DTSpec

 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
index 8dd3501..188cfb8 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
@@ -144,6 +144,26 @@
clocks = <&sysclk>;
};
 
+   usb0:usb@310 {
+   compatible= "snps,dwc3";
+   reg= <0x0 0x310 0x0 0x1>;
+   interrupts= <0 80 0x4>;
+   dr_mode= "host";
+   snps,dis_rxdet_inp3_quirk;
+   snps,quirk-frame-length-adjustment = <0x20>;
+   snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>;
+   };
+
+   usb1:usb@311 {
+   compatible= "snps,dwc3";
+   reg= <0x0 0x311 0x0 0x1>;
+   interrupts= <0 81 0x4>;
+   dr_mode= "host";
+   snps,dis_rxdet_inp3_quirk;
+   snps,quirk-frame-length-adjustment = <0x20>;
+   snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>;
+   };
+
i2c0: i2c@200 {
compatible = "fsl,vf610-i2c";
#address-cells = <1>;
-- 
1.7.1

Re: [PATCH 1/2] mmc: sdhci_am654: Fix minor phy configurations

2019-04-25 Thread Adrian Hunter

On 25/04/19 6:57 PM, Faiz Abbas wrote:
> Fix the following minor things:
> 
> 1. Line wrapping with the regmap_*() functions is way more conservative
> than required by the 80 character rule. Expand the function calls out to
> use less number of lines.
> 
> 2. Add an error message if the DLL fails to lock.

Please make the white space changes a separate patch.

Also I would prefer not to use "fix" in the subject unless the patch fixes
driver behaviour.

> 
> Signed-off-by: Faiz Abbas 
> ---
>  drivers/mmc/host/sdhci_am654.c | 37 --
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c
> index eea183e90f1b..866a9082705f 100644
> --- a/drivers/mmc/host/sdhci_am654.c
> +++ b/drivers/mmc/host/sdhci_am654.c
> @@ -88,8 +88,7 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, 
> unsigned int clock)
>   int ret;
>  
>   if (sdhci_am654->dll_on) {
> - regmap_update_bits(sdhci_am654->base, PHY_CTRL1,
> -ENDLL_MASK, 0);
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, ENDLL_MASK, 0);
>  
>   sdhci_am654->dll_on = false;
>   }
> @@ -101,8 +100,7 @@ static void sdhci_am654_set_clock(struct sdhci_host 
> *host, unsigned int clock)
>   mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK;
>   val = (1 << OTAPDLYENA_SHIFT) |
> (sdhci_am654->otap_del_sel << OTAPDLYSEL_SHIFT);
> - regmap_update_bits(sdhci_am654->base, PHY_CTRL4,
> -mask, val);
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, mask, val);
>   switch (clock) {
>   case 2:
>   sel50 = 0;
> @@ -120,8 +118,7 @@ static void sdhci_am654_set_clock(struct sdhci_host 
> *host, unsigned int clock)
>   /* Configure PHY DLL frequency */
>   mask = SEL50_MASK | SEL100_MASK;
>   val = (sel50 << SEL50_SHIFT) | (sel100 << SEL100_SHIFT);
> - regmap_update_bits(sdhci_am654->base, PHY_CTRL5,
> -mask, val);
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL5, mask, val);
>   /* Configure DLL TRIM */
>   mask = DLL_TRIM_ICP_MASK;
>   val = sdhci_am654->trm_icp << DLL_TRIM_ICP_SHIFT;
> @@ -129,19 +126,21 @@ static void sdhci_am654_set_clock(struct sdhci_host 
> *host, unsigned int clock)
>   /* Configure DLL driver strength */
>   mask |= DR_TY_MASK;
>   val |= sdhci_am654->drv_strength << DR_TY_SHIFT;
> - regmap_update_bits(sdhci_am654->base, PHY_CTRL1,
> -mask, val);
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, mask, val);
>   /* Enable DLL */
> - regmap_update_bits(sdhci_am654->base, PHY_CTRL1,
> -ENDLL_MASK, 0x1 << ENDLL_SHIFT);
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, ENDLL_MASK,
> +0x1 << ENDLL_SHIFT);
>   /*
>* Poll for DLL ready. Use a one second timeout.
>* Works in all experiments done so far
>*/
> - ret = regmap_read_poll_timeout(sdhci_am654->base,
> -  PHY_STAT1, val,
> -  val & DLLRDY_MASK,
> -  1000, 100);
> + ret = regmap_read_poll_timeout(sdhci_am654->base, PHY_STAT1,
> +val, val & DLLRDY_MASK, 1000,
> +100);
> + if (ret) {
> + dev_err(mmc_dev(host->mmc), "DLL failed to relock\n");
> + return;
> + }
>  
>   sdhci_am654->dll_on = true;
>   }
> @@ -186,8 +185,7 @@ static int sdhci_am654_init(struct sdhci_host *host)
>  
>   /* Reset OTAP to default value */
>   mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK;
> - regmap_update_bits(sdhci_am654->base, PHY_CTRL4,
> -mask, 0x0);
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, mask, 0x0);
>  
>   regmap_read(sdhci_am654->base, PHY_STAT1, &val);
>   if (~val & CALDONE_MASK) {
> @@ -201,15 +199,14 @@ static int sdhci_am654_init(struct sdhci_host *host)
>   }
>  
>   /* Enable pins by setting IO mux to 0 */
> - regmap_update_bits(sdhci_am654->base, PHY_CTRL1,
> -IOMUX_ENABLE_MASK, 0);
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, IOMUX_ENABLE_MASK, 0);
>  
>   /* Set slot type based on SD or eMMC */
>   if (host->mmc->caps & MMC_CAP_NONREMOVABLE)
>   ctl_cfg_2 = SLOTTYPE_EMBEDDED;
>  
> - regmap_update_bits(sdhci_am654->base, CTL_CFG_2,
> -

Re: [PATCH 1/2] RISC-V: Add DT documentation for SiFive L2 Cache Controller

2019-04-25 Thread Yash Shah

On Thu, Apr 25, 2019 at 3:43 PM Sudeep Holla  wrote:
>
> On Thu, Apr 25, 2019 at 11:24:55AM +0530, Yash Shah wrote:
> > Add device tree bindings for SiFive FU540 L2 cache controller driver
> >
> > Signed-off-by: Yash Shah 
> > ---
> >  .../devicetree/bindings/riscv/sifive-l2-cache.txt  | 53 
> > ++
> >  1 file changed, 53 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> >
> > diff --git a/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt 
> > b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > new file mode 100644
> > index 000..15132e2
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > @@ -0,0 +1,53 @@
> > +SiFive L2 Cache Controller
> > +--
> > +The SiFive Level 2 Cache Controller is used to provide access to fast 
> > copies
> > +of memory for masters in a Core Complex. The Level 2 Cache Controller also
> > +acts as directory-based coherency manager.
> > +
> > +Required Properties:
> > +
> > +- compatible: Should be "sifive,fu540-c000-ccache"
> > +
> > +- cache-block-size: Specifies the block size in bytes of the cache
> > +
> > +- cache-level: Should be set to 2 for a level 2 cache
> > +
> > +- cache-sets: Specifies the number of associativity sets of the cache
> > +
> > +- cache-size: Specifies the size in bytes of the cache
> > +
> > +- cache-unified: Specifies the cache is a unified cache
> > +
> > +- interrupt-parent: Must be core interrupt controller
> > +
> > +- interrupts: Must contain 3 entries (DirError, DataError and DataFail 
> > signals)
> > +
> > +- reg: Physical base address and size of L2 cache controller registers map
> > +
> > +- reg-names: Should be "control"
> > +
>
> It would be good if you mark the properties that are present in DT
> specification and those that are added for sifive,fu540-c000-ccache

I believe there isn't any property which is added explicitly for
sifive,fu540-c000-ccache.

> explicitly. Also I assume you can retain the standard "cache" compatible
> in addition to above. I am interested to see if the cacheinfo infrastructure
> can be used without any issues.

Yes, I will add the "cache" string to the compatible property.

>
> --
> Regards,
> Sudeep

Thanks for your comments.
- Yash

Re: [PATCHv2 4/4] printk: make sure we always print console disabled message

2019-04-25 Thread Sergey Senozhatsky



Forgot to mention that the series is still in RFC phase.


On (04/26/19 14:33), Sergey Senozhatsky wrote:
[..]
> +++ b/kernel/printk/printk.c
> @@ -2613,6 +2613,12 @@ static int __unregister_console(struct console 
> *console)
>   pr_info("%sconsole [%s%d] disabled\n",
>   (console->flags & CON_BOOT) ? "boot" : "",
>   console->name, console->index);
> + /*
> +  * Print 'console disabled' on all the consoles, including the
> +  * one we are about to unregister.
> +  */
> + console_unlock();
> + console_lock();
>  
>   res = _braille_unregister_console(console);
>   if (res)

Need to think more if this is race free...

-ss

Re: [PATCH 02/28] locking/lockdep: Add description and explanation in lockdep design doc

2019-04-25 Thread Yuyang Du

Thank you very much for review.

You mean class can go away? Before Bart's addition, it can go away.
Right? I think maybe the original point of "never go away" in that
context did not intend to talk about a class's real disappearance.

Anyway, the points should be made comprehensive. You want me to resend
the patch or you modify it?

On Thu, 25 Apr 2019 at 22:01, Peter Zijlstra  wrote:
>
> On Wed, Apr 24, 2019 at 06:19:08PM +0800, Yuyang Du wrote:
> > +Unlike a lock instance, a lock-class itself never goes away: when a
> > +lock-class's instance is used for the first time after bootup the class 
> > gets
> > +registered, and all (subsequent) instances of that lock-class will be 
> > mapped
> > +to the lock-class.
>
> That's not entirely accurate anymore. Bart van Assche recently added
> lockdep_{,un}register_key().

Re: [PATCH 2/2] RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs

2019-04-25 Thread Yash Shah

On Thu, Apr 25, 2019 at 3:48 PM Sudeep Holla  wrote:
>
> On Thu, Apr 25, 2019 at 11:24:56AM +0530, Yash Shah wrote:
> > The driver currently supports only SiFive FU540-C000 platform.
> >
> > The initial version of L2 cache controller driver includes:
> > - Initial configuration reporting at boot up.
> > - Support for ECC related functionality.
> >
> > Signed-off-by: Yash Shah 
>
> []
>
> > +static const struct file_operations l2_fops = {
> > + .owner = THIS_MODULE,
> > + .open = simple_open,
> > + .write = l2_write
> > +};
> > +
> > +static void setup_sifive_debug(void)
> > +{
> > + sifive_test = debugfs_create_dir("sifive_l2_cache", NULL);
> > + if (!sifive_test)
>
> Drop the conditional check above, Greg K H removed lots of them recently.
> In his words: When calling debugfs functions, there is no need to ever
> check the return value.  The function can work or not, but the code
> logic should never do something different based on this.
>
> He may not like to see this :)

Sure, thanks for pointing it out. Will drop all the conditional checks
in debugfs functions.

>
> > + return;
> > +
> > + if (!debugfs_create_file("sifive_debug_inject_error", 0200,
> > +  sifive_test, NULL, &l2_fops))
>
> Ditto.
>
> > + debugfs_remove_recursive(sifive_test);
> > +}
>
> --
> Regards,
> Sudeep

Thanks for your comments.

- Yash

Re: [PATCH 1/3] mfd: apple-ibridge: Add Apple iBridge MFD driver.

2019-04-25 Thread Life is hard, and then you die



  Hi Jonathan,

On Wed, Apr 24, 2019 at 08:13:17PM +0100, Jonathan Cameron wrote:
> On Wed, 24 Apr 2019 03:47:18 -0700
> "Life is hard, and then you die"  wrote:
> 
> >   Hi Jonathan,
> > 
> > On Mon, Apr 22, 2019 at 12:34:26PM +0100, Jonathan Cameron wrote:
> > > On Sun, 21 Apr 2019 20:12:49 -0700
> > > Ronald Tschalär  wrote:
> > >   
> > > > The iBridge device provides access to several devices, including:
> > > > - the Touch Bar
> > > > - the iSight webcam
> > > > - the light sensor
> > > > - the fingerprint sensor
> > > > 
> > > > This driver provides the core support for managing the iBridge device
> > > > and the access to the underlying devices. In particular, since the
> > > > functionality for the touch bar and light sensor is exposed via USB HID
> > > > interfaces, and the same HID device is used for multiple functions, this
> > > > driver provides a multiplexing layer that allows multiple HID drivers to
> > > > be registered for a given HID device. This allows the touch bar and ALS
> > > > driver to be separated out into their own modules.
> > > > 
> > > > Signed-off-by: Ronald Tschalär
> > > 
> > > Hi Ronald,
> > > 
> > > I've only taken a fairly superficial look at this.  A few global
> > > things to note though.  
> > 
> > Thanks for this review.
[snip]

I've applied all your feedback in my tree, but it now looks like this
module is going to be redone differently. I'll try to keep all your
comments in mind during the rewrite, though, so they're not wasted.


  Cheers,

  Ronald

[PATCHv2 2/4] printk: remove invalid register_console() comment

2019-04-25 Thread Sergey Senozhatsky

We don't iterate consoles twice, since commit 8259cf434202
("printk: Ensure that "console enabled" messages are printed
 on the console"), so the comment is not valid anymore, and
can be removed, as was suggested by Petr.

The patch also invokes pr_info("%sconsole [%s%d] enabled\n")
before we unlock_consoles(), just to make sure that we really
print that message on every registered and enabled console.

Suggested-by: Petr Mladek 
Signed-off-by: Sergey Senozhatsky 
---
 kernel/printk/printk.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b0e361ca1bea..3ac71701afa3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2806,9 +2806,22 @@ void register_console(struct console *newcon)
exclusive_console_stop_seq = console_seq;
logbuf_unlock_irqrestore(flags);
}
+
+   /*
+* We are still under console_sem, pr_info() will only add the message
+* to the kernel's log buffer. console_unlock() will print it on all
+* registered and enabled consoles.
+*/
+   pr_info("%sconsole [%s%d] enabled\n",
+   (newcon->flags & CON_BOOT) ? "boot" : "",
+   newcon->name, newcon->index);
+
console_unlock();
console_sysfs_notify();
 
+   if (keep_bootcon)
+   return;
+
/*
 * By unregistering the bootconsoles after we enable the real console
 * we get the "console xxx enabled" message on all the consoles -
@@ -2816,19 +2829,8 @@ void register_console(struct console *newcon)
 * users know there might be something in the kernel's log buffer that
 * went to the bootconsole (that they do not see on the real console)
 */
-   pr_info("%sconsole [%s%d] enabled\n",
-   (newcon->flags & CON_BOOT) ? "boot" : "" ,
-   newcon->name, newcon->index);
-
-   if (keep_bootcon)
-   return;
-
if (bcon && (newcon->flags & (CON_CONSDEV|CON_BOOT)) == CON_CONSDEV) {
console_lock();
-   /*
-* We need to iterate through all boot consoles, to make
-* sure we print everything out, before we unregister them.
-*/
for_each_console(bcon)
if (bcon->flags & CON_BOOT)
__unregister_console(bcon);
-- 
2.21.0

[PATCHv2 4/4] printk: make sure we always print console disabled message

2019-04-25 Thread Sergey Senozhatsky

Make sure that we print 'console disabled' messages on all
the consoles, including the one we are about to unregister.
Otherwise, unregistered console will not have that message,
because pr_info() under console_sem doesn't print anything.

We do the same thing in __register_console() with the
'console enabled' message.

Signed-off-by: Sergey Senozhatsky 
---
 kernel/printk/printk.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3b36e26d4a51..20c702b963a9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2613,6 +2613,12 @@ static int __unregister_console(struct console *console)
pr_info("%sconsole [%s%d] disabled\n",
(console->flags & CON_BOOT) ? "boot" : "",
console->name, console->index);
+   /*
+* Print 'console disabled' on all the consoles, including the
+* one we are about to unregister.
+*/
+   console_unlock();
+   console_lock();
 
res = _braille_unregister_console(console);
if (res)
-- 
2.21.0

[PATCHv2 3/4] printk: factor out register_console() code

2019-04-25 Thread Sergey Senozhatsky

We need to take console_sem lock when we iterate console drivers
list. Otherwise, another CPU can concurrently modify console drivers
list or console drivers. Current register_console() has several
race conditions - for_each_console() must be done under console_sem.

Factor out console registration code and hold console_sem throughout
entire registration process. Note that we need to unlock console_sem
and lock it again after we added new console to the list and before
we unregister boot consoles. This might look a bit weird, but this
is how we print pending logbuf messages to all registered and
available consoles.

Signed-off-by: Sergey Senozhatsky 
---
 kernel/printk/printk.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3ac71701afa3..3b36e26d4a51 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2666,7 +2666,7 @@ static int __unregister_console(struct console *console)
  *  - Once a "real" console is registered, any attempt to register a
  *bootconsoles will be rejected
  */
-void register_console(struct console *newcon)
+static void __register_console(struct console *newcon)
 {
int i;
unsigned long flags;
@@ -2771,7 +2771,6 @@ void register_console(struct console *newcon)
 *  Put this console in the list - keep the
 *  preferred driver at the head of the list.
 */
-   console_lock();
if ((newcon->flags & CON_CONSDEV) || console_drivers == NULL) {
newcon->next = console_drivers;
console_drivers = newcon;
@@ -2818,6 +2817,7 @@ void register_console(struct console *newcon)
 
console_unlock();
console_sysfs_notify();
+   console_lock();
 
if (keep_bootcon)
return;
@@ -2830,14 +2830,19 @@ void register_console(struct console *newcon)
 * went to the bootconsole (that they do not see on the real console)
 */
if (bcon && (newcon->flags & (CON_CONSDEV|CON_BOOT)) == CON_CONSDEV) {
-   console_lock();
for_each_console(bcon)
if (bcon->flags & CON_BOOT)
__unregister_console(bcon);
-   console_unlock();
-   console_sysfs_notify();
}
 }
+
+void register_console(struct console *newcon)
+{
+   console_lock();
+   __register_console(newcon);
+   console_unlock();
+   console_sysfs_notify();
+}
 EXPORT_SYMBOL(register_console);
 
 int unregister_console(struct console *console)
-- 
2.21.0
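
The split between a locked inner helper and a public wrapper that takes the lock can be sketched in user space as follows. This is a model only: the `lk_*` names are invented, a plain flag stands in for console_sem, and the `_locked` suffix replaces the kernel's `__` prefix because such identifiers are reserved in user-space C.

```c
#include <assert.h>
#include <stddef.h>

struct lk_console {
	const char *name;
	struct lk_console *next;
};

static struct lk_console *lk_list;	/* protected by the model lock */
static int lk_held;			/* 1 while the model lock is taken */

void lk_lock(void)
{
	assert(!lk_held);	/* the model lock is not recursive */
	lk_held = 1;
}

void lk_unlock(void)
{
	assert(lk_held);
	lk_held = 0;
}

/* Inner helper: the caller must already hold the lock. */
void lk_register_locked(struct lk_console *con)
{
	assert(lk_held);
	con->next = lk_list;
	lk_list = con;
}

/* Public entry point: takes the lock around the whole operation. */
void lk_register(struct lk_console *con)
{
	lk_lock();
	lk_register_locked(con);
	lk_unlock();
}

/* Walk the list under the lock and count the entries. */
int lk_count(void)
{
	const struct lk_console *c;
	int n = 0;

	lk_lock();
	for (c = lk_list; c; c = c->next)
		n++;
	lk_unlock();
	return n;
}
```

The assertion in `lk_register_locked()` documents the locking rule in code, so a caller that forgets the lock fails immediately instead of racing.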

[PATCH] Revert "drm/qxl: drop prime import/export callbacks"

2019-04-25 Thread Gerd Hoffmann

This reverts commit f4c34b1e2a37d5676180901fa6ff188bcb6371f8.

Similar to commit a0cecc23cfcb Revert "drm/virtio: drop prime
import/export callbacks".  We have to do the same with qxl,
for the same reasons (it breaks DRI3).

Drop the WARN_ON_ONCE().

Fixes: f4c34b1e2a37d5676180901fa6ff188bcb6371f8
Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_drv.c   |  4 
 drivers/gpu/drm/qxl/qxl_prime.c | 12 
 2 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index 578d867a81d5..f33e349c4ec5 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -255,10 +255,14 @@ static struct drm_driver qxl_driver = {
 #if defined(CONFIG_DEBUG_FS)
.debugfs_init = qxl_debugfs_init,
 #endif
+   .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
+   .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
.gem_prime_export = drm_gem_prime_export,
.gem_prime_import = drm_gem_prime_import,
.gem_prime_pin = qxl_gem_prime_pin,
.gem_prime_unpin = qxl_gem_prime_unpin,
+   .gem_prime_get_sg_table = qxl_gem_prime_get_sg_table,
+   .gem_prime_import_sg_table = qxl_gem_prime_import_sg_table,
.gem_prime_vmap = qxl_gem_prime_vmap,
.gem_prime_vunmap = qxl_gem_prime_vunmap,
.gem_prime_mmap = qxl_gem_prime_mmap,
diff --git a/drivers/gpu/drm/qxl/qxl_prime.c b/drivers/gpu/drm/qxl/qxl_prime.c
index 8b448eca1cd9..114653b471c6 100644
--- a/drivers/gpu/drm/qxl/qxl_prime.c
+++ b/drivers/gpu/drm/qxl/qxl_prime.c
@@ -42,6 +42,18 @@ void qxl_gem_prime_unpin(struct drm_gem_object *obj)
qxl_bo_unpin(bo);
 }
 
+struct sg_table *qxl_gem_prime_get_sg_table(struct drm_gem_object *obj)
+{
+   return ERR_PTR(-ENOSYS);
+}
+
+struct drm_gem_object *qxl_gem_prime_import_sg_table(
+   struct drm_device *dev, struct dma_buf_attachment *attach,
+   struct sg_table *table)
+{
+   return ERR_PTR(-ENOSYS);
+}
+
 void *qxl_gem_prime_vmap(struct drm_gem_object *obj)
 {
struct qxl_bo *bo = gem_to_qxl_bo(obj);
-- 
2.18.1

[PATCHv2 0/4] Access console drivers list under console_sem

2019-04-25 Thread Sergey Senozhatsky

Hello,

Normally, we grab console_sem lock before we iterate consoles
list, which is necessary if we want to be race free. The only exception
to this rule is console_flush_on_panic(). However, it seems that we are
not fully race free - register_console() iterates console drivers list
in an unsafe manner in several places. E.g. the following scenario:

CPU0                                    CPU1
register_console()                      unregister_console()
                                         console_lock()
  for_each_console()                     // modify console_drivers
    con->foo                             kfree(con)

I factored out register_console() and unregister_console() and now
the bulk of the work is done under console_sem. Both in register
and unregister paths we now have that oddly looking thing

pr_info("console enabled/disabled");
console_unlock();
console_lock();

Which is not really odd, in fact. This is to make sure that we always
print messages on all the consoles.

v2:
- removed outdated comment (Petr)
- factor out register_console() and always run it under console_sem (Petr)
- added a patch which ensures that we always print "console disabled"
  on every console, before we unregister one of them

Sergey Senozhatsky (4):
  printk: factor out __unregister_console() code
  printk: remove invalid register_console() comment
  printk: factor out register_console() code
  printk: make sure we always print console disabled message

 kernel/printk/printk.c | 125 +
 1 file changed, 76 insertions(+), 49 deletions(-)

-- 
2.21.0
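
The reason for the unlock/lock pair can be shown with a small user-space model in which logging only queues text and the unlock step flushes it to every console still registered. All `sim_*` names are invented for this sketch, and the lock itself is not modelled; only the ordering of flush and unlink is.

```c
#include <string.h>

#define SIM_MAX_CONSOLES 4
#define SIM_LOG_SIZE 256

struct sim_console {
	const char *name;
	char seen[SIM_LOG_SIZE];	/* everything printed on this console */
};

static struct sim_console *sim_consoles[SIM_MAX_CONSOLES];
static int sim_nr_consoles;
static char sim_pending[SIM_LOG_SIZE];	/* logged but not yet printed */

/* Stands in for pr_info() under the lock: the message is only queued. */
void sim_pr_info(const char *msg)
{
	strncat(sim_pending, msg, sizeof(sim_pending) - strlen(sim_pending) - 1);
}

/* Stands in for console_unlock(): flush pending text to all consoles. */
void sim_console_unlock(void)
{
	int i;

	for (i = 0; i < sim_nr_consoles; i++) {
		struct sim_console *c = sim_consoles[i];

		strncat(c->seen, sim_pending,
			sizeof(c->seen) - strlen(c->seen) - 1);
	}
	sim_pending[0] = '\0';
}

void sim_add(struct sim_console *c)
{
	if (sim_nr_consoles < SIM_MAX_CONSOLES)
		sim_consoles[sim_nr_consoles++] = c;
}

static void sim_remove(struct sim_console *c)
{
	int i;

	for (i = 0; i < sim_nr_consoles; i++) {
		if (sim_consoles[i] == c) {
			sim_consoles[i] = sim_consoles[--sim_nr_consoles];
			return;
		}
	}
}

/* Unregister a console; with flush_first set, flush before unlinking it. */
void sim_unregister(struct sim_console *c, int flush_first)
{
	sim_pr_info("console disabled\n");
	if (flush_first)
		sim_console_unlock();	/* the real code re-takes the lock here */
	sim_remove(c);
	sim_console_unlock();
}
```

Without the early flush, the console being removed is already off the list when the message is printed, so it never shows its own "console disabled" line.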

[PATCHv2 1/4] printk: factor out __unregister_console() code

2019-04-25 Thread Sergey Senozhatsky

The following pattern in register_console() is not completely safe:

 for_each_console(bcon)
 if (bcon->flags & CON_BOOT)
 unregister_console(bcon);

Because, in theory, console drivers list and console drivers
can be modified concurrently from another CPU. We need to grab
console_sem lock, which protects console drivers list and console
drivers, before we start iterating console drivers list.

Factor out __unregister_console(), which will be called from
unregister_console() and register_console(), in both cases
under console_sem lock.

Signed-off-by: Sergey Senozhatsky 
---
 kernel/printk/printk.c | 98 --
 1 file changed, 56 insertions(+), 42 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 17102fd4c136..b0e361ca1bea 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2605,6 +2605,48 @@ static int __init keep_bootcon_setup(char *str)
 
 early_param("keep_bootcon", keep_bootcon_setup);
 
+static int __unregister_console(struct console *console)
+{
+   struct console *a, *b;
+   int res;
+
+   pr_info("%sconsole [%s%d] disabled\n",
+   (console->flags & CON_BOOT) ? "boot" : "",
+   console->name, console->index);
+
+   res = _braille_unregister_console(console);
+   if (res)
+   return res;
+
+   res = 1;
+   if (console_drivers == console) {
+   console_drivers = console->next;
+   res = 0;
+   } else if (console_drivers) {
+   for (a = console_drivers->next, b = console_drivers;
+a; b = a, a = b->next) {
+   if (a == console) {
+   b->next = a->next;
+   res = 0;
+   break;
+   }
+   }
+   }
+
+   if (!res && (console->flags & CON_EXTENDED))
+   nr_ext_console_drivers--;
+
+   /*
+* If this isn't the last console and it has CON_CONSDEV set, we
+* need to set it on the next preferred console.
+*/
+   if (console_drivers != NULL && console->flags & CON_CONSDEV)
+   console_drivers->flags |= CON_CONSDEV;
+
+   console->flags &= ~CON_ENABLED;
+   return res;
+}
+
 /*
  * The console driver calls this routine during kernel initialization
  * to register the console printing procedure with printk() and to
@@ -2777,62 +2819,34 @@ void register_console(struct console *newcon)
pr_info("%sconsole [%s%d] enabled\n",
(newcon->flags & CON_BOOT) ? "boot" : "" ,
newcon->name, newcon->index);
-   if (bcon &&
-   ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV) &&
-   !keep_bootcon) {
-   /* We need to iterate through all boot consoles, to make
+
+   if (keep_bootcon)
+   return;
+
+   if (bcon && (newcon->flags & (CON_CONSDEV|CON_BOOT)) == CON_CONSDEV) {
+   console_lock();
+   /*
+* We need to iterate through all boot consoles, to make
 * sure we print everything out, before we unregister them.
 */
for_each_console(bcon)
if (bcon->flags & CON_BOOT)
-   unregister_console(bcon);
+   __unregister_console(bcon);
+   console_unlock();
+   console_sysfs_notify();
}
 }
 EXPORT_SYMBOL(register_console);
 
 int unregister_console(struct console *console)
 {
-struct console *a, *b;
-   int res;
-
-   pr_info("%sconsole [%s%d] disabled\n",
-   (console->flags & CON_BOOT) ? "boot" : "" ,
-   console->name, console->index);
-
-   res = _braille_unregister_console(console);
-   if (res)
-   return res;
+   int ret;
 
-   res = 1;
console_lock();
-   if (console_drivers == console) {
-   console_drivers=console->next;
-   res = 0;
-   } else if (console_drivers) {
-   for (a=console_drivers->next, b=console_drivers ;
-a; b=a, a=b->next) {
-   if (a == console) {
-   b->next = a->next;
-   res = 0;
-   break;
-   }
-   }
-   }
-
-   if (!res && (console->flags & CON_EXTENDED))
-   nr_ext_console_drivers--;
-
-   /*
-* If this isn't the last console and it has CON_CONSDEV set, we
-* need to set it on the next preferred console.
-*/
-   if (console_drivers != NULL && console->flags & CON_CONSDEV)
-   console_drivers->flags |= CON_CONSDEV;
-
-   console->flags &= ~CON_ENABLED;
+   ret = __unregister_console(console);
console_unlock();
console_sysf

Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release

2019-04-25 Thread Michal Hocko

On Thu 25-04-19 14:42:52, Jann Horn wrote:
> On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko  wrote:
> [...]
> > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > > From: Matthew Garrett 
> > >
> > > Applications that hold secrets and wish to avoid them leaking can use
> > > mlock() to prevent the page from being pushed out to swap and
> > > MADV_DONTDUMP to prevent it from being included in core dumps.
> > > Applications can also use atexit() handlers to overwrite secrets on
> > > application exit. However, if an attacker can reboot the system into
> > > another OS, they can dump the contents of RAM and extract secrets. We
> > > can avoid this by setting CONFIG_RESET_ATTACK_MITIGATION on UEFI systems
> > > in order to request that the firmware wipe the contents of RAM before
> > > booting another OS, but this means rebooting takes a *long* time - the
> > > expected behaviour is for a clean shutdown to remove the request after
> > > scrubbing secrets from RAM in order to avoid this.
> > >
> > > Unfortunately, if an application exits uncleanly, its secrets may still be
> > > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > > killer decides to kill a process holding secrets, we're not going to be
> > > able to avoid that), so this patch adds a new flag to madvise() to allow
> > > userland to request that the kernel clear the covered pages whenever the
> > > page reference count hits zero. Since vm_flags is already full on 32-bit,
> > > it will only work on 64-bit systems.
> [...]
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index 21a7881a2db4..989c2fde15cf 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct 
> > > *vma,
> > >   case MADV_KEEPONFORK:
> > >   new_flags &= ~VM_WIPEONFORK;
> > >   break;
> > > + case MADV_WIPEONRELEASE:
> > > + /* MADV_WIPEONRELEASE is only supported on anonymous 
> > > memory. */
> > > + if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > > + vma->vm_flags & VM_SHARED) {
> > > + error = -EINVAL;
> > > + goto out;
> > > + }
> > > + new_flags |= VM_WIPEONRELEASE;
> > > + break;
> 
> An interesting effect of this is that it will be possible to set this
> on a CoW anon VMA in a fork() child, and then the semantics in the
> parent will be subtly different - e.g. if the parent vmsplice()d a
> CoWed page into a pipe, then forked an unprivileged child, the child

Maybe a stupid question. How do you fork an unprivileged child (without
exec)? The child would have to drop privileges on its own, no?

> set MADV_WIPEONRELEASE on its VMA, the parent died somehow, and then
> the child died, the page in the pipe would be zeroed out. A child
> should not be able to affect its parent like this, I think. If this
> was an mmap() flag instead of a madvise() command, that issue could be
> avoided.

With a VMA flag underneath, I think you can do an early CoW during fork
to prevent that.

> Alternatively, if adding more mmap() flags doesn't work,
> perhaps you could scan the VMA and ensure that it contains no pages
> yet, or something like that?
> 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index ab650c21bccd..ff78b527660e 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct 
> > > mmu_gather *tlb,
> > >   page_remove_rmap(page, false);
> > >   if (unlikely(page_mapcount(page) < 0))
> > >   print_bad_pte(vma, addr, ptent, page);
> > > + if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) &&
> > > + page_mapcount(page) == 0)
> > > + clear_highpage(page);
> > >   if (unlikely(__tlb_remove_page(tlb, page))) {
> > >   force_flush = 1;
> > >   addr += PAGE_SIZE;
> 
> Should something like this perhaps be added in page_remove_rmap()
> instead? That's where the mapcount is decremented; and looking at
> other callers of page_remove_rmap(), in particular the following ones
> look interesting:

Well spotted!

-- 
Michal Hocko
SUSE Labs
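
The userland measures quoted at the top of this thread (mlock(), MADV_DONTDUMP, and overwriting secrets before exit) can be sketched roughly as below. This is only an illustration of the existing mechanisms, not of the proposed MADV_WIPEONRELEASE flag; the helper names secret_alloc(), secret_wipe() and secret_free() are made up for the example, and both mlock() and madvise() are treated as best effort since either can fail (e.g. because of RLIMIT_MEMLOCK).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map one private anonymous region for secrets. */
static void *secret_alloc(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* Best effort: keep the pages out of swap and out of core dumps. */
	(void)mlock(p, len);
	(void)madvise(p, len, MADV_DONTDUMP);
	return p;
}

/* Overwrite through a volatile pointer so the stores are not optimized away. */
static void secret_wipe(void *p, size_t len)
{
	volatile unsigned char *v = p;
	size_t i;

	assert(p != NULL || len == 0);
	for (i = 0; i < len; i++)
		v[i] = 0;
}

/* Wipe first, then unmap; this is the step an unclean exit never reaches. */
static void secret_free(void *p, size_t len)
{
	secret_wipe(p, len);
	munmap(p, len);
}
```

The gap discussed in the thread is visible here: secret_free() only runs on a clean exit, so a process killed by the OOM killer leaves its pages unwiped.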

Re: [PATCH] nvme: determine the number of IO queues

2019-04-25 Thread Aaron Ma



On 4/25/19 10:39 PM, Christoph Hellwig wrote:
> Honestly, unless this is a device shipping in a mass market consumer
> product already I don't think we should work around this crap at all,
> given that this device has obviously never been tested at all.  It
> really needs a firmware fix instead of a host workaround.


I have already pushed this issue to the firmware engineering team.
They will try to fix it.
As far as I know, we don't need this host workaround.

Thanks,
Aaron

Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release

2019-04-25 Thread Michal Hocko

On Thu 25-04-19 13:39:01, Matthew Garrett wrote:
> On Thu, Apr 25, 2019 at 5:37 AM Michal Hocko  wrote:
> > Besides that you inherently assume that the user would do mlock because
> > you do not try to wipe the swap content. Is this intentional?
> 
> Yes, given MADV_DONTDUMP doesn't imply mlock I thought it'd be more
> consistent to keep those independent.

Do we want to fail madvise call on VMAs that are not mlocked then? What
if the munlock happens later after the madvise is called?

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 1/2] serial: 8250-mtk: add follow control

2019-04-25 Thread Long Cheng

On Thu, 2019-04-25 at 12:40 +0200, Matthias Brugger wrote:
> 
> On 25/04/2019 10:41, Long Cheng wrote:
> > Add SW and HW follow control function.
> 
> Can you please explain a bit more what you are doing in this patch.
> You change the setting of the registers for different baud rates. Please
> elaborate what is happening there.
> 
The clock source can be different. When the baud rate is greater than or
equal to 115200, we use highspeed mode 3 together with the fractional
divider to get a more accurate baud rate.

I will add this explanation to the commit message in the next version.

> > 
> > Signed-off-by: Long Cheng 
> > ---
> >  drivers/tty/serial/8250/8250_mtk.c |   60 
> > ++--
> >  1 file changed, 37 insertions(+), 23 deletions(-)
> > 
> > diff --git a/drivers/tty/serial/8250/8250_mtk.c 
> > b/drivers/tty/serial/8250/8250_mtk.c
> > index c1fdbc0..959fd85 100644
> > --- a/drivers/tty/serial/8250/8250_mtk.c
> > +++ b/drivers/tty/serial/8250/8250_mtk.c
> > @@ -21,12 +21,14 @@
> >  
> >  #include "8250.h"
> >  
> > -#define UART_MTK_HIGHS 0x09/* Highspeed register */
> > -#define UART_MTK_SAMPLE_COUNT  0x0a/* Sample count register */
> > -#define UART_MTK_SAMPLE_POINT  0x0b/* Sample point register */
> > +#define MTK_UART_HIGHS 0x09/* Highspeed register */
> > +#define MTK_UART_SAMPLE_COUNT  0x0a/* Sample count register */
> > +#define MTK_UART_SAMPLE_POINT  0x0b/* Sample point register */
> 
> Rename looks good to me. But I'd prefer to have it in a separate patch.
> 
OK.

> >  #define MTK_UART_RATE_FIX  0x0d/* UART Rate Fix Register */
> > -
> >  #define MTK_UART_DMA_EN0x13/* DMA Enable register */
> > +#define MTK_UART_RXTRI_AD  0x14/* RX Trigger address */
> > +#define MTK_UART_FRACDIV_L 0x15/* Fractional divider LSB address */
> > +#define MTK_UART_FRACDIV_M 0x16/* Fractional divider MSB address */
> >  #define MTK_UART_DMA_EN_TX 0x2
> >  #define MTK_UART_DMA_EN_RX 0x5
> >  
> > @@ -46,6 +48,7 @@ enum dma_rx_status {
> >  struct mtk8250_data {
> > int line;
> > unsigned intrx_pos;
> > +   unsigned intclk_count;
> 
> What is that for, not used in this patch.
> 
It's for other patch. Sorry, I will remove it.

> > struct clk  *uart_clk;
> > struct clk  *bus_clk;
> > struct uart_8250_dma*dma;
> > @@ -196,9 +199,15 @@ static void mtk8250_shutdown(struct uart_port *port)
> >  mtk8250_set_termios(struct uart_port *port, struct ktermios *termios,
> > struct ktermios *old)
> >  {
> > +   unsigned short fraction_L_mapping[] = {
> > +   0, 1, 0x5, 0x15, 0x55, 0x57, 0x57, 0x77, 0x7F, 0xFF, 0xFF
> > +   };
> > +   unsigned short fraction_M_mapping[] = {
> > +   0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 3
> > +   };
> > struct uart_8250_port *up = up_to_u8250p(port);
> > +   unsigned int baud, quot, fraction;
> > unsigned long flags;
> > -   unsigned int baud, quot;
> >  
> >  #ifdef CONFIG_SERIAL_8250_DMA
> > if (up->dma) {
> > @@ -214,7 +223,7 @@ static void mtk8250_shutdown(struct uart_port *port)
> > serial8250_do_set_termios(port, termios, old);
> >  
> > /*
> > -* Mediatek UARTs use an extra highspeed register (UART_MTK_HIGHS)
> > +* Mediatek UARTs use an extra highspeed register (MTK_UART_HIGHS)
> >  *
> >  * We need to recalcualte the quot register, as the claculation depends
> >  * on the vaule in the highspeed register.
> > @@ -230,18 +239,11 @@ static void mtk8250_shutdown(struct uart_port *port)
> >   port->uartclk / 16 / UART_DIV_MAX,
> >   port->uartclk);
> >  
> > -   if (baud <= 115200) {
> > -   serial_port_out(port, UART_MTK_HIGHS, 0x0);
> > +   if (baud < 115200) {
> > +   serial_port_out(port, MTK_UART_HIGHS, 0x0);
> > quot = uart_get_divisor(port, baud);
> > -   } else if (baud <= 576000) {
> > -   serial_port_out(port, UART_MTK_HIGHS, 0x2);
> > -
> > -   /* Set to next lower baudrate supported */
> > -   if ((baud == 50) || (baud == 576000))
> > -   baud = 460800;
> > -   quot = DIV_ROUND_UP(port->uartclk, 4 * baud);
> 
> So we allow now also these baud rates? Then you have to update the comment as 
> well.
> 
Yes.

When the clock source is different, the previous algorithm sometimes
produces data errors. So we switch to a new method to fix the issue.


> Regards,
> Matthias
> 
> > } else {
> > -   serial_port_out(port, UART_MTK_HIGHS, 0x3);
> > +   serial_port_out(port, MTK_UART_HIGHS, 0x3);
> > quot = DIV_ROUND_UP(port->uartclk, 256 * baud);
> > }
> >  
> > @@ -258,17 +260,29 @@ static void mtk8250_shutdown(struct uart_port *port)
> > /* reset DLAB */
> > serial_port_out(port, UART_LCR, up->lcr);
> >  
> > -   if (baud > 460800) {
> > +   if (baud >= 115200

Re: [PATCH] sparc: vdso: add FORCE to the build rule of %.so

2019-04-25 Thread David Miller

From: Masahiro Yamada 
Date: Fri, 26 Apr 2019 09:40:46 +0900

> Hi David,
> 
> 
> On Wed, Apr 3, 2019 at 5:34 PM Masahiro Yamada
>  wrote:
>>
>> $(call if_changed,...) must have FORCE as a prerequisite.
>>
>> Signed-off-by: Masahiro Yamada 
>> ---
> 
> Ping?

Sorry, I'm really busy and taking a short vacation before the LSF/MM
summit.

I will get to this when I have a chance.

Thank you.

[PATCH v4 26/27] userfaultfd: selftests: refactor statistics

2019-04-25 Thread Peter Xu

Introduce uffd_stats structure for statistics of the self test, at the
same time refactor the code to always pass in the uffd_stats for either
read() or poll() typed fault handling threads instead of using two
different ways to return the statistic results.  No functional change.

With the new structure, it's very easy to introduce new statistics.

Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 tools/testing/selftests/vm/userfaultfd.c | 76 +++-
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 5d1db824f73a..e5d12c209e09 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -88,6 +88,12 @@ static char *area_src, *area_src_alias, *area_dst, 
*area_dst_alias;
 static char *zeropage;
 pthread_attr_t attr;
 
+/* Userfaultfd test statistics */
+struct uffd_stats {
+   int cpu;
+   unsigned long missing_faults;
+};
+
 /* pthread_mutex_t starts at page offset 0 */
 #define area_mutex(___area, ___nr) \
((pthread_mutex_t *) ((___area) + (___nr)*page_size))
@@ -127,6 +133,17 @@ static void usage(void)
exit(1);
 }
 
+static void uffd_stats_reset(struct uffd_stats *uffd_stats,
+unsigned long n_cpus)
+{
+   int i;
+
+   for (i = 0; i < n_cpus; i++) {
+   uffd_stats[i].cpu = i;
+   uffd_stats[i].missing_faults = 0;
+   }
+}
+
 static int anon_release_pages(char *rel_area)
 {
int ret = 0;
@@ -469,8 +486,8 @@ static int uffd_read_msg(int ufd, struct uffd_msg *msg)
return 0;
 }
 
-/* Return 1 if page fault handled by us; otherwise 0 */
-static int uffd_handle_page_fault(struct uffd_msg *msg)
+static void uffd_handle_page_fault(struct uffd_msg *msg,
+  struct uffd_stats *stats)
 {
unsigned long offset;
 
@@ -485,18 +502,19 @@ static int uffd_handle_page_fault(struct uffd_msg *msg)
offset = (char *)(unsigned long)msg->arg.pagefault.address - area_dst;
offset &= ~(page_size-1);
 
-   return copy_page(uffd, offset);
+   if (copy_page(uffd, offset))
+   stats->missing_faults++;
 }
 
 static void *uffd_poll_thread(void *arg)
 {
-   unsigned long cpu = (unsigned long) arg;
+   struct uffd_stats *stats = (struct uffd_stats *)arg;
+   unsigned long cpu = stats->cpu;
struct pollfd pollfd[2];
struct uffd_msg msg;
struct uffdio_register uffd_reg;
int ret;
char tmp_chr;
-   unsigned long userfaults = 0;
 
pollfd[0].fd = uffd;
pollfd[0].events = POLLIN;
@@ -526,7 +544,7 @@ static void *uffd_poll_thread(void *arg)
msg.event), exit(1);
break;
case UFFD_EVENT_PAGEFAULT:
-   userfaults += uffd_handle_page_fault(&msg);
+   uffd_handle_page_fault(&msg, stats);
break;
case UFFD_EVENT_FORK:
close(uffd);
@@ -545,28 +563,27 @@ static void *uffd_poll_thread(void *arg)
break;
}
}
-   return (void *)userfaults;
+
+   return NULL;
 }
 
 pthread_mutex_t uffd_read_mutex = PTHREAD_MUTEX_INITIALIZER;
 
 static void *uffd_read_thread(void *arg)
 {
-   unsigned long *this_cpu_userfaults;
+   struct uffd_stats *stats = (struct uffd_stats *)arg;
struct uffd_msg msg;
 
-   this_cpu_userfaults = (unsigned long *) arg;
-   *this_cpu_userfaults = 0;
-
pthread_mutex_unlock(&uffd_read_mutex);
/* from here cancellation is ok */
 
for (;;) {
if (uffd_read_msg(uffd, &msg))
continue;
-   (*this_cpu_userfaults) += uffd_handle_page_fault(&msg);
+   uffd_handle_page_fault(&msg, stats);
}
-   return (void *)NULL;
+
+   return NULL;
 }
 
 static void *background_thread(void *arg)
@@ -582,13 +599,12 @@ static void *background_thread(void *arg)
return NULL;
 }
 
-static int stress(unsigned long *userfaults)
+static int stress(struct uffd_stats *uffd_stats)
 {
unsigned long cpu;
pthread_t locking_threads[nr_cpus];
pthread_t uffd_threads[nr_cpus];
pthread_t background_threads[nr_cpus];
-   void **_userfaults = (void **) userfaults;
 
finished = 0;
for (cpu = 0; cpu < nr_cpus; cpu++) {
@@ -597,12 +613,13 @@ static int stress(unsigned long *userfaults)
return 1;
if (bounces & BOUNCE_POLL) {
if (pthread_create(&uffd_threads[cpu], &attr,
-  uffd_poll_thread, (void *)cpu))
+  uffd_poll_thread,
+  (void *)&uffd_stats[cpu]))
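
The refactoring pattern described in the commit message, one statistics slot per thread that the fault handler updates in place, can be sketched on its own as follows. struct uffd_stats and uffd_stats_reset() follow the patch; handle_fault() is a simplified stand-in for uffd_handle_page_fault() (its handled argument plays the role of the copy_page() result), and uffd_stats_total() is a hypothetical helper added only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread statistics slot, as introduced by the patch. */
struct uffd_stats {
	int cpu;
	unsigned long missing_faults;
};

static void uffd_stats_reset(struct uffd_stats *stats, unsigned long n_cpus)
{
	unsigned long i;

	for (i = 0; i < n_cpus; i++) {
		stats[i].cpu = (int)i;
		stats[i].missing_faults = 0;
	}
}

/* Stand-in handler: counts into the caller's slot instead of returning 0/1. */
static void handle_fault(struct uffd_stats *stats, int handled)
{
	assert(stats != NULL);
	if (handled)
		stats->missing_faults++;
}

/* Hypothetical helper: sum one counter across all slots. */
static unsigned long long uffd_stats_total(const struct uffd_stats *stats,
					   unsigned long n_cpus)
{
	unsigned long long total = 0;
	unsigned long i;

	for (i = 0; i < n_cpus; i++)
		total += stats[i].missing_faults;
	return total;
}
```

Adding a new statistic then only means adding a field, resetting it, and incrementing it in the handler; the thread entry points and their return values stay unchanged.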

[PATCH v4 27/27] userfaultfd: selftests: add write-protect test

2019-04-25 Thread Peter Xu

This patch adds uffd tests for write protection.

Instead of introducing new tests for it, let's simply squash the uffd-wp
tests into the existing uffd-missing test cases.  The changes are:

(1) Bouncing tests

  We do the write-protection in two ways during the bouncing test:

  - By using UFFDIO_COPY_MODE_WP when resolving MISSING pages: then
    we'll make sure that, for each bounce process, every single page
    faults at least twice: once for MISSING, once for WP.

  - By directly calling UFFDIO_WRITEPROTECT on already-faulted memory:
    To further torture the explicit page protection procedures of
    uffd-wp, we split each bounce procedure into two halves (in the
    background thread): the first half will be MISSING+WP for each
    page as explained above.  After the first half, we write protect
    the faulted region in the background thread, so that at least the
    first half of the pages is write protected again; this exercises
    the new UFFDIO_WRITEPROTECT call.  Then we continue with the 2nd
    half, which will contain both MISSING and WP faulting tests for
    the 2nd half and WP-only faults from the 1st half.

(2) Event/Signal test

  Mostly the previous tests, but doing MISSING+WP for each page.  For
  the sigbus-mode test we'll need to provide a standalone path to handle
  the write protection faults.

For all tests, collect statistics for uffd-wp pages as well.

Signed-off-by: Peter Xu 
---
 tools/testing/selftests/vm/userfaultfd.c | 157 +++
 1 file changed, 133 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index e5d12c209e09..bf1e10db72f5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../kselftest.h"
 
@@ -78,6 +79,8 @@ static int test_type;
 #define ALARM_INTERVAL_SECS 10
 static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
+/* Whether to test uffd write-protection */
+static bool test_uffdio_wp = false;
 
 static bool map_shared;
 static int huge_fd;
@@ -92,6 +95,7 @@ pthread_attr_t attr;
 struct uffd_stats {
int cpu;
unsigned long missing_faults;
+   unsigned long wp_faults;
 };
 
 /* pthread_mutex_t starts at page offset 0 */
@@ -141,9 +145,29 @@ static void uffd_stats_reset(struct uffd_stats *uffd_stats,
for (i = 0; i < n_cpus; i++) {
uffd_stats[i].cpu = i;
uffd_stats[i].missing_faults = 0;
+   uffd_stats[i].wp_faults = 0;
}
 }
 
+static void uffd_stats_report(struct uffd_stats *stats, int n_cpus)
+{
+   int i;
+   unsigned long long miss_total = 0, wp_total = 0;
+
+   for (i = 0; i < n_cpus; i++) {
+   miss_total += stats[i].missing_faults;
+   wp_total += stats[i].wp_faults;
+   }
+
+   printf("userfaults: %llu missing (", miss_total);
+   for (i = 0; i < n_cpus; i++)
+   printf("%lu+", stats[i].missing_faults);
+   printf("\b), %llu wp (", wp_total);
+   for (i = 0; i < n_cpus; i++)
+   printf("%lu+", stats[i].wp_faults);
+   printf("\b)\n");
+}
+
 static int anon_release_pages(char *rel_area)
 {
int ret = 0;
@@ -264,10 +288,15 @@ struct uffd_test_ops {
void (*alias_mapping)(__u64 *start, size_t len, unsigned long offset);
 };
 
-#define ANON_EXPECTED_IOCTLS   ((1 << _UFFDIO_WAKE) | \
+#define SHMEM_EXPECTED_IOCTLS  ((1 << _UFFDIO_WAKE) | \
 (1 << _UFFDIO_COPY) | \
 (1 << _UFFDIO_ZEROPAGE))
 
+#define ANON_EXPECTED_IOCTLS   ((1 << _UFFDIO_WAKE) | \
+(1 << _UFFDIO_COPY) | \
+(1 << _UFFDIO_ZEROPAGE) | \
+(1 << _UFFDIO_WRITEPROTECT))
+
 static struct uffd_test_ops anon_uffd_test_ops = {
.expected_ioctls = ANON_EXPECTED_IOCTLS,
.allocate_area  = anon_allocate_area,
@@ -276,7 +305,7 @@ static struct uffd_test_ops anon_uffd_test_ops = {
 };
 
 static struct uffd_test_ops shmem_uffd_test_ops = {
-   .expected_ioctls = ANON_EXPECTED_IOCTLS,
+   .expected_ioctls = SHMEM_EXPECTED_IOCTLS,
.allocate_area  = shmem_allocate_area,
.release_pages  = shmem_release_pages,
.alias_mapping = noop_alias_mapping,
@@ -300,6 +329,21 @@ static int my_bcmp(char *str1, char *str2, size_t n)
return 0;
 }
 
+static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
+{
+   struct uffdio_writeprotect prms = { 0 };
+
+   /* Write protection page faults */
+   prms.range.start = start;
+   prms.range.len = len;
+   /* Undo write-protect, do wakeup after that */
+   prms.mode = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
+
+   if (ioctl(ufd, UFFDIO_WRITEP

[PATCH v4 25/27] userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally

2019-04-25 Thread Peter Xu

Only declare _UFFDIO_WRITEPROTECT if the user specified
UFFDIO_REGISTER_MODE_WP and if all the checks passed.  Then when the
user registers regions with shmem/hugetlbfs we won't expose the new
ioctl to them.  Even with a completely anonymous memory range, we'll only
expose the new WP ioctl bit if the register mode has MODE_WP.

Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f1f61a0278c2..7f87e9e4fb9b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1456,14 +1456,24 @@ static int userfaultfd_register(struct userfaultfd_ctx 
*ctx,
up_write(&mm->mmap_sem);
mmput(mm);
if (!ret) {
+   __u64 ioctls_out;
+
+   ioctls_out = basic_ioctls ? UFFD_API_RANGE_IOCTLS_BASIC :
+   UFFD_API_RANGE_IOCTLS;
+
+   /*
+* Declare the WP ioctl only if the WP mode is
+* specified and all checks passed with the range
+*/
+   if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP))
+   ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT);
+
/*
 * Now that we scanned all vmas we can already tell
 * userland which ioctls methods are guaranteed to
 * succeed on this range.
 */
-   if (put_user(basic_ioctls ? UFFD_API_RANGE_IOCTLS_BASIC :
-UFFD_API_RANGE_IOCTLS,
-&user_uffdio_register->ioctls))
+   if (put_user(ioctls_out, &user_uffdio_register->ioctls))
ret = -EFAULT;
}
 out:
-- 
2.17.1
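
The masking logic of this patch can be tried out in isolation with the sketch below. All X_ constants are illustrative stand-ins defined only for this example (real code should take the values from <linux/userfaultfd.h>), and ioctls_for_range() is a hypothetical name for the computation that userfaultfd_register() performs before put_user().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; not taken from the uapi header. */
#define X_UFFDIO_WAKE		0x02
#define X_UFFDIO_COPY		0x03
#define X_UFFDIO_ZEROPAGE	0x04
#define X_UFFDIO_WRITEPROTECT	0x06

#define X_REGISTER_MODE_MISSING	(UINT64_C(1) << 0)
#define X_REGISTER_MODE_WP	(UINT64_C(1) << 1)

#define X_RANGE_IOCTLS_BASIC			\
	(UINT64_C(1) << X_UFFDIO_WAKE |		\
	 UINT64_C(1) << X_UFFDIO_COPY)
#define X_RANGE_IOCTLS				\
	(X_RANGE_IOCTLS_BASIC |			\
	 UINT64_C(1) << X_UFFDIO_ZEROPAGE |	\
	 UINT64_C(1) << X_UFFDIO_WRITEPROTECT)

/* The basic set never advertises write protection in the first place. */
static_assert((X_RANGE_IOCTLS_BASIC &
	       (UINT64_C(1) << X_UFFDIO_WRITEPROTECT)) == 0,
	      "basic ioctl set must not contain the WP bit");

/* Hypothetical name: the ioctl bitmap reported back to userland. */
static uint64_t ioctls_for_range(int basic_ioctls, uint64_t mode)
{
	uint64_t out = basic_ioctls ? X_RANGE_IOCTLS_BASIC : X_RANGE_IOCTLS;

	/* Only declare the WP ioctl if WP mode was actually requested. */
	if (!(mode & X_REGISTER_MODE_WP))
		out &= ~(UINT64_C(1) << X_UFFDIO_WRITEPROTECT);
	return out;
}
```

Clearing the bit after choosing the base set keeps the two existing cases untouched and adds only a single branch for the new mode.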

[PATCH v4 22/27] userfaultfd: wp: enabled write protection in userfaultfd API

2019-04-25 Thread Peter Xu

From: Shaohua Li 

Now it's safe to enable write protection in the userfaultfd API.

Cc: Andrea Arcangeli 
Cc: Pavel Emelyanov 
Cc: Rik van Riel 
Cc: Kirill A. Shutemov 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Signed-off-by: Shaohua Li 
Signed-off-by: Andrea Arcangeli 
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 include/uapi/linux/userfaultfd.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 95c4a160e5f8..e7e98bde221f 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -19,7 +19,8 @@
  * means the userland is reading).
  */
 #define UFFD_API ((__u64)0xAA)
-#define UFFD_API_FEATURES (UFFD_FEATURE_EVENT_FORK |   \
+#define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP |\
+  UFFD_FEATURE_EVENT_FORK |\
   UFFD_FEATURE_EVENT_REMAP |   \
   UFFD_FEATURE_EVENT_REMOVE |  \
   UFFD_FEATURE_EVENT_UNMAP |   \
@@ -34,7 +35,8 @@
 #define UFFD_API_RANGE_IOCTLS  \
((__u64)1 << _UFFDIO_WAKE | \
 (__u64)1 << _UFFDIO_COPY | \
-(__u64)1 << _UFFDIO_ZEROPAGE)
+(__u64)1 << _UFFDIO_ZEROPAGE | \
+(__u64)1 << _UFFDIO_WRITEPROTECT)
 #define UFFD_API_RANGE_IOCTLS_BASIC\
((__u64)1 << _UFFDIO_WAKE | \
 (__u64)1 << _UFFDIO_COPY)
-- 
2.17.1

[PATCH v4 19/27] userfaultfd: introduce helper vma_find_uffd

2019-04-25 Thread Peter Xu

We have multiple (and more coming) places that would like to find a
userfault enabled VMA from a mm struct that covers a specific memory
range.  This patch introduces a helper for that and applies it to the
existing code.

Suggested-by: Mike Rapoport 
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 mm/userfaultfd.c | 54 +++-
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 240de2a8492d..2606409572b2 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -20,6 +20,34 @@
 #include 
 #include "internal.h"
 
+/*
+ * Find a valid userfault enabled VMA region that covers the whole
+ * address range, or NULL on failure.  Must be called with mmap_sem
+ * held.
+ */
+static struct vm_area_struct *vma_find_uffd(struct mm_struct *mm,
+   unsigned long start,
+   unsigned long len)
+{
+   struct vm_area_struct *vma = find_vma(mm, start);
+
+   if (!vma)
+   return NULL;
+
+   /*
+* Check the vma is registered in uffd, this is required to
+* enforce the VM_MAYWRITE check done at uffd registration
+* time.
+*/
+   if (!vma->vm_userfaultfd_ctx.ctx)
+   return NULL;
+
+   if (start < vma->vm_start || start + len > vma->vm_end)
+   return NULL;
+
+   return vma;
+}
+
 static int mcopy_atomic_pte(struct mm_struct *dst_mm,
pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
@@ -228,20 +256,9 @@ static __always_inline ssize_t 
__mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 */
if (!dst_vma) {
err = -ENOENT;
-   dst_vma = find_vma(dst_mm, dst_start);
+   dst_vma = vma_find_uffd(dst_mm, dst_start, len);
if (!dst_vma || !is_vm_hugetlb_page(dst_vma))
goto out_unlock;
-   /*
-* Check the vma is registered in uffd, this is
-* required to enforce the VM_MAYWRITE check done at
-* uffd registration time.
-*/
-   if (!dst_vma->vm_userfaultfd_ctx.ctx)
-   goto out_unlock;
-
-   if (dst_start < dst_vma->vm_start ||
-   dst_start + len > dst_vma->vm_end)
-   goto out_unlock;
 
err = -EINVAL;
if (vma_hpagesize != vma_kernel_pagesize(dst_vma))
@@ -488,20 +505,9 @@ static __always_inline ssize_t __mcopy_atomic(struct 
mm_struct *dst_mm,
 * both valid and fully within a single existing vma.
 */
err = -ENOENT;
-   dst_vma = find_vma(dst_mm, dst_start);
+   dst_vma = vma_find_uffd(dst_mm, dst_start, len);
if (!dst_vma)
goto out_unlock;
-   /*
-* Check the vma is registered in uffd, this is required to
-* enforce the VM_MAYWRITE check done at uffd registration
-* time.
-*/
-   if (!dst_vma->vm_userfaultfd_ctx.ctx)
-   goto out_unlock;
-
-   if (dst_start < dst_vma->vm_start ||
-   dst_start + len > dst_vma->vm_end)
-   goto out_unlock;
 
err = -EINVAL;
/*
-- 
2.17.1
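
The three checks that vma_find_uffd() bundles together (a mapping exists at or above start, it is registered, and it covers the whole range) can be modelled in userspace as below. This is a simplified sketch: struct region, find_region() and find_registered() are hypothetical, a sorted array stands in for the mm's VMA list, and the registered flag stands in for a non-NULL vm_userfaultfd_ctx.ctx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA. */
struct region {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool registered;	/* models vm_userfaultfd_ctx.ctx != NULL */
};

/*
 * Models find_vma(): the first region whose end lies above addr.
 * The array must be sorted and non-overlapping.  Note that the
 * result may start above addr, just like with find_vma().
 */
static const struct region *find_region(const struct region *r, size_t n,
					unsigned long addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (r[i].end > addr)
			return &r[i];
	return NULL;
}

/* Models vma_find_uffd(): a registered region covering [start, start+len). */
static const struct region *find_registered(const struct region *r, size_t n,
					    unsigned long start,
					    unsigned long len)
{
	const struct region *hit;

	/* The caller is expected to have validated the range already. */
	assert(start + len >= start);

	hit = find_region(r, n, start);
	if (!hit)
		return NULL;
	if (!hit->registered)
		return NULL;
	if (start < hit->start || start + len > hit->end)
		return NULL;
	return hit;
}
```

The explicit start < hit->start test matters because the lookup can return a region that begins above the requested address.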

[PATCH v4 13/27] mm: introduce do_wp_page_cont()

2019-04-25 Thread Peter Xu

The userfaultfd handling in do_wp_page() is very special compared to
the rest of the function because it only postpones the real handling
of the page fault to the userspace program.  Isolate the handling part
of do_wp_page() into a new function called do_wp_page_cont() so that
we can use it somewhere else when resolving the userfault page fault.

Signed-off-by: Peter Xu 
---
 include/linux/mm.h | 2 ++
 mm/memory.c| 8 
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a5ac81188523..a2911de04cdd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -445,6 +445,8 @@ struct vm_fault {
 */
 };
 
+vm_fault_t do_wp_page_cont(struct vm_fault *vmf);
+
 /* page entry size for vm->huge_fault() */
 enum page_entry_size {
PE_SIZE_PTE = 0,
diff --git a/mm/memory.c b/mm/memory.c
index 64bd8075f054..ab98a1eb4702 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2497,6 +2497,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
return handle_userfault(vmf, VM_UFFD_WP);
}
 
+   return do_wp_page_cont(vmf);
+}
+
+vm_fault_t do_wp_page_cont(struct vm_fault *vmf)
+   __releases(vmf->ptl)
+{
+   struct vm_area_struct *vma = vmf->vma;
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*
-- 
2.17.1

[PATCH v4 24/27] userfaultfd: wp: UFFDIO_REGISTER_MODE_WP documentation update

2019-04-25 Thread Peter Xu

From: Martin Cracauer 

Adds documentation about the write protection support.

Signed-off-by: Martin Cracauer 
Signed-off-by: Andrea Arcangeli 
[peterx: rewrite in rst format; fixups here and there]
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 Documentation/admin-guide/mm/userfaultfd.rst | 51 
 1 file changed, 51 insertions(+)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst 
b/Documentation/admin-guide/mm/userfaultfd.rst
index 5048cf661a8a..c30176e67900 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -108,6 +108,57 @@ UFFDIO_COPY. They're atomic as in guaranteeing that 
nothing can see an
 half copied page since it'll keep userfaulting until the copy has
 finished.
 
+Notes:
+
+- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then
+  you must provide some kind of page in your thread after reading from
+  the uffd.  You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE.
+  The normal behavior of the OS automatically providing a zero page on
+  an annonymous mmaping is not in place.
+
+- None of the page-delivering ioctls default to the range that you
+  registered with.  You must fill in all fields for the appropriate
+  ioctl struct including the range.
+
+- You get the address of the access that triggered the missing page
+  event out of a struct uffd_msg that you read in the thread from the
+  uffd.  You can supply as many pages as you want with UFFDIO_COPY or
+  UFFDIO_ZEROPAGE.  Keep in mind that unless you used DONTWAKE then
+  the first of any of those IOCTLs wakes up the faulting thread.
+
+- Be sure to test for all errors including (pollfd[0].revents &
+  POLLERR).  This can happen, e.g. when ranges supplied were
+  incorrect.
+
+Write Protect Notifications
+---
+
+This is equivalent to (but faster than) using mprotect and a SIGSEGV
+signal handler.
+
+Firstly you need to register a range with UFFDIO_REGISTER_MODE_WP.
+Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT,
+struct *uffdio_writeprotect) while mode = UFFDIO_WRITEPROTECT_MODE_WP
+in the struct passed in.  The range does not default to and does not
+have to be identical to the range you registered with.  You can write
+protect as many ranges as you like (inside the registered range).
+Then, in the thread reading from uffd the struct will have
+msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send
+ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect) again
+while pagefault.mode does not have UFFDIO_WRITEPROTECT_MODE_WP set.
+This wakes up the thread which will continue to run with writes. This
+allows you to do the bookkeeping about the write in the uffd reading
+thread before the ioctl.
+
+If you registered with both UFFDIO_REGISTER_MODE_MISSING and
+UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in
+which you supply a page and undo write protect.  Note that there is a
+difference between writes into a WP area and into a !WP area.  The
+former will have UFFD_PAGEFAULT_FLAG_WP set, the latter
+UFFD_PAGEFAULT_FLAG_WRITE.  The latter did not fail on protection but
+you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was
+used.
+
 QEMU/KVM
 
 
-- 
2.17.1
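
For a range registered with both MISSING and WP modes, the decision described in the documentation text above comes down to one flag test per fault message. The sketch below shows only that decision. The X_ flag macros are stand-ins defined for the example (real code should use UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP from <linux/userfaultfd.h>), and enum fault_action and classify_fault() are hypothetical names.

```c
#include <stdint.h>

/* Stand-ins for the pagefault flags; values are for this sketch only. */
#define X_PAGEFAULT_FLAG_WRITE	(UINT64_C(1) << 0)
#define X_PAGEFAULT_FLAG_WP	(UINT64_C(1) << 1)

/* What the uffd reading thread has to do to resolve a fault. */
enum fault_action {
	RESOLVE_SUPPLY_PAGE,	/* UFFDIO_COPY or UFFDIO_ZEROPAGE */
	RESOLVE_UNPROTECT,	/* UFFDIO_WRITEPROTECT without MODE_WP */
};

/* flags corresponds to msg.arg.pagefault.flags of the message read. */
static enum fault_action classify_fault(uint64_t flags)
{
	/* A write hit a write-protected page: do bookkeeping, then unprotect. */
	if (flags & X_PAGEFAULT_FLAG_WP)
		return RESOLVE_UNPROTECT;

	/* Missing page, read or write: a page still has to be supplied. */
	return RESOLVE_SUPPLY_PAGE;
}
```

A write into a !WP area arrives with only the WRITE flag set, so it falls through to the supply-a-page case, matching the last paragraph of the documentation.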

[PATCH v4 21/27] userfaultfd: wp: add the writeprotect API to userfaultfd ioctl

2019-04-25 Thread Peter Xu

From: Andrea Arcangeli 

v1: From: Shaohua Li 

v2: cleanups, remove a branch.

[peterx writes up the commit message, as below...]

This patch introduces the new uffd-wp APIs for userspace.

Firstly, we allow UFFDIO_REGISTER with write protection tracking
using the new UFFDIO_REGISTER_MODE_WP flag.  Note that this flag can
co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which
case the userspace program can not only resolve missing page faults
but also track page data changes along the way.

Secondly, we introduce the new UFFDIO_WRITEPROTECT API to do page
level write protection tracking.  Note that the memory region must be
registered with UFFDIO_REGISTER_MODE_WP before that.

Signed-off-by: Andrea Arcangeli 
[peterx: remove useless block, write commit message, check against
 VM_MAYWRITE rather than VM_WRITE when register]
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c | 82 +---
 include/uapi/linux/userfaultfd.h | 23 +
 2 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3092885c9d2c..81962d62520c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -304,8 +304,11 @@ static inline bool userfaultfd_must_wait(struct 
userfaultfd_ctx *ctx,
if (!pmd_present(_pmd))
goto out;
 
-   if (pmd_trans_huge(_pmd))
+   if (pmd_trans_huge(_pmd)) {
+   if (!pmd_write(_pmd) && (reason & VM_UFFD_WP))
+   ret = true;
goto out;
+   }
 
/*
 * the pmd is stable (as in !pmd_trans_unstable) so we can re-read it
@@ -318,6 +321,8 @@ static inline bool userfaultfd_must_wait(struct 
userfaultfd_ctx *ctx,
 */
if (pte_none(*pte))
ret = true;
+   if (!pte_write(*pte) && (reason & VM_UFFD_WP))
+   ret = true;
pte_unmap(pte);
 
 out:
@@ -1251,10 +1256,13 @@ static __always_inline int validate_range(struct mm_struct *mm,
return 0;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma)
+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+unsigned long vm_flags)
 {
-   return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-   vma_is_shmem(vma);
+   /* FIXME: add WP support to hugetlbfs and shmem */
+   return vma_is_anonymous(vma) ||
+   ((is_vm_hugetlb_page(vma) || vma_is_shmem(vma)) &&
+!(vm_flags & VM_UFFD_WP));
 }
 
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
@@ -1286,15 +1294,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vm_flags = 0;
if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
vm_flags |= VM_UFFD_MISSING;
-   if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) {
+   if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP)
vm_flags |= VM_UFFD_WP;
-   /*
-* FIXME: remove the below error constraint by
-* implementing the wprotect tracking mode.
-*/
-   ret = -EINVAL;
-   goto out;
-   }
 
ret = validate_range(mm, uffdio_register.range.start,
 uffdio_register.range.len);
@@ -1342,7 +1343,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 
/* check not compatible vmas */
ret = -EINVAL;
-   if (!vma_can_userfault(cur))
+   if (!vma_can_userfault(cur, vm_flags))
goto out_unlock;
 
/*
@@ -1370,6 +1371,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
if (end & (vma_hpagesize - 1))
goto out_unlock;
}
+   if ((vm_flags & VM_UFFD_WP) && !(cur->vm_flags & VM_MAYWRITE))
+   goto out_unlock;
 
/*
 * Check that this vma isn't already owned by a
@@ -1399,7 +1402,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
do {
cond_resched();
 
-   BUG_ON(!vma_can_userfault(vma));
+   BUG_ON(!vma_can_userfault(vma, vm_flags));
BUG_ON(vma->vm_userfaultfd_ctx.ctx &&
   vma->vm_userfaultfd_ctx.ctx != ctx);
WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
@@ -1534,7 +1537,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 * provides for more strict behavior to notice
 * unregistration errors.
 */
-   if (!vma_can_userfault(cur))
+   if (!vma_can_userfault(cur, cur->vm_flags))
goto out_unlock;
 
found = true;
@@ -1548,7 +1551,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx

[PATCH v4 23/27] userfaultfd: wp: don't wake up when doing write protect

2019-04-25 Thread Peter Xu

It does not make sense to try to wake up any waiting thread when we're
write-protecting a memory region.  Only wake up when resolving a write
protected page fault.
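
The resulting wake-up rule can be modeled in plain C as below.  This is
only a sketch of the decision made in the diff that follows: the
WAKE_SKETCH_* constants are illustrative stand-ins for the
UFFDIO_WRITEPROTECT mode bits, and sketch_wp_should_wake() is a
hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the UFFDIO_WRITEPROTECT mode bits. */
#define WAKE_SKETCH_MODE_WP		(UINT64_C(1) << 0)
#define WAKE_SKETCH_MODE_DONTWAKE	(UINT64_C(1) << 1)

/*
 * Validate the mode word and report whether waiters should be woken.
 * Returns 0 on success or -EINVAL for an unsupported combination.
 */
int sketch_wp_should_wake(uint64_t mode, bool *wake)
{
	bool mode_wp, mode_dontwake;

	if (mode & ~(WAKE_SKETCH_MODE_WP | WAKE_SKETCH_MODE_DONTWAKE))
		return -EINVAL;

	mode_wp = mode & WAKE_SKETCH_MODE_WP;
	mode_dontwake = mode & WAKE_SKETCH_MODE_DONTWAKE;

	/* Protecting never wakes anyone, so DONTWAKE makes no sense here. */
	if (mode_wp && mode_dontwake)
		return -EINVAL;

	/* Only resolving a fault may wake, and only if not told otherwise. */
	*wake = !mode_wp && !mode_dontwake;
	return 0;
}
```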

Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 81962d62520c..f1f61a0278c2 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1771,6 +1771,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
struct uffdio_writeprotect uffdio_wp;
struct uffdio_writeprotect __user *user_uffdio_wp;
struct userfaultfd_wake_range range;
+   bool mode_wp, mode_dontwake;
 
if (READ_ONCE(ctx->mmap_changing))
return -EAGAIN;
@@ -1789,18 +1790,20 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
   UFFDIO_WRITEPROTECT_MODE_WP))
return -EINVAL;
-   if ((uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP) &&
-(uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE))
+
+   mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
+   mode_dontwake = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE;
+
+   if (mode_wp && mode_dontwake)
return -EINVAL;
 
ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
- uffdio_wp.range.len, uffdio_wp.mode &
- UFFDIO_WRITEPROTECT_MODE_WP,
+ uffdio_wp.range.len, mode_wp,
  &ctx->mmap_changing);
if (ret)
return ret;
 
-   if (!(uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE)) {
+   if (!mode_wp && !mode_dontwake) {
range.start = uffdio_wp.range.start;
range.len = uffdio_wp.range.len;
wake_userfault(ctx, &range);
-- 
2.17.1

[PATCH v4 15/27] userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork

2019-04-25 Thread Peter Xu

UFFD_EVENT_FORK support for uffd-wp should already be there, except
that we should clear the uffd-wp bit if the uffd fork event is not
enabled.  Detect that to avoid _PAGE_UFFD_WP being set even if the VMA
is not being tracked by VM_UFFD_WP.  Do this for both small PTEs and
huge PMDs.
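
A toy model of that rule, treating a page table entry as a plain flags
word.  The bit positions and the name sketch_fork_copy_pte() are
assumptions made for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define FORK_SKETCH_PTE_UFFD_WP	(UINT64_C(1) << 10)
#define FORK_SKETCH_VM_UFFD_WP	(UINT64_C(1) << 12)

/* Return the entry the child should inherit from the parent at fork. */
uint64_t sketch_fork_copy_pte(uint64_t parent_pte, uint64_t child_vm_flags)
{
	/* The child VMA is not tracked: drop the stale uffd-wp marker. */
	if (!(child_vm_flags & FORK_SKETCH_VM_UFFD_WP))
		parent_pte &= ~FORK_SKETCH_PTE_UFFD_WP;
	return parent_pte;
}
```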

Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 mm/huge_memory.c | 8 
 mm/memory.c  | 8 
 2 files changed, 16 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3885747d4901..cf8f11d6e6cd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -976,6 +976,14 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
ret = -EAGAIN;
pmd = *src_pmd;
 
+   /*
+* Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
+* does not have the VM_UFFD_WP, which means that the uffd
+* fork event is not enabled.
+*/
+   if (!(vma->vm_flags & VM_UFFD_WP))
+   pmd = pmd_clear_uffd_wp(pmd);
+
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
if (unlikely(is_swap_pmd(pmd))) {
swp_entry_t entry = pmd_to_swp_entry(pmd);
diff --git a/mm/memory.c b/mm/memory.c
index 965d974bb9bd..2abf0934ad7f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -789,6 +789,14 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pte = pte_mkclean(pte);
pte = pte_mkold(pte);
 
+   /*
+* Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
+* does not have the VM_UFFD_WP, which means that the uffd
+* fork event is not enabled.
+*/
+   if (!(vm_flags & VM_UFFD_WP))
+   pte = pte_clear_uffd_wp(pte);
+
page = vm_normal_page(vma, addr, pte);
if (page) {
get_page(page);
-- 
2.17.1

[PATCH v4 20/27] userfaultfd: wp: support write protection for userfault vma range

2019-04-25 Thread Peter Xu

From: Shaohua Li 

Add an API to enable/disable write protection on a vma range.  Unlike
mprotect, this doesn't split/merge vmas.
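
The parameter checks at the top of mwriteprotect_range() in the diff
below can be mirrored in standalone C as follows.  This is a sketch
only: sketch_wp_range_ok() is a hypothetical name, and it returns false
where the kernel code uses BUG_ON().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check that [start, start + len) is page aligned, non-empty and does
 * not wrap around.  page_size is assumed to be a power of two.
 */
bool sketch_wp_range_ok(uint64_t start, uint64_t len, uint64_t page_size)
{
	uint64_t page_mask = ~(page_size - 1);

	if (start & ~page_mask)
		return false;
	if (len & ~page_mask)
		return false;
	/* Does the address range wrap, or is the span zero-sized? */
	if (start + len <= start)
		return false;
	return true;
}
```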

Cc: Andrea Arcangeli 
Cc: Rik van Riel 
Cc: Kirill A. Shutemov 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Signed-off-by: Shaohua Li 
Signed-off-by: Andrea Arcangeli 
[peterx:
 - use the helper to find VMA;
 - return -ENOENT if not found to match mcopy case;
 - use the new MM_CP_UFFD_WP* flags for change_protection
 - check against mmap_changing for failures]
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 include/linux/userfaultfd_k.h |  3 ++
 mm/userfaultfd.c  | 54 +++
 2 files changed, 57 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 765ce884cec0..8f6e6ed544fb 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
  unsigned long dst_start,
  unsigned long len,
  bool *mmap_changing);
+extern int mwriteprotect_range(struct mm_struct *dst_mm,
+  unsigned long start, unsigned long len,
+  bool enable_wp, bool *mmap_changing);
 
 /* mm helpers */
 static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 2606409572b2..70cea2ff3960 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 {
return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
 }
+
+int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
+   unsigned long len, bool enable_wp, bool *mmap_changing)
+{
+   struct vm_area_struct *dst_vma;
+   pgprot_t newprot;
+   int err;
+
+   /*
+* Sanitize the command parameters:
+*/
+   BUG_ON(start & ~PAGE_MASK);
+   BUG_ON(len & ~PAGE_MASK);
+
+   /* Does the address range wrap, or is the span zero-sized? */
+   BUG_ON(start + len <= start);
+
+   down_read(&dst_mm->mmap_sem);
+
+   /*
+* If memory mappings are changing because of non-cooperative
+* operation (e.g. mremap) running in parallel, bail out and
+* request the user to retry later
+*/
+   err = -EAGAIN;
+   if (mmap_changing && READ_ONCE(*mmap_changing))
+   goto out_unlock;
+
+   err = -ENOENT;
+   dst_vma = vma_find_uffd(dst_mm, start, len);
+   /*
+* Make sure the vma is not shared, that the dst range is
+* both valid and fully within a single existing vma.
+*/
+   if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+   goto out_unlock;
+   if (!userfaultfd_wp(dst_vma))
+   goto out_unlock;
+   if (!vma_is_anonymous(dst_vma))
+   goto out_unlock;
+
+   if (enable_wp)
+   newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
+   else
+   newprot = vm_get_page_prot(dst_vma->vm_flags);
+
+   change_protection(dst_vma, start, start + len, newprot,
+ enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
+
+   err = 0;
+out_unlock:
+   up_read(&dst_mm->mmap_sem);
+   return err;
+}
-- 
2.17.1

[PATCH v4 18/27] khugepaged: skip collapse if uffd-wp detected

2019-04-25 Thread Peter Xu

Don't collapse the huge PMD if any of the small PTEs is userfault
write protected.  The problem is that the write protection is in small
page granularity and there's no way to keep all this write protection
information if the small pages are going to be merged into a huge PMD.

The same thing needs to be considered for swap entries and migration
entries.  So do the check for them as well, regardless of
khugepaged_max_ptes_swap.
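
Reduced to its core, the new scan rule looks like the sketch below.  It
deliberately ignores every other khugepaged check; the flag value and
the name sketch_can_collapse() are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker bit, set on present or swap entries alike. */
#define KHP_SKETCH_UFFD_WP	(UINT64_C(1) << 1)

/* True if no entry in the run carries a uffd-wp marker. */
bool sketch_can_collapse(const uint64_t *ptes, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		/* One armed small entry is enough to refuse the collapse. */
		if (ptes[i] & KHP_SKETCH_UFFD_WP)
			return false;
	}
	return true;
}
```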

Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c| 23 +++
 2 files changed, 24 insertions(+)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index dd4db334bd63..2d7bad9cb976 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -13,6 +13,7 @@
EM( SCAN_PMD_NULL,  "pmd_null") \
EM( SCAN_EXCEED_NONE_PTE,   "exceed_none_pte")  \
EM( SCAN_PTE_NON_PRESENT,   "pte_non_present")  \
+   EM( SCAN_PTE_UFFD_WP,   "pte_uffd_wp")  \
EM( SCAN_PAGE_RO,   "no_writable_page") \
EM( SCAN_LACK_REFERENCED_PAGE,  "lack_referenced_page") \
EM( SCAN_PAGE_NULL, "page_null")\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 449044378782..6aa9935317d4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -29,6 +29,7 @@ enum scan_result {
SCAN_PMD_NULL,
SCAN_EXCEED_NONE_PTE,
SCAN_PTE_NON_PRESENT,
+   SCAN_PTE_UFFD_WP,
SCAN_PAGE_RO,
SCAN_LACK_REFERENCED_PAGE,
SCAN_PAGE_NULL,
@@ -1124,6 +1125,15 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
pte_t pteval = *_pte;
if (is_swap_pte(pteval)) {
if (++unmapped <= khugepaged_max_ptes_swap) {
+   /*
+* Always be strict with uffd-wp
+* enabled swap entries.  Please see
+* comment below for pte_uffd_wp().
+*/
+   if (pte_swp_uffd_wp(pteval)) {
+   result = SCAN_PTE_UFFD_WP;
+   goto out_unmap;
+   }
continue;
} else {
result = SCAN_EXCEED_SWAP_PTE;
@@ -1143,6 +1153,19 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
result = SCAN_PTE_NON_PRESENT;
goto out_unmap;
}
+   if (pte_uffd_wp(pteval)) {
+   /*
+* Don't collapse the page if any of the small
+* PTEs are armed with uffd write protection.
+* Here we can also mark the new huge pmd as
+* write protected if any of the small ones is
+* marked but that could bring unknown
+* userfault messages that fall outside of
+* the registered range.  So, just be simple.
+*/
+   result = SCAN_PTE_UFFD_WP;
+   goto out_unmap;
+   }
if (pte_write(pteval))
writable = true;
 
-- 
2.17.1

[PATCH v4 17/27] userfaultfd: wp: support swap and page migration

2019-04-25 Thread Peter Xu

For both swap and page migration, we use bit 2 of the entry to
identify whether this entry is uffd write-protected.  It plays a similar
role to the existing soft dirty bit in swap entries, but only for keeping
the uffd-wp tracking for a specific PTE/PMD.

Something special here is that when we recover the uffd-wp bit from a
swap/migration entry into the PTE, we also need to take care of the
_PAGE_RW bit and make sure it's cleared; otherwise, even with the
_PAGE_UFFD_WP bit set, we can't trap the write at all.

In change_pte_range() we do nothing for uffd if the PTE is a swap
entry.  That can lead to data mismatch if the page that we are going
to write protect is swapped out when sending the UFFDIO_WRITEPROTECT.
This patch also applies/removes the uffd-wp bit even for the swap
entries.
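
The swap-in side of this can be modeled as below.  This is a sketch
with entries treated as plain flag words: the swap marker uses bit 2 as
described above, while the PTE bit positions and the name
sketch_swapin_pte() are assumptions made for illustration.

```c
#include <stdint.h>

#define SWAPIN_SWP_UFFD_WP	(UINT64_C(1) << 2)	/* marker in the swap entry */
#define SWAPIN_PTE_RW		(UINT64_C(1) << 1)	/* assumed write bit */
#define SWAPIN_PTE_UFFD_WP	(UINT64_C(1) << 10)	/* assumed uffd-wp bit */

/* Build the present PTE at swap-in time from the old swap entry. */
uint64_t sketch_swapin_pte(uint64_t new_pte, uint64_t old_swap_entry)
{
	if (old_swap_entry & SWAPIN_SWP_UFFD_WP) {
		/* Carry the marker over and make sure writes still trap. */
		new_pte |= SWAPIN_PTE_UFFD_WP;
		new_pte &= ~SWAPIN_PTE_RW;
	}
	return new_pte;
}
```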

Signed-off-by: Peter Xu 
---
 include/linux/swapops.h |  2 ++
 mm/huge_memory.c|  3 +++
 mm/memory.c |  8 
 mm/migrate.c|  6 ++
 mm/mprotect.c   | 28 +---
 mm/rmap.c   |  6 ++
 6 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 4d961668e5fc..0c2923b1cdb7 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -68,6 +68,8 @@ static inline swp_entry_t pte_to_swp_entry(pte_t pte)
 
if (pte_swp_soft_dirty(pte))
pte = pte_swp_clear_soft_dirty(pte);
+   if (pte_swp_uffd_wp(pte))
+   pte = pte_swp_clear_uffd_wp(pte);
arch_entry = __pte_to_swp_entry(pte);
return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cf8f11d6e6cd..998a7e5d625e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2212,6 +2212,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
write = is_write_migration_entry(entry);
young = false;
soft_dirty = pmd_swp_soft_dirty(old_pmd);
+   uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
page = pmd_page(old_pmd);
if (pmd_dirty(old_pmd))
@@ -2244,6 +2245,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = swp_entry_to_pte(swp_entry);
if (soft_dirty)
entry = pte_swp_mksoft_dirty(entry);
+   if (uffd_wp)
+   entry = pte_swp_mkuffd_wp(entry);
} else {
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
entry = maybe_mkwrite(entry, vma);
diff --git a/mm/memory.c b/mm/memory.c
index 2abf0934ad7f..f53f54592ddc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -737,6 +737,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pte = swp_entry_to_pte(entry);
if (pte_swp_soft_dirty(*src_pte))
pte = pte_swp_mksoft_dirty(pte);
+   if (pte_swp_uffd_wp(*src_pte))
+   pte = pte_swp_mkuffd_wp(pte);
set_pte_at(src_mm, addr, src_pte, pte);
}
} else if (is_device_private_entry(entry)) {
@@ -766,6 +768,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
is_cow_mapping(vm_flags)) {
make_device_private_entry_read(&entry);
pte = swp_entry_to_pte(entry);
+   if (pte_swp_uffd_wp(*src_pte))
+   pte = pte_swp_mkuffd_wp(pte);
set_pte_at(src_mm, addr, src_pte, pte);
}
}
@@ -2854,6 +2858,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
flush_icache_page(vma, page);
if (pte_swp_soft_dirty(vmf->orig_pte))
pte = pte_mksoft_dirty(pte);
+   if (pte_swp_uffd_wp(vmf->orig_pte)) {
+   pte = pte_mkuffd_wp(pte);
+   pte = pte_wrprotect(pte);
+   }
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
vmf->orig_pte = pte;
diff --git a/mm/migrate.c b/mm/migrate.c
index 663a5449367a..deff1f8c20af 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -241,11 +241,15 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
entry = pte_to_swp_entry(*pvmw.pte);
if (is_write_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
+   else if (pte_swp_uffd_wp(*pvmw.pte))
+   pte = pte_mkuffd_wp(pte);
 
if (unlikely(is_zone_device_page(new))

[PATCH v4 16/27] userfaultfd: wp: add pmd_swp_*uffd_wp() helpers

2019-04-25 Thread Peter Xu

Add the missing helpers for uffd-wp operations on pmd swap/migration
entries.

Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 arch/x86/include/asm/pgtable.h | 15 +++
 include/asm-generic/pgtable_uffd.h | 15 +++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 6863236e8484..18a815d6f4ea 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1401,6 +1401,21 @@ static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 {
return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP);
 }
+
+static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd)
+{
+   return pmd_set_flags(pmd, _PAGE_SWP_UFFD_WP);
+}
+
+static inline int pmd_swp_uffd_wp(pmd_t pmd)
+{
+   return pmd_flags(pmd) & _PAGE_SWP_UFFD_WP;
+}
+
+static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
+{
+   return pmd_clear_flags(pmd, _PAGE_SWP_UFFD_WP);
+}
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
 #define PKRU_AD_BIT 0x1
diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
index 643d1bf559c2..828966d4c281 100644
--- a/include/asm-generic/pgtable_uffd.h
+++ b/include/asm-generic/pgtable_uffd.h
@@ -46,6 +46,21 @@ static __always_inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 {
return pte;
 }
+
+static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd)
+{
+   return pmd;
+}
+
+static inline int pmd_swp_uffd_wp(pmd_t pmd)
+{
+   return 0;
+}
+
+static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
+{
+   return pmd;
+}
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
 #endif /* _ASM_GENERIC_PGTABLE_UFFD_H */
-- 
2.17.1

[PATCH v4 14/27] userfaultfd: wp: handle COW properly for uffd-wp

2019-04-25 Thread Peter Xu

This allows uffd-wp to support write-protected pages for COW.

For example, the uffd write-protected PTE could also be write-protected
by other usages like COW or zero pages.  When that happens, we can't
simply set the write bit in the PTE since otherwise it'll change the
content of every single reference to the page.  Instead, we should do
the COW first if necessary, then handle the uffd-wp fault.

To correctly copy the page, we'll also need to carry over the
_PAGE_UFFD_WP bit if it was set in the original PTE.

For huge PMDs, we always split the huge PMD when we want to resolve a
uffd-wp page fault.  That matches what we do with general huge PMD
write protections.  In that way, we reduce the huge PMD copy-on-write
issue to PTE copy-on-write.

Signed-off-by: Peter Xu 
---
 mm/memory.c   |  5 -
 mm/mprotect.c | 55 ---
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ab98a1eb4702..965d974bb9bd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2299,7 +2299,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
}
flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
entry = mk_pte(new_page, vma->vm_page_prot);
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+   if (pte_uffd_wp(vmf->orig_pte))
+   entry = pte_mkuffd_wp(entry);
+   else
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
/*
 * Clear the pte entry and flush it first, before updating the
 * pte with the new entry. This will avoid a race condition
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 732d9b6d1d21..1f40662182f8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -73,18 +73,18 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();
do {
+retry_pte:
oldpte = *pte;
if (pte_present(oldpte)) {
pte_t ptent;
bool preserve_write = prot_numa && pte_write(oldpte);
+   struct page *page;
 
/*
 * Avoid trapping faults against the zero or KSM
 * pages. See similar comment in change_huge_pmd.
 */
if (prot_numa) {
-   struct page *page;
-
page = vm_normal_page(vma, addr, oldpte);
if (!page || PageKsm(page))
continue;
@@ -114,6 +114,45 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
continue;
}
 
+   /*
+* Detect whether we'll need to COW before
+* resolving an uffd-wp fault.  Note that this
+* includes detection of the zero page (where
+* page==NULL)
+*/
+   if (uffd_wp_resolve) {
+   struct vm_fault vmf = {
+   .vma = vma,
+   .address = addr & PAGE_MASK,
+   .orig_pte = oldpte,
+   .pmd = pmd,
+   .pte = pte,
+   .ptl = ptl,
+   };
+   vm_fault_t ret;
+
+   /* If the fault is resolved already, skip */
+   if (!pte_uffd_wp(*pte))
+   continue;
+
+   arch_leave_lazy_mmu_mode();
+   /* With PTE lock held */
+   ret = do_wp_page_cont(&vmf);
+   if (ret != VM_FAULT_WRITE && ret != 0)
+   /* Probably OOM */
+   return pages;
+   pte = pte_offset_map_lock(vma->vm_mm, pmd,
+ addr, &ptl);
+   arch_enter_lazy_mmu_mode();
+   if (ret == 0 || !pte_present(*pte))
+   /*
+* This PTE could have been modified
+* during or after COW before taking
+* the lock; retry.
+*/
+   goto retry_pte;
+   }
+

[PATCH v4 12/27] userfaultfd: wp: apply _PAGE_UFFD_WP bit

2019-04-25 Thread Peter Xu

Firstly, introduce two new flags MM_CP_UFFD_WP[_RESOLVE] for
change_protection() when used with uffd-wp and make sure the two new
flags are mutually exclusive.  Then,

  - For MM_CP_UFFD_WP: apply the _PAGE_UFFD_WP bit and remove _PAGE_RW
when a range of memory is write protected by uffd

  - For MM_CP_UFFD_WP_RESOLVE: remove the _PAGE_UFFD_WP bit and recover
_PAGE_RW when write protection is resolved from userspace

And use this new interface in mwriteprotect_range() to replace the old
MM_CP_DIRTY_ACCT.

Do this change for both PTEs and huge PMDs.  Then we can start to
identify which PTE/PMD is write protected by general mechanisms (e.g.,
COW or soft dirty tracking), and which is for userfaultfd-wp.

Since we should keep the _PAGE_UFFD_WP when doing pte_modify(), add it
into _PAGE_CHG_MASK as well.  Meanwhile, since we have this new bit, we
can be even more strict when detecting uffd-wp page faults in either
do_wp_page() or wp_huge_pmd().
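
The effect of the two flags on a single entry, as described above, can
be sketched on a plain flags word.  The APPLY_* bit values and both
function names are assumptions for illustration; the kernel uses its
pte/pmd helpers instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define APPLY_CP_UFFD_WP		(1UL << 2)
#define APPLY_CP_UFFD_WP_RESOLVE	(1UL << 3)
#define APPLY_CP_UFFD_WP_ALL		(APPLY_CP_UFFD_WP | APPLY_CP_UFFD_WP_RESOLVE)
#define APPLY_PTE_RW			(UINT64_C(1) << 1)	/* assumed */
#define APPLY_PTE_UFFD_WP		(UINT64_C(1) << 10)	/* assumed */

/* The two uffd-wp flags must never be passed together. */
bool sketch_cp_flags_valid(unsigned long cp_flags)
{
	return (cp_flags & APPLY_CP_UFFD_WP_ALL) != APPLY_CP_UFFD_WP_ALL;
}

/* Apply or resolve uffd write protection on one entry. */
uint64_t sketch_apply_uffd_wp(uint64_t pte, unsigned long cp_flags)
{
	if (cp_flags & APPLY_CP_UFFD_WP) {
		pte &= ~APPLY_PTE_RW;		/* remove write permission */
		pte |= APPLY_PTE_UFFD_WP;	/* mark as uffd-protected */
	} else if (cp_flags & APPLY_CP_UFFD_WP_RESOLVE) {
		pte |= APPLY_PTE_RW;		/* recover write permission */
		pte &= ~APPLY_PTE_UFFD_WP;	/* drop the marker */
	}
	return pte;
}
```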

Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 include/linux/mm.h |  5 +
 mm/huge_memory.c   | 14 +-
 mm/memory.c|  4 ++--
 mm/mprotect.c  | 12 
 mm/userfaultfd.c   |  8 ++--
 5 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 086e69d4439d..a5ac81188523 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1652,6 +1652,11 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
 #define  MM_CP_DIRTY_ACCT  (1UL << 0)
 /* Whether this protection change is for NUMA hints */
 #define  MM_CP_PROT_NUMA   (1UL << 1)
+/* Whether this change is for write protecting */
+#define  MM_CP_UFFD_WP (1UL << 2) /* do wp */
+#define  MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */
+#define  MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
+   MM_CP_UFFD_WP_RESOLVE)
 
 extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
  unsigned long end, pgprot_t newprot,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 64d26b1989d2..3885747d4901 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1907,6 +1907,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
bool preserve_write;
int ret;
bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
+   bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
+   bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
 
ptl = __pmd_trans_huge_lock(pmd, vma);
if (!ptl)
@@ -1973,6 +1975,13 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
entry = pmd_modify(entry, newprot);
if (preserve_write)
entry = pmd_mk_savedwrite(entry);
+   if (uffd_wp) {
+   entry = pmd_wrprotect(entry);
+   entry = pmd_mkuffd_wp(entry);
+   } else if (uffd_wp_resolve) {
+   entry = pmd_mkwrite(entry);
+   entry = pmd_clear_uffd_wp(entry);
+   }
ret = HPAGE_PMD_NR;
set_pmd_at(mm, addr, pmd, entry);
BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
@@ -2120,7 +2129,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
struct page *page;
pgtable_t pgtable;
pmd_t old_pmd, _pmd;
-   bool young, write, soft_dirty, pmd_migration = false;
+   bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false;
unsigned long addr;
int i;
 
@@ -2202,6 +2211,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
write = pmd_write(old_pmd);
young = pmd_young(old_pmd);
soft_dirty = pmd_soft_dirty(old_pmd);
+   uffd_wp = pmd_uffd_wp(old_pmd);
}
VM_BUG_ON_PAGE(!page_count(page), page);
page_ref_add(page, HPAGE_PMD_NR - 1);
@@ -2235,6 +2245,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_mkold(entry);
if (soft_dirty)
entry = pte_mksoft_dirty(entry);
+   if (uffd_wp)
+   entry = pte_mkuffd_wp(entry);
}
pte = pte_offset_map(&_pmd, addr);
BUG_ON(!pte_none(*pte));
diff --git a/mm/memory.c b/mm/memory.c
index 8ccd4927b58d..64bd8075f054 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2492,7 +2492,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
 
-   if (userfaultfd_wp(vma)) {
+   if (userfaultfd_pte_wp(vma, *vmf->pte)) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
return handle_userfault(vmf, VM_UFFD_WP);
}
@@ -3713,7 +3713,7 @@ static inline vm_fault_t create_huge_pmd(struct vm

[PATCH v4 10/27] userfaultfd: wp: add UFFDIO_COPY_MODE_WP

2019-04-25 Thread Peter Xu

From: Andrea Arcangeli 

This allows UFFDIO_COPY to map pages write-protected.
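
As a rough model of what the new mode bit changes: the copied page is
always marked dirty, and it is mapped writable only when the vma allows
writes and UFFDIO_COPY_MODE_WP is not set.  The sketch below captures
just the writability decision; the COPY_SKETCH_* constants and
sketch_copy_pte_writable() are illustrative names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the UFFDIO_COPY mode bits. */
#define COPY_SKETCH_MODE_DONTWAKE	(UINT64_C(1) << 0)
#define COPY_SKETCH_MODE_WP		(UINT64_C(1) << 1)

/*
 * Validate the copy mode and decide whether the new mapping is writable.
 * Returns 0 on success or -EINVAL if unknown mode bits are set.
 */
int sketch_copy_pte_writable(uint64_t mode, bool vma_writable, bool *writable)
{
	bool wp_copy;

	if (mode & ~(COPY_SKETCH_MODE_DONTWAKE | COPY_SKETCH_MODE_WP))
		return -EINVAL;

	wp_copy = mode & COPY_SKETCH_MODE_WP;
	*writable = vma_writable && !wp_copy;
	return 0;
}
```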

Signed-off-by: Andrea Arcangeli 
[peterx: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
 around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
 commit messages]
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c |  5 +++--
 include/linux/userfaultfd_k.h|  2 +-
 include/uapi/linux/userfaultfd.h | 11 +-
 mm/userfaultfd.c | 36 ++--
 4 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index b397bc3b954d..3092885c9d2c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1683,11 +1683,12 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
ret = -EINVAL;
if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
goto out;
-   if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE)
+   if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
goto out;
if (mmget_not_zero(ctx->mm)) {
ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
-  uffdio_copy.len, &ctx->mmap_changing);
+  uffdio_copy.len, &ctx->mmap_changing,
+  uffdio_copy.mode);
mmput(ctx->mm);
} else {
return -ESRCH;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c6590c58ce28..765ce884cec0 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -34,7 +34,7 @@ extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long src_start, unsigned long len,
-   bool *mmap_changing);
+   bool *mmap_changing, __u64 mode);
 extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
  unsigned long dst_start,
  unsigned long len,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 48f1a7c2f1f0..340f23bc251d 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -203,13 +203,14 @@ struct uffdio_copy {
__u64 dst;
__u64 src;
__u64 len;
+#define UFFDIO_COPY_MODE_DONTWAKE  ((__u64)1<<0)
/*
-* There will be a wrprotection flag later that allows to map
-* pages wrprotected on the fly. And such a flag will be
-* available if the wrprotection ioctl are implemented for the
-* range according to the uffdio_register.ioctls.
+* UFFDIO_COPY_MODE_WP will map the page write protected on
+* the fly.  UFFDIO_COPY_MODE_WP is available only if the
+* write protected ioctl is implemented for the range
+* according to the uffdio_register.ioctls.
 */
-#define UFFDIO_COPY_MODE_DONTWAKE  ((__u64)1<<0)
+#define UFFDIO_COPY_MODE_WP((__u64)1<<1)
__u64 mode;
 
/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index d59b5a73dfb3..eaecc21806da 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -25,7 +25,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
-   struct page **pagep)
+   struct page **pagep,
+   bool wp_copy)
 {
struct mem_cgroup *memcg;
pte_t _dst_pte, *dst_pte;
@@ -71,9 +72,9 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg, false))
goto out_release;
 
-   _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
-   if (dst_vma->vm_flags & VM_WRITE)
-   _dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
+   _dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
+   if ((dst_vma->vm_flags & VM_WRITE) && !wp_copy)
+   _dst_pte = pte_mkwrite(_dst_pte);
 
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
if (dst_vma->vm_file) {
@@ -399,7 +400,8 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
unsigned long dst_addr,
unsigned long src_addr,
struct page **page,
-   bool zeropage)
+   bool zeropage,
+   bool wp_copy)
 {
ssize_t

[PATCH v4 11/27] mm: merge parameters for change_protection()

2019-04-25 Thread Peter Xu

change_protection() is used by both the NUMA and mprotect() code, and
there's one parameter for each of the callers (dirty_accountable and
prot_numa).  Further, these parameters are passed along the calls:

  - change_protection_range()
  - change_p4d_range()
  - change_pud_range()
  - change_pmd_range()
  - ...

Now we introduce a flags parameter for change_protection() and all
these helpers to replace these parameters.  Then we can avoid passing
multiple parameters multiple times along the way.

More importantly, it'll greatly simplify the work if we want to
introduce any new parameters to change_protection().  In the follow up
patches, a new parameter for userfaultfd write protection will be
introduced.

No functional change at all.
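
The pattern is the usual one of folding boolean parameters into a
single flags word and decoding them again in the callee.  A standalone
sketch follows; the SKETCH_CP_* constants mirror the values in the diff
below, while struct sketch_cp_opts and sketch_cp_decode() are
hypothetical.

```c
#include <stdbool.h>

#define SKETCH_CP_DIRTY_ACCT	(1UL << 0)
#define SKETCH_CP_PROT_NUMA	(1UL << 1)

/* The two booleans that used to be passed as separate parameters. */
struct sketch_cp_opts {
	bool dirty_accountable;
	bool prot_numa;
};

/* Callee side: recover the old boolean parameters from cp_flags. */
struct sketch_cp_opts sketch_cp_decode(unsigned long cp_flags)
{
	struct sketch_cp_opts opts = {
		.dirty_accountable = cp_flags & SKETCH_CP_DIRTY_ACCT,
		.prot_numa = cp_flags & SKETCH_CP_PROT_NUMA,
	};

	return opts;
}
```

Adding a further option then only needs a new bit rather than a new
argument at every level of the call chain.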

Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 include/linux/huge_mm.h |  2 +-
 include/linux/mm.h  | 14 +-
 mm/huge_memory.c|  3 ++-
 mm/mempolicy.c  |  2 +-
 mm/mprotect.c   | 29 -
 5 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 381e872bfde0..1550fb12dbd4 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -46,7 +46,7 @@ extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 pmd_t *old_pmd, pmd_t *new_pmd);
 extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, pgprot_t newprot,
-   int prot_numa);
+   unsigned long cp_flags);
 vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, pfn_t pfn, bool write);
 vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bad93704abc8..086e69d4439d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1641,9 +1641,21 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len,
bool need_rmap_locks);
+
+/*
+ * Flags used by change_protection().  For now we make it a bitmap so
+ * that we can pass in multiple flags just like parameters.  However
+ * for now all the callers only use one of the flags at the same
+ * time.
+ */
+/* Whether we should allow dirty bit accounting */
+#define  MM_CP_DIRTY_ACCT  (1UL << 0)
+/* Whether this protection change is for NUMA hints */
+#define  MM_CP_PROT_NUMA   (1UL << 1)
+
 extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
  unsigned long end, pgprot_t newprot,
- int dirty_accountable, int prot_numa);
+ unsigned long cp_flags);
 extern int mprotect_fixup(struct vm_area_struct *vma,
  struct vm_area_struct **pprev, unsigned long start,
  unsigned long end, unsigned long newflags);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46bf149..64d26b1989d2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1899,13 +1899,14 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
  *  - HPAGE_PMD_NR is protections changed and TLB flush necessary
  */
 int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-   unsigned long addr, pgprot_t newprot, int prot_numa)
+   unsigned long addr, pgprot_t newprot, unsigned long cp_flags)
 {
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
pmd_t entry;
bool preserve_write;
int ret;
+   bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
 
ptl = __pmd_trans_huge_lock(pmd, vma);
if (!ptl)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2219e747df49..825053818bcb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -575,7 +575,7 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 {
int nr_updated;
 
-   nr_updated = change_protection(vma, addr, end, PAGE_NONE, 0, 1);
+   nr_updated = change_protection(vma, addr, end, PAGE_NONE, 
MM_CP_PROT_NUMA);
if (nr_updated)
count_vm_numa_events(NUMA_PTE_UPDATES, nr_updated);
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 028c724dcb1a..98091408bd11 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -37,13 +37,15 @@
 
 static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
-   int dirty_accountable, int prot_numa)
+   unsigned long cp_flags)
 {
struct mm_struct *mm = vma->vm_mm;
pte_t *pte, oldpte;
spinlock_t *ptl;
unsigned long pages = 0;
int target_node = NUMA_NO_NODE;
+   bool dirty_accountable = cp_flag

[PATCH v4 06/27] userfaultfd: wp: add helper for writeprotect check

2019-04-25 Thread Peter Xu

From: Shaohua Li 

Add a helper for the write-protect check.  It will be used by later patches.

Cc: Andrea Arcangeli 
Cc: Pavel Emelyanov 
Cc: Rik van Riel 
Cc: Kirill A. Shutemov 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Signed-off-by: Shaohua Li 
Signed-off-by: Andrea Arcangeli 
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 include/linux/userfaultfd_k.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 37c9eba75c98..38f748e7186e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -50,6 +50,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct 
*vma)
return vma->vm_flags & VM_UFFD_MISSING;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+   return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -94,6 +99,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct 
*vma)
return false;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+   return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
return false;
-- 
2.17.1

[PATCH v4 09/27] userfaultfd: wp: userfaultfd_pte/huge_pmd_wp() helpers

2019-04-25 Thread Peter Xu

From: Andrea Arcangeli 

Implement helper methods to invoke userfaultfd wp faults more
selectively: not merely when a wp fault triggers on a vma with
VM_UFFD_WP set in vma->vm_flags, but only if the _PAGE_UFFD_WP bit is
set in the pagetable too.
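
As a rough illustration of the two-level check described above, here is a
small userspace model.  The struct layouts, the bit values and every
model_/MODEL_ name are hypothetical stand-ins chosen for this sketch, not
the kernel definitions; only the shape of the check (vma flag AND
pagetable bit) follows the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the bit positions are illustrative only. */
#define MODEL_VM_UFFD_WP    (1UL << 0)   /* vma registered for uffd-wp */
#define MODEL_PAGE_UFFD_WP  (1ULL << 10) /* per-pte uffd-wp marker */

struct model_vma { unsigned long vm_flags; };
typedef struct { uint64_t val; } model_pte_t;

/* Coarse check: is the whole vma armed for write protection? */
static inline bool model_userfaultfd_wp(const struct model_vma *vma)
{
	return (vma->vm_flags & MODEL_VM_UFFD_WP) != 0;
}

/* Fine check: does this particular pte carry the marker? */
static inline bool model_pte_uffd_wp(model_pte_t pte)
{
	return (pte.val & MODEL_PAGE_UFFD_WP) != 0;
}

/* A wp fault goes to userland only when both levels agree. */
static inline bool model_userfaultfd_pte_wp(const struct model_vma *vma,
					    model_pte_t pte)
{
	return model_userfaultfd_wp(vma) && model_pte_uffd_wp(pte);
}
```

With this model, a pte without the marker inside an armed vma does not
count as a userfault wp fault, which is the selectivity the helpers add.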

Signed-off-by: Andrea Arcangeli 
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 include/linux/userfaultfd_k.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 38f748e7186e..c6590c58ce28 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -14,6 +14,8 @@
 #include  /* linux/include/uapi/linux/userfaultfd.h */
 
 #include 
+#include 
+#include 
 
 /*
  * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
@@ -55,6 +57,18 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
return vma->vm_flags & VM_UFFD_WP;
 }
 
+static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
+ pte_t pte)
+{
+   return userfaultfd_wp(vma) && pte_uffd_wp(pte);
+}
+
+static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
+  pmd_t pmd)
+{
+   return userfaultfd_wp(vma) && pmd_uffd_wp(pmd);
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -104,6 +118,19 @@ static inline bool userfaultfd_wp(struct vm_area_struct 
*vma)
return false;
 }
 
+static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
+ pte_t pte)
+{
+   return false;
+}
+
+static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
+  pmd_t pmd)
+{
+   return false;
+}
+
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
return false;
-- 
2.17.1

[PATCH v4 08/27] userfaultfd: wp: add WP pagetable tracking to x86

2019-04-25 Thread Peter Xu

From: Andrea Arcangeli 

Accurate userfaultfd WP tracking is possible by tracking exactly which
virtual memory ranges were writeprotected by userland. We can't rely
only on the RW bit of the mapped pagetable because that information is
destroyed by fork() or KSM or swap. If we were to rely on that, we'd
need to stay on the safe side and generate false positive wp faults
for every swapped out page.
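
To make the argument concrete, the sketch below models a pte as a plain
64-bit value.  The W bit position (1) follows the x86 bit layout comment
quoted in this patch; the marker bit position and all model_/MODEL_ names
are assumptions made for illustration only.  It shows why a cleared RW
bit alone cannot tell "write-protected by fork()" apart from
"write-protected by userfaultfd", while a dedicated bit can.

```c
#include <stdint.h>

typedef uint64_t model_pteval_t;

#define MODEL_PAGE_PRESENT  (1ULL << 0)
#define MODEL_PAGE_RW       (1ULL << 1)  /* W bit in the x86 layout */
#define MODEL_PAGE_UFFD_WP  (1ULL << 10) /* assumed software bit */

/* Clearing RW is what fork(), KSM or uffd-wp all do to a pte. */
static inline model_pteval_t model_pte_wrprotect(model_pteval_t v)
{
	return v & ~MODEL_PAGE_RW;
}

/* Only userfaultfd write protection sets the extra marker. */
static inline model_pteval_t model_pte_mkuffd_wp(model_pteval_t v)
{
	return v | MODEL_PAGE_UFFD_WP;
}

static inline model_pteval_t model_pte_clear_uffd_wp(model_pteval_t v)
{
	return v & ~MODEL_PAGE_UFFD_WP;
}

static inline int model_pte_uffd_wp(model_pteval_t v)
{
	return (v & MODEL_PAGE_UFFD_WP) != 0;
}
```

In this model both kinds of pte have RW clear, so a fault handler that
looked only at RW would have to report every one of them to userland.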

Signed-off-by: Andrea Arcangeli 
[peterx: append _PAGE_UFFD_WP to _PAGE_CHG_MASK]
Reviewed-by: Jerome Glisse 
Reviewed-by: Mike Rapoport 
Signed-off-by: Peter Xu 
---
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/pgtable.h   | 52 
 arch/x86/include/asm/pgtable_64.h|  8 -
 arch/x86/include/asm/pgtable_types.h | 11 +-
 include/asm-generic/pgtable.h|  1 +
 include/asm-generic/pgtable_uffd.h   | 51 +++
 init/Kconfig |  5 +++
 7 files changed, 127 insertions(+), 2 deletions(-)
 create mode 100644 include/asm-generic/pgtable_uffd.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5ad92419be19..70d369fe08d7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -208,6 +208,7 @@ config X86
select USER_STACKTRACE_SUPPORT
select VIRT_TO_BUS
    select X86_FEATURE_NAMES if PROC_FS
+   select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD
 
 config INSTRUCTION_DECODER
def_bool y
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2779ace16d23..6863236e8484 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
+#include 
 
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
@@ -293,6 +294,23 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t 
clear)
return native_make_pte(v & ~clear);
 }
 
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline int pte_uffd_wp(pte_t pte)
+{
+   return pte_flags(pte) & _PAGE_UFFD_WP;
+}
+
+static inline pte_t pte_mkuffd_wp(pte_t pte)
+{
+   return pte_set_flags(pte, _PAGE_UFFD_WP);
+}
+
+static inline pte_t pte_clear_uffd_wp(pte_t pte)
+{
+   return pte_clear_flags(pte, _PAGE_UFFD_WP);
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
 static inline pte_t pte_mkclean(pte_t pte)
 {
return pte_clear_flags(pte, _PAGE_DIRTY);
@@ -372,6 +390,23 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t 
clear)
return native_make_pmd(v & ~clear);
 }
 
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline int pmd_uffd_wp(pmd_t pmd)
+{
+   return pmd_flags(pmd) & _PAGE_UFFD_WP;
+}
+
+static inline pmd_t pmd_mkuffd_wp(pmd_t pmd)
+{
+   return pmd_set_flags(pmd, _PAGE_UFFD_WP);
+}
+
+static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd)
+{
+   return pmd_clear_flags(pmd, _PAGE_UFFD_WP);
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
 static inline pmd_t pmd_mkold(pmd_t pmd)
 {
return pmd_clear_flags(pmd, _PAGE_ACCESSED);
@@ -1351,6 +1386,23 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
 #endif
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
+{
+   return pte_set_flags(pte, _PAGE_SWP_UFFD_WP);
+}
+
+static inline int pte_swp_uffd_wp(pte_t pte)
+{
+   return pte_flags(pte) & _PAGE_SWP_UFFD_WP;
+}
+
+static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
+{
+   return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP);
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
 #define PKRU_AD_BIT 0x1
 #define PKRU_WD_BIT 0x2
 #define PKRU_BITS_PER_PKEY 2
diff --git a/arch/x86/include/asm/pgtable_64.h 
b/arch/x86/include/asm/pgtable_64.h
index 0bb566315621..627666b1c3c0 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -189,7 +189,7 @@ extern void sync_global_pgds(unsigned long start, unsigned 
long end);
  *
  * | ...| 11| 10|  9|8|7|6|5| 4| 3|2| 1|0| <- bit number
  * | ...|SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | TYPE (59-63) | ~OFFSET (9-58)  |0|0|X|X| X| X|X|SD|0| <- swp entry
+ * | TYPE (59-63) | ~OFFSET (9-58)  |0|0|X|X| X| X|F|SD|0| <- swp entry
  *
  * G (8) is aliased and used as a PROT_NONE indicator for
  * !present ptes.  We need to start storing swap entries above
@@ -197,9 +197,15 @@ extern void sync_global_pgds(unsigned long start, unsigned 
long end);
  * erratum where they can be incorrectly set by hardware on
  * non-present PTEs.
  *
+ * SD Bits 1-4 are not used in non-present format and available for
+ * special use described below:
+ *
  * SD (1) in swp entry is used to store soft dirty bit, which helps us
  * remember soft dirty over page migration
  *
+ * F (2) in swp entry is used to record when a pagetable is
+ * writeprotected by userfaultfd WP support.
+ *
  * Bit 7 in swp entry should be 0 because pmd_present checks not only P,
  *

[PATCH v4 04/27] mm: allow VM_FAULT_RETRY for multiple times

2019-04-25 Thread Peter Xu

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We
achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was mainly used to avoid
unexpected starvation of the system by looping forever to handle
the page fault on a single page.  However that should hardly happen:
every code path that returns VM_FAULT_RETRY first waits for a
condition (during which time we should possibly yield the cpu)
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the
FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY.  It means
that the page fault handler can now retry the page fault multiple
times if necessary without the need to generate another page fault
event.  Meanwhile we still keep the FAULT_FLAG_TRIED flag so the page
fault handler can still identify whether a page fault is the first
attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  the page fault is allowed to retry, and
     this is the first try

  - ALLOW_RETRY and TRIED:   the page fault is allowed to retry, and
     this is not the first try

  - !ALLOW_RETRY and !TRIED: the page fault is not allowed to retry
     at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code we have multiple places that have taken special care
of the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to
detect the first attempt of a page fault by checking against
both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flags &
FAULT_FLAG_TRIED), because now even the 2nd try will have
ALLOW_RETRY set, and then uses that helper in all existing special paths.
One example is __lock_page_or_retry(): now we'll drop the mmap_sem
only in the first attempt of the page fault and keep it in follow-up
retries, so the old locking behavior is retained.
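
The flag combinations and the "first attempt" check can be sketched in a
few lines of standalone C.  This is only a model: the numeric flag values
and the model_/MODEL_ names are assumptions for illustration, and the
real helper lives in include/linux/mm.h in the patch itself.

```c
#include <stdbool.h>

/* Illustrative values; the real definitions are in include/linux/mm.h. */
#define MODEL_FAULT_FLAG_ALLOW_RETRY  0x04u
#define MODEL_FAULT_FLAG_TRIED        0x20u

/* True only for "ALLOW_RETRY and !TRIED", i.e. the first attempt. */
static inline bool model_allow_retry_first(unsigned int flags)
{
	return (flags & MODEL_FAULT_FLAG_ALLOW_RETRY) &&
	       !(flags & MODEL_FAULT_FLAG_TRIED);
}

/*
 * What a fault handler does with its flags on VM_FAULT_RETRY after
 * this patch: mark the fault as tried, but keep ALLOW_RETRY set so
 * that further retries remain possible.
 */
static inline unsigned int model_flags_after_retry(unsigned int flags)
{
	return flags | MODEL_FAULT_FLAG_TRIED;
}
```

A path such as __lock_page_or_retry() would consult the first-attempt
check rather than ALLOW_RETRY alone, so that only the first attempt
drops the mmap_sem.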

This is a nice enhancement for current code [2] and at the same time
supporting material for the future userfaultfd-writeprotect work,
since in that work there will always be an explicit userfault
writeprotect retry for protected pages, and if that cannot resolve the
page fault (e.g., when userfaultfd-writeprotect is used in conjunction
with swapped pages) then we'll possibly need a 3rd retry of the page
fault.  It might also benefit other potential users who have
requirements similar to userfault write-protection.

GUP code is not touched yet and will be covered in a follow-up patch.

Please read the thread below for more information.

[1] https://lkml.org/lkml/2017/11/2/833
[2] https://lkml.org/lkml/2018/12/30/64

Suggested-by: Linus Torvalds 
Suggested-by: Andrea Arcangeli 
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 arch/alpha/mm/fault.c   |  2 +-
 arch/arc/mm/fault.c |  1 -
 arch/arm/mm/fault.c |  3 ---
 arch/arm64/mm/fault.c   |  5 
 arch/hexagon/mm/vm_fault.c  |  1 -
 arch/ia64/mm/fault.c|  1 -
 arch/m68k/mm/fault.c|  3 ---
 arch/microblaze/mm/fault.c  |  1 -
 arch/mips/mm/fault.c|  1 -
 arch/nds32/mm/fault.c   |  1 -
 arch/nios2/mm/fault.c   |  3 ---
 arch/openrisc/mm/fault.c|  1 -
 arch/parisc/mm/fault.c  |  4 +---
 arch/powerpc/mm/fault.c |  6 -
 arch/riscv/mm/fault.c   |  5 
 arch/s390/mm/fault.c|  5 +---
 arch/sh/mm/fault.c  |  1 -
 arch/sparc/mm/fault_32.c|  1 -
 arch/sparc/mm/fault_64.c|  1 -
 arch/um/kernel/trap.c   |  1 -
 arch/unicore32/mm/fault.c   |  4 +---
 arch/x86/mm/fault.c |  2 --
 arch/xtensa/mm/fault.c  |  1 -
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 +++---
 include/linux/mm.h  | 41 -
 mm/filemap.c|  2 +-
 mm/shmem.c  |  2 +-
 27 files changed, 55 insertions(+), 56 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 8a2ef90b4bfc..6a02c0fb36b9 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -169,7 +169,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
-   flags &= ~FAULT_FLAG_ALLOW_RETRY;
+   flags |= FAULT_FLAG_TRIED;
 
 /* No need to up_read(&mm->mmap_sem) as we would
 * have already released it in __lock_page_or_retry
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 9e9e6eb1f7d0..e7d2947ba72c 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/

[PATCH v4 07/27] userfaultfd: wp: hook userfault handler to write protection fault

2019-04-25 Thread Peter Xu

From: Andrea Arcangeli 

There are several cases in which a write protection fault happens. It
could be a write to the zero page, a swapped page, or a userfault
write-protected page. When the fault happens, there is no way to know
whether userfault write-protected the page before. Here we just blindly
issue a userfault notification for a vma with VM_UFFD_WP, regardless of
whether the app has write-protected it yet. The application should be
ready to handle such a wp fault.

v1: From: Shaohua Li 

v2: Handle the userfault in the common do_wp_page. If we get there a
pagetable is present and readonly so no need to do further processing
until we solve the userfault.

In the swapin case, always swapin as readonly. This will cause false
positive userfaults. We need to decide later whether to eliminate them
with a flag like soft-dirty in the swap entry (see _PAGE_SWP_SOFT_DIRTY).

hugetlbfs wouldn't need to worry about swapouts, and tmpfs would
be handled by a swap entry bit like anonymous memory.

The main problem, with no easy solution to eliminate the false
positives, will arise if/when userfaultfd is extended to real filesystem
pagecache. When the pagecache is freed by reclaim we can't leave the
radix tree pinned if the inode and in turn the radix tree is reclaimed
as well.

The estimation is that full accuracy and lack of false positives could
be easily provided only to anonymous memory (as long as there's no
fork or as long as MADV_DONTFORK is used on the userfaultfd anonymous
range), tmpfs and hugetlbfs. It is most certainly worth achieving,
but in a later incremental patch.

v3: Add hooking point for THP wrprotect faults.

CC: Shaohua Li 
Signed-off-by: Andrea Arcangeli 
[peterx: don't conditionally drop FAULT_FLAG_WRITE in do_swap_page]
Reviewed-by: Mike Rapoport 
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 mm/memory.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index ab650c21bccd..8ccd4927b58d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2492,6 +2492,11 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
 
+   if (userfaultfd_wp(vma)) {
+   pte_unmap_unlock(vmf->pte, vmf->ptl);
+   return handle_userfault(vmf, VM_UFFD_WP);
+   }
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*
@@ -3707,8 +3712,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault 
*vmf)
 /* `inline' is required to avoid gcc 4.1.2 build error */
 static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
 {
-   if (vma_is_anonymous(vmf->vma))
+   if (vma_is_anonymous(vmf->vma)) {
+   if (userfaultfd_wp(vmf->vma))
+   return handle_userfault(vmf, VM_UFFD_WP);
return do_huge_pmd_wp_page(vmf, orig_pmd);
+   }
if (vmf->vma->vm_ops->huge_fault)
return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
 
-- 
2.17.1

[PATCH v4 02/27] mm: userfault: return VM_FAULT_RETRY on signals

2019-04-25 Thread Peter Xu

The idea comes from the upstream discussion between Linus and Andrea:

  https://lkml.org/lkml/2017/10/30/560

A summary of the issue: there was a special path in handle_userfault()
in the past where we'd return VM_FAULT_NOPAGE when we detected
non-fatal signals while waiting for userfault handling.  We did that by
reacquiring the mmap_sem before returning.  However that brings a risk
in that the vmas might have changed when we retake the mmap_sem, and
we could even be holding an invalid vma structure.

This patch removes the special path and we'll return a VM_FAULT_RETRY
with the common path even if we have got such signals.  Then for all
the architectures that pass FAULT_FLAG_ALLOW_RETRY into
handle_mm_fault(), we check not only for SIGKILL but for all the rest
of userspace pending signals right after we return from
handle_mm_fault().  This allows userspace to handle nonfatal
signals faster than before.
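
The per-architecture check described above can be sketched as a small
decision function.  This is a userspace model only: the enum, the
function name and the boolean parameters are hypothetical, and the
three outcomes mirror the pattern used in the arc and arm hunks of
this patch (bail out to no_context, return to handle the signal, or
carry on with normal fault handling).

```c
#include <stdbool.h>

enum model_action {
	MODEL_CONTINUE,       /* keep going with normal fault handling */
	MODEL_RETURN_TO_USER, /* return so the pending signal is handled */
	MODEL_NO_CONTEXT      /* fatal signal during a kernel-mode fault */
};

/*
 * retry:         handle_mm_fault() returned VM_FAULT_RETRY
 * sig_pending:   any signal is pending (was: fatal signals only)
 * fatal_pending: a fatal signal is pending
 * user_mode:     the fault came from user mode
 */
static inline enum model_action model_after_fault(bool retry,
						  bool sig_pending,
						  bool fatal_pending,
						  bool user_mode)
{
	if (retry && sig_pending) {
		if (fatal_pending && !user_mode)
			return MODEL_NO_CONTEXT;
		return MODEL_RETURN_TO_USER;
	}
	return MODEL_CONTINUE;
}
```

The difference from the old behavior is the outer condition: a
non-fatal pending signal is now enough to leave the fault handler
early when a retry was requested.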

This patch is preparation work for the next patch, which finally removes
the special code path mentioned above in handle_userfault().

Suggested-by: Linus Torvalds 
Suggested-by: Andrea Arcangeli 
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 arch/alpha/mm/fault.c  |  2 +-
 arch/arc/mm/fault.c| 11 ---
 arch/arm/mm/fault.c|  6 +++---
 arch/arm64/mm/fault.c  |  6 +++---
 arch/hexagon/mm/vm_fault.c |  2 +-
 arch/ia64/mm/fault.c   |  2 +-
 arch/m68k/mm/fault.c   |  2 +-
 arch/microblaze/mm/fault.c |  2 +-
 arch/mips/mm/fault.c   |  2 +-
 arch/nds32/mm/fault.c  |  6 +++---
 arch/nios2/mm/fault.c  |  2 +-
 arch/openrisc/mm/fault.c   |  2 +-
 arch/parisc/mm/fault.c |  2 +-
 arch/powerpc/mm/fault.c|  2 ++
 arch/riscv/mm/fault.c  |  4 ++--
 arch/s390/mm/fault.c   |  9 ++---
 arch/sh/mm/fault.c |  4 
 arch/sparc/mm/fault_32.c   |  3 +++
 arch/sparc/mm/fault_64.c   |  3 +++
 arch/um/kernel/trap.c  |  5 -
 arch/unicore32/mm/fault.c  |  4 ++--
 arch/x86/mm/fault.c|  6 +-
 arch/xtensa/mm/fault.c |  3 +++
 23 files changed, 56 insertions(+), 34 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 188fc9256baf..8a2ef90b4bfc 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -150,7 +150,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
   the fault.  */
fault = handle_mm_fault(vma, address, flags);
 
-   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+   if ((fault & VM_FAULT_RETRY) && signal_pending(current))
return;
 
if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 8df1638259f3..9e9e6eb1f7d0 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -141,17 +141,14 @@ void do_page_fault(unsigned long address, struct pt_regs 
*regs)
 */
fault = handle_mm_fault(vma, address, flags);
 
-   if (fatal_signal_pending(current)) {
-
+   if (unlikely((fault & VM_FAULT_RETRY) && signal_pending(current))) {
+   if (fatal_signal_pending(current) && !user_mode(regs))
+   goto no_context;
/*
 * if fault retry, mmap_sem already relinquished by core mm
 * so OK to return to user mode (with signal handled first)
 */
-   if (fault & VM_FAULT_RETRY) {
-   if (!user_mode(regs))
-   goto no_context;
-   return;
-   }
+   return;
}
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 58f69fa07df9..c41c021bbe40 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -314,12 +314,12 @@ do_page_fault(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
 
fault = __do_page_fault(mm, addr, fsr, flags, tsk);
 
-   /* If we need to retry but a fatal signal is pending, handle the
+   /* If we need to retry but a signal is pending, handle the
 * signal first. We do not need to release the mmap_sem because
 * it would already be released in __lock_page_or_retry in
 * mm/filemap.c. */
-   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
-   if (!user_mode(regs))
+   if (unlikely(fault & VM_FAULT_RETRY && signal_pending(current))) {
+   if (fatal_signal_pending(current) && !user_mode(regs))
goto no_context;
return 0;
}
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 1a7e92ab69eb..46c32d639fbf 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -512,13 +512,13 @@ static int __kprobes do_page_fault(unsigned long addr, 
unsigned int esr,
 
if (fault & VM_FAULT_RETRY) {
/*
-* If we need to retry but a fatal signal is pending,
+

[PATCH v4 01/27] mm: gup: rename "nonblocking" to "locked" where proper

2019-04-25 Thread Peter Xu

There are plenty of places around __get_user_pages() that have a parameter
"nonblocking" which does not really mean that "it won't block" (because
it can really block); instead it shows whether the mmap_sem is
released by up_read() during the page fault handling, mostly when
VM_FAULT_RETRY is returned.

We have the correct naming in e.g. get_user_pages_locked() or
get_user_pages_remote() as "locked", however there are still many places
that use "nonblocking" as the name.

Rename those places to "locked" where proper to better suit the
functionality of the variable.  While at it, fix up some of the
comments accordingly.
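
To show why "locked" is the more accurate name, here is a userspace
model of the contract.  Everything in it is an assumption made for
illustration: struct model_mm and its read_held field stand in for the
mmap_sem, must_wait stands in for "the fault would need to wait", and
nowait stands in for FOLL_NOWAIT.  The point is only that *locked
reports whether the lock is still held after the call.

```c
#include <errno.h>
#include <stddef.h>

struct model_mm { int read_held; };	/* 1 while "mmap_sem" is held */

static inline int model_faultin(struct model_mm *mm, int must_wait,
				int nowait, int *locked)
{
	if (!must_wait)
		return 0;	/* fault resolved, lock untouched */
	if (nowait)
		return -EBUSY;	/* retry requested, lock still held */
	if (!locked)
		return 0;	/* no retry allowed: modelled as waiting
				 * with the lock held */
	mm->read_held = 0;	/* stands in for up_read(&mm->mmap_sem) */
	*locked = 0;		/* tell the caller the lock is gone */
	return -EBUSY;
}
```

A caller in this model must check *locked before unlocking: when it is
0 the lock has already been released on its behalf, which says nothing
about whether the call blocked.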

Reviewed-by: Mike Rapoport 
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 mm/gup.c | 44 +---
 mm/hugetlb.c |  8 
 2 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index f84e22685aaa..a78d252d6358 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -509,12 +509,12 @@ static int get_gate_page(struct mm_struct *mm, unsigned 
long address,
 }
 
 /*
- * mmap_sem must be held on entry.  If @nonblocking != NULL and
- * *@flags does not include FOLL_NOWAIT, the mmap_sem may be released.
- * If it is, *@nonblocking will be set to 0 and -EBUSY returned.
+ * mmap_sem must be held on entry.  If @locked != NULL and *@flags
+ * does not include FOLL_NOWAIT, the mmap_sem may be released.  If it
+ * is, *@locked will be set to 0 and -EBUSY returned.
  */
 static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
-   unsigned long address, unsigned int *flags, int *nonblocking)
+   unsigned long address, unsigned int *flags, int *locked)
 {
unsigned int fault_flags = 0;
vm_fault_t ret;
@@ -526,7 +526,7 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
fault_flags |= FAULT_FLAG_WRITE;
if (*flags & FOLL_REMOTE)
fault_flags |= FAULT_FLAG_REMOTE;
-   if (nonblocking)
+   if (locked)
fault_flags |= FAULT_FLAG_ALLOW_RETRY;
if (*flags & FOLL_NOWAIT)
fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
@@ -552,8 +552,8 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
}
 
if (ret & VM_FAULT_RETRY) {
-   if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-   *nonblocking = 0;
+   if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
+   *locked = 0;
return -EBUSY;
}
 
@@ -630,7 +630,7 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
  * only intends to ensure the pages are faulted in.
  * @vmas:  array of pointers to vmas corresponding to each page.
  * Or NULL if the caller does not require them.
- * @nonblocking: whether waiting for disk IO or mmap_sem contention
+ * @locked: whether we're still with the mmap_sem held
  *
  * Returns number of pages pinned. This may be fewer than the number
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
@@ -659,13 +659,11 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
  * appropriate) must be called after the page is finished with, and
  * before put_page is called.
  *
- * If @nonblocking != NULL, __get_user_pages will not wait for disk IO
- * or mmap_sem contention, and if waiting is needed to pin all pages,
- * *@nonblocking will be set to 0.  Further, if @gup_flags does not
- * include FOLL_NOWAIT, the mmap_sem will be released via up_read() in
- * this case.
+ * If @locked != NULL, *@locked will be set to 0 when mmap_sem is
+ * released by an up_read().  That can happen if @gup_flags does not
+ * have FOLL_NOWAIT.
  *
- * A caller using such a combination of @nonblocking and @gup_flags
+ * A caller using such a combination of @locked and @gup_flags
  * must therefore hold the mmap_sem for reading only, and recognize
  * when it's been released.  Otherwise, it must be held for either
  * reading or writing and will not be released.
@@ -677,7 +675,7 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
 static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
-   struct vm_area_struct **vmas, int *nonblocking)
+   struct vm_area_struct **vmas, int *locked)
 {
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
@@ -721,7 +719,7 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
if (is_vm_hugetlb_page(vma)) {
i = follow_hugetlb_page(mm, vma, pages, vmas,
&start, &nr_pages, i,
-

[PATCH v4 03/27] userfaultfd: don't retake mmap_sem to emulate NOPAGE

2019-04-25 Thread Peter Xu

The idea comes from the upstream discussion between Linus and Andrea:

https://lkml.org/lkml/2017/10/30/560

A summary of the issue: there was a special path in handle_userfault()
in the past where we'd return VM_FAULT_NOPAGE when we detected
non-fatal signals while waiting for userfault handling.  We did that by
reacquiring the mmap_sem before returning.  However that brings a risk
in that the vmas might have changed when we retake the mmap_sem, and
we could even be holding an invalid vma structure.

This patch removes the risky path in handle_userfault() so that we can be
sure that the callers of handle_mm_fault() will know that the VMAs
might have changed.  Meanwhile, with the previous patch we don't lose
responsiveness either, since the core mm code can now handle
nonfatal userspace signals quickly even if we return VM_FAULT_RETRY.

Suggested-by: Andrea Arcangeli 
Suggested-by: Linus Torvalds 
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 89800fc7dc9d..b397bc3b954d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -514,30 +514,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
__set_current_state(TASK_RUNNING);
 
-   if (return_to_userland) {
-   if (signal_pending(current) &&
-   !fatal_signal_pending(current)) {
-   /*
-* If we got a SIGSTOP or SIGCONT and this is
-* a normal userland page fault, just let
-* userland return so the signal will be
-* handled and gdb debugging works.  The page
-* fault code immediately after we return from
-* this function is going to release the
-* mmap_sem and it's not depending on it
-* (unlike gup would if we were not to return
-* VM_FAULT_RETRY).
-*
-* If a fatal signal is pending we still take
-* the streamlined VM_FAULT_RETRY failure path
-* and there's no need to retake the mmap_sem
-* in such case.
-*/
-   down_read(&mm->mmap_sem);
-   ret = VM_FAULT_NOPAGE;
-   }
-   }
-
/*
 * Here we race with the list_del; list_add in
 * userfaultfd_ctx_read(), however because we don't ever run
-- 
2.17.1

[PATCH v4 05/27] mm: gup: allow VM_FAULT_RETRY for multiple times

2019-04-25 Thread Peter Xu

This is the gup counterpart of the change that allows VM_FAULT_RETRY
to happen more than once.
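
The retry behavior can be modelled in userspace as follows.  This is a
sketch under stated assumptions: struct model_fault, its fields and both
function names are hypothetical, a do/while loop stands in for the
"goto retry" used in the patch, and setting *locked to 1 stands in for
retaking the mmap_sem with down_read().

```c
/* Hypothetical fault that needs a given number of retries. */
struct model_fault {
	int retries_needed;	/* attempts that will ask for a retry */
	int attempts;		/* attempts made so far */
};

/*
 * One attempt at pinning a single page.  Returns 1 on success, or 0
 * with *locked cleared when the lock was dropped and a retry is needed.
 */
static inline long model_get_one_page(struct model_fault *f, int *locked)
{
	f->attempts++;
	if (f->retries_needed > 0) {
		f->retries_needed--;
		*locked = 0;
		return 0;
	}
	return 1;
}

/* Keep retrying the same page until an attempt ends with the lock held. */
static inline long model_retry_until_done(struct model_fault *f, int *locked)
{
	long ret;

	do {
		*locked = 1;	/* stands in for down_read(&mm->mmap_sem) */
		ret = model_get_one_page(f, locked);
	} while (!*locked);

	return ret;
}
```

Before this change the second attempt could not ask for another retry;
in the model that would correspond to retries_needed never exceeding 1.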

Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 mm/gup.c | 17 +
 mm/hugetlb.c |  6 --
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a78d252d6358..46b1d1412364 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -531,7 +531,10 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
if (*flags & FOLL_NOWAIT)
fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
if (*flags & FOLL_TRIED) {
-   VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
+   /*
+* Note: FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_TRIED
+* can co-exist
+*/
fault_flags |= FAULT_FLAG_TRIED;
}
 
@@ -946,17 +949,23 @@ static __always_inline long 
__get_user_pages_locked(struct task_struct *tsk,
/* VM_FAULT_RETRY triggered, so seek to the faulting offset */
pages += ret;
start += ret << PAGE_SHIFT;
+   lock_dropped = true;
 
+retry:
/*
 * Repeat on the address that fired VM_FAULT_RETRY
-* without FAULT_FLAG_ALLOW_RETRY but with
+* with both FAULT_FLAG_ALLOW_RETRY and
 * FAULT_FLAG_TRIED.
 */
*locked = 1;
-   lock_dropped = true;
down_read(&mm->mmap_sem);
ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
-  pages, NULL, NULL);
+  pages, NULL, locked);
+   if (!*locked) {
+   /* Continue to retry until we succeeded */
+   BUG_ON(ret != 0);
+   goto retry;
+   }
if (ret != 1) {
BUG_ON(ret > 1);
if (!pages_done)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e77b56141f0c..d14e2cc6f7c1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4268,8 +4268,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
fault_flags |= FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT;
if (flags & FOLL_TRIED) {
-   VM_WARN_ON_ONCE(fault_flags &
-   FAULT_FLAG_ALLOW_RETRY);
+   /*
+* Note: FAULT_FLAG_ALLOW_RETRY and
+* FAULT_FLAG_TRIED can co-exist
+*/
fault_flags |= FAULT_FLAG_TRIED;
}
ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
-- 
2.17.1

[PATCH v4 00/27] userfaultfd: write protection support

2019-04-25 Thread Peter Xu

This series implements initial write protection support for
userfaultfd.  Currently only anonymous memory is supported; shmem and
hugetlbfs are not supported yet.  This is the 4th version of it.

The latest code can also be found at:

  https://github.com/xzpeter/linux/tree/uffd-wp-merged

v4 changelog:
- add r-bs
- use kernel-doc format for fault_flag_allow_retry_first [Jerome]
- drop "export wp_page_copy", add new patch to split do_wp_page(), use
  it in change_pte_range() to replace the wp_page_copy(). [Jerome] (I
  thought about different ways to do this but I still can't find a
  100% good way for all... in this version I still used the
  do_wp_page_cont naming.  We can still discuss this and how we should
  split do_wp_page)
- make sure uffd-wp will also apply to device private entries which
  HMM uses [Jerome]

v3 changelog:
- take r-bs
- patch 1: fix typo [Jerome]
- patch 2: use brackets where proper around (flags & VM_FAULT_RETRY)
  (there're three places to change, not four...) [Jerome]
- patch 4: make sure TRIED is applied correctly on all archs, add more
  comment to explain the new page fault mechanism [Jerome]
- patch 7: in do_swap_page() remove the two lines to remove
  FAULT_FLAG_WRITE flag [Jerome]
- patch 10: another brackets change like above, and in
  mfill_atomic_pte return -EINVAL when detected wp_copy==1 upon shared
  memories [Jerome]
- patch 12: move _PAGE_CHG_MASK change to patch 8 [Jerome]
- patch 14: wp_page_copy() - fix write bit; change_pte_range() -
  detect PTE change after COW [Jerome]
- patch 17: remove last paragraph of commit message, no need to drop
  the two lines in do_swap_page() since they've been directly dropped
  in patch 7; touch up remove_migration_pte() to only detect uffd-wp
  bit if it's read migration entry [Jerome]
- add patch: "userfaultfd: wp: declare _UFFDIO_WRITEPROTECT
  conditionally", which removes the _UFFDIO_WRITEPROTECT bit if
  non-anonymous memory is detected during REGISTER; meanwhile fix up the
  test case for shmem too for the expected ioctls returned from REGISTER [Mike]
- add patch: "userfaultfd: wp: fixup swap entries in
  change_pte_range", which allows the uffd-wp bits to be applied
  to swap entries directly (e.g., when the page is under
  migration or has been swapped out).  Please see the patch for
  detailed information.

v2 changelog:
- add some r-bs
- split the patch "mm: userfault: return VM_FAULT_RETRY on signals"
  into two: one to focus on the signal behavior change, the other to
  remove the NOPAGE special path in handle_userfault().  Remove the
  ARC-specific change and that part of the commit message, since
  it's already fixed in 4d447455e73b [Jerome]
- return -ENOENT when VMA is invalid for UFFDIO_WRITEPROTECT to match
  UFFDIO_COPY errno [Mike]
- add a new patch to introduce helper to find valid VMA for uffd
  [Mike]
- check against VM_MAYWRITE instead of VM_WRITE when registering UFFD
  WP [Mike]
- MM_CP_DIRTY_ACCT is used incorrectly, fix it up [Jerome]
- make sure the lock_page behavior will not be changed [Jerome]
- reorder the whole series, introduce the new ioctl last. [Jerome]
- fix up the uffdio_writeprotect() following commit df2cc96e77011cf79
  to return -EAGAIN when mm layout changes are detected [Mike]

v1 can be found at: https://lkml.org/lkml/2019/1/21/130

Any comment would be greatly welcomed.   Thanks.

Overview


The uffd-wp work was initiated by Shaohua Li [1], and later
continued by Andrea [2]. This series is based upon Andrea's latest
userfaultfd tree, and it is a continuation of the work of both Shaohua
and Andrea.  Many of the follow-up ideas come from Andrea too.

Besides the old MISSING register mode of userfaultfd, the new uffd-wp
support provides an alternative register mode called
UFFDIO_REGISTER_MODE_WP that can be used to listen to write protection
page faults in addition to missing page faults; the two modes can even
be registered together.  At the same time, the new feature also provides
a new userfaultfd ioctl called UFFDIO_WRITEPROTECT which allows
userspace to write protect a range of memory or fix up the write
permission of faulted pages.

Please refer to the document patch "userfaultfd: wp:
UFFDIO_REGISTER_MODE_WP documentation update" for more information on
the new interface and what it can do.

The major workflow of an uffd-wp program should be:

  1. Register a memory region with WP mode using UFFDIO_REGISTER_MODE_WP

  2. Write protect part of the whole registered region using
 UFFDIO_WRITEPROTECT, passing in UFFDIO_WRITEPROTECT_MODE_WP to
 show that we want to write protect the range.

  3. Start a working thread that modifies the protected pages,
 meanwhile listening to UFFD messages.

  4. When a write to the protected range is detected, a page fault
 happens and a UFFD message is generated and reported to the
 page fault handling thread

  5. The page fault handler thread resolves the page fault using the
 new UFFDIO_WRITEPROTECT ioctl, b
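
The workflow above can be pictured with a small userspace toy model. This
is only a sketch of the state transitions, not the kernel interface: every
toy_ name is hypothetical, no ioctl is issued, and a "page" is just an
array slot. Step 5 is cut off in the text above, so the model assumes
that resolving a fault means restoring write permission on the faulting
page, as the overview paragraph describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Toy model only: none of these names exist in the kernel UAPI. */
struct toy_region {
	bool registered_wp;      /* step 1: region registered in WP mode */
	bool wp[TOY_NPAGES];     /* per-page write-protect bit */
	int data[TOY_NPAGES];    /* page contents, one int per page */
	int pending_fault;       /* faulting page index, or -1 if none */
};

void toy_init(struct toy_region *r)
{
	r->registered_wp = false;
	r->pending_fault = -1;
	for (size_t i = 0; i < TOY_NPAGES; i++) {
		r->wp[i] = false;
		r->data[i] = 0;
	}
}

/* Step 1: stands in for registering with UFFDIO_REGISTER_MODE_WP. */
void toy_register_wp(struct toy_region *r)
{
	r->registered_wp = true;
}

/* Steps 2 and 5: set or clear the protect bit on pages [first, first + n). */
int toy_writeprotect(struct toy_region *r, size_t first, size_t n, bool protect)
{
	if (!r->registered_wp || first > TOY_NPAGES || n > TOY_NPAGES - first)
		return -1;
	for (size_t i = first; i < first + n; i++)
		r->wp[i] = protect;
	return 0;
}

/* Steps 3 and 4: a write to a protected page is blocked and reported. */
bool toy_write(struct toy_region *r, size_t page, int value)
{
	assert(page < TOY_NPAGES);
	if (r->wp[page]) {
		r->pending_fault = (int)page;
		return false;
	}
	r->data[page] = value;
	return true;
}

/* Step 5: the handler drops protection on the faulting page. */
int toy_resolve_fault(struct toy_region *r)
{
	int page = r->pending_fault;

	if (page < 0)
		return -1;
	r->pending_fault = -1;
	return toy_writeprotect(r, (size_t)page, 1, false) == 0 ? page : -1;
}
```

In the model, a blocked toy_write() followed by toy_resolve_fault() and a
retried toy_write() mirrors steps 4 and 5; the real interface reports the
fault as a message on the userfaultfd file descriptor instead.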

Re: [PATCH 2/2] pinctrl: tegra: Add Tegra194 pinmux driver

2019-04-25 Thread Vidya Sagar


On 4/26/2019 8:26 AM, Krishna Yarlagadda wrote:

Tegra194 has PCIE L5 rst and clkreq pins which need to be controlled
dynamically at runtime. This driver supports changing the pinmux for
these pins. The pinmux for the rest of the pins is set statically by the
bootloader and will not be changed by this driver.

Signed-off-by: Krishna Yarlagadda 
Signed-off-by: Suresh Mangipudi 
---
  drivers/pinctrl/tegra/Kconfig|   4 +
  drivers/pinctrl/tegra/Makefile   |   1 +
  drivers/pinctrl/tegra/pinctrl-tegra.c|   8 +-
  drivers/pinctrl/tegra/pinctrl-tegra.h|   8 +-
  drivers/pinctrl/tegra/pinctrl-tegra194.c | 175 +++
  drivers/soc/tegra/Kconfig|   1 +
  6 files changed, 189 insertions(+), 8 deletions(-)
  create mode 100644 drivers/pinctrl/tegra/pinctrl-tegra194.c

diff --git a/drivers/pinctrl/tegra/Kconfig b/drivers/pinctrl/tegra/Kconfig
index 24e20cc..6f79f1f 100644
--- a/drivers/pinctrl/tegra/Kconfig
+++ b/drivers/pinctrl/tegra/Kconfig
@@ -23,6 +23,10 @@ config PINCTRL_TEGRA210
bool
select PINCTRL_TEGRA
  
+config PINCTRL_TEGRA194

+   bool
+   select PINCTRL_TEGRA
+
  config PINCTRL_TEGRA_XUSB
def_bool y if ARCH_TEGRA
select GENERIC_PHY
diff --git a/drivers/pinctrl/tegra/Makefile b/drivers/pinctrl/tegra/Makefile
index bbcb043..ead4e10 100644
--- a/drivers/pinctrl/tegra/Makefile
+++ b/drivers/pinctrl/tegra/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_PINCTRL_TEGRA30)   += pinctrl-tegra30.o
  obj-$(CONFIG_PINCTRL_TEGRA114)+= pinctrl-tegra114.o
  obj-$(CONFIG_PINCTRL_TEGRA124)+= pinctrl-tegra124.o
  obj-$(CONFIG_PINCTRL_TEGRA210)+= pinctrl-tegra210.o
+obj-$(CONFIG_PINCTRL_TEGRA194) += pinctrl-tegra194.o
  obj-$(CONFIG_PINCTRL_TEGRA_XUSB)  += pinctrl-tegra-xusb.o
diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.c 
b/drivers/pinctrl/tegra/pinctrl-tegra.c
index a5008c0..76e88c4 100644
--- a/drivers/pinctrl/tegra/pinctrl-tegra.c
+++ b/drivers/pinctrl/tegra/pinctrl-tegra.c
@@ -292,7 +292,7 @@ static int tegra_pinconf_reg(struct tegra_pmx *pmx,
 const struct tegra_pingroup *g,
 enum tegra_pinconf_param param,
 bool report_err,
-s8 *bank, s16 *reg, s8 *bit, s8 *width)
+s8 *bank, s32 *reg, s8 *bit, s8 *width)
  {
switch (param) {
case TEGRA_PINCONF_PARAM_PULL:
@@ -451,7 +451,7 @@ static int tegra_pinconf_group_get(struct pinctrl_dev 
*pctldev,
const struct tegra_pingroup *g;
int ret;
s8 bank, bit, width;
-   s16 reg;
+   s32 reg;
u32 val, mask;
  
  	g = &pmx->soc->groups[group];

@@ -480,7 +480,7 @@ static int tegra_pinconf_group_set(struct pinctrl_dev 
*pctldev,
const struct tegra_pingroup *g;
int ret, i;
s8 bank, bit, width;
-   s16 reg;
+   s32 reg;
u32 val, mask;
  
  	g = &pmx->soc->groups[group];

@@ -548,7 +548,7 @@ static void tegra_pinconf_group_dbg_show(struct pinctrl_dev 
*pctldev,
const struct tegra_pingroup *g;
int i, ret;
s8 bank, bit, width;
-   s16 reg;
+   s32 reg;
u32 val;
  
  	g = &pmx->soc->groups[group];

diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.h 
b/drivers/pinctrl/tegra/pinctrl-tegra.h
index 44c7194..82cd947 100644
--- a/drivers/pinctrl/tegra/pinctrl-tegra.h
+++ b/drivers/pinctrl/tegra/pinctrl-tegra.h
@@ -143,10 +143,10 @@ struct tegra_pingroup {
const unsigned *pins;
u8 npins;
u8 funcs[4];
-   s16 mux_reg;
-   s16 pupd_reg;
-   s16 tri_reg;
-   s16 drv_reg;
+   s32 mux_reg;
+   s32 pupd_reg;
+   s32 tri_reg;
+   s32 drv_reg;
u32 mux_bank:2;
u32 pupd_bank:2;
u32 tri_bank:2;
diff --git a/drivers/pinctrl/tegra/pinctrl-tegra194.c 
b/drivers/pinctrl/tegra/pinctrl-tegra194.c
new file mode 100644
index 000..9172a8c
--- /dev/null
+++ b/drivers/pinctrl/tegra/pinctrl-tegra194.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Pinctrl data for the NVIDIA Tegra210 pinmux
+ *
+ * Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pinctrl-tegra.h"
+
+#define _GPIO(offset)  (offset)
+#define NUM_GPIOS  (TEGRA_PIN_PEX_L5_RST_N_PGG1 + 1)
+
+/* Define unique ID for each pins */
+enum pin_id {
+   TEGRA_PIN_PEX_L5

Re: [PATCH] tty: Don't force RISCV SBI console as preferred console

2019-04-25 Thread Atish Patra


On 4/25/19 6:35 AM, Anup Patel wrote:

The Linux kernel auto-disables all boot consoles whenever it
gets a preferred real console.

Currently on RISC-V systems, if we have a real console which is not
RISCV SBI console then boot consoles (such as earlycon=sbi) are not
auto-disabled when a real console (ttyS0 or ttySIF0) is available.
This results in duplicate prints at boot-time after kernel starts
using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel
parameter was passed by bootloader.

The reason for the above issue is that the RISCV SBI console always adds
itself as the preferred console, which prevents other real consoles
from being used as the preferred console.



Do we even need HVC_SBI console to be enabled by default? Disabling 
CONFIG_HVC_RISCV_SBI seems to be fine while running in QEMU.


If we don't need it, I suggest we should remove the config option from 
defconfig in addition to this patch.


Regards,
Atish

Ideally "console=" kernel parameter passed by bootloaders should
be the one selecting a preferred real console.

This patch fixes above issue by not forcing RISCV SBI console as
preferred console.

Fixes: afa6b1ccfad5 ("tty: New RISC-V SBI console driver")
Cc: sta...@vger.kernel.org
Signed-off-by: Anup Patel 
---
  drivers/tty/hvc/hvc_riscv_sbi.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/tty/hvc/hvc_riscv_sbi.c b/drivers/tty/hvc/hvc_riscv_sbi.c
index 75155bde2b88..31f53fa77e4a 100644
--- a/drivers/tty/hvc/hvc_riscv_sbi.c
+++ b/drivers/tty/hvc/hvc_riscv_sbi.c
@@ -53,7 +53,6 @@ device_initcall(hvc_sbi_init);
  static int __init hvc_sbi_console_init(void)
  {
hvc_instantiate(0, 0, &hvc_sbi_ops);
-   add_preferred_console("hvc", 0, NULL);
  
  	return 0;

  }

Re: [PATCH] kernel/sched: run nohz idle load balancer on HK_FLAG_MISC CPUs

2019-04-25 Thread Nicholas Piggin

Peter Zijlstra's on April 25, 2019 9:56 pm:
> On Fri, Apr 12, 2019 at 02:26:13PM +1000, Nicholas Piggin wrote:
>> The nohz idle balancer runs on the lowest idle CPU. This can
>> interfere with isolated CPUs, so confine it to HK_FLAG_MISC
>> housekeeping CPUs.
>> 
>> HK_FLAG_SCHED is not used for this because it is not set anywhere
>> at the moment. This could be folded into HK_FLAG_SCHED once that
>> option is fixed.
> 
> Frederic? Anyway, I think I'll take this patch as is.

That would be great, thanks. We've been testing it in a staging
environment (this is where they noticed the noise in the first
place), and results have been as expected:

  I've been able to test Nick's idle-loop load balancer (ILB) patch, 
  with and without the TEO cpuidle governor. With the ILB patch (and 
  nohz_full) I get a very quiet noise profile with either cpuidle 
  governor (menu or teo). For my tests, I don't see a meaningful 
  difference between the two governors.

  [...]

  Bottom line: Nick's patch that constrains the ILB to run on non-nohz 
  cores has a noticeable noise-reduction effect. For this type of 
  workload, the choice of cpuidle governor, menu or teo, is immaterial.

This is against a slightly backported RHEL kernel they are using, but
no significant differences from upstream in these areas.

Thanks,
Nick

Re: [PATCH] staging: most: protect potential string overflow

2019-04-25 Thread Bo YU

On Wed, Apr 24, 2019 at 10:55 PM Dan Carpenter  wrote:
>
> On Mon, Apr 22, 2019 at 10:20:18PM -0400, Bo YU wrote:
> > There may be a potential string overflow issue due to the use of
> > strcpy without checking the length.
> >
> > Detected by Coverity Scan, CID# 1444760
> >
> > Fixes: 131ac62253dba:(staging: most: core: use device description as name)
>
> It doesn't really fix anything, it just silences a static checker
> warning.
>
> > Signed-off-by: Bo YU 
> > ---
> >  drivers/staging/most/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/staging/most/core.c b/drivers/staging/most/core.c
> > index 956daf8c3bd2..0f26cebac91a 100644
> > --- a/drivers/staging/most/core.c
> > +++ b/drivers/staging/most/core.c
> > @@ -1431,7 +1431,7 @@ int most_register_interface(struct most_interface 
> > *iface)
> >
> >   INIT_LIST_HEAD(&iface->p->channel_list);
> >   iface->p->dev_id = id;
> > - strcpy(iface->p->name, iface->description);
> > + strlcpy(iface->p->name, iface->description, sizeof(iface->p->name));
>
> We prefer strscpy() more than strlcpy() these days.

 OK, will try it.
 Thanks,

>
> regards,
> dan carpenter
>
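
For context on the strcpy()/strlcpy()/strscpy() discussion: the property
being asked for is a copy that never writes past the destination, always
NUL-terminates, and reports truncation to the caller. A minimal userspace
sketch of that behaviour is below. bounded_copy() is a hypothetical
stand-in, not the kernel's strscpy(); it returns -1 on truncation where
the kernel helper returns -E2BIG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, writing at most size bytes including the NUL.
 * Returns the number of characters copied (excluding the NUL), or -1
 * if size is 0 or src had to be truncated.
 */
long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -1;

	/* Look at no more than size bytes of src. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {
		/* src does not fit: keep size - 1 characters and terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, len + 1);	/* includes the terminating NUL */
	return (long)len;
}
```

With a 4-byte destination, copying "abcdef" leaves "abc" in the buffer
and returns -1, so the caller can tell that the name was cut short.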

Re: [PATCH v2 4/9] powerpc/powernv/npu: use helper pci_dev_id

2019-04-25 Thread Alexey Kardashevskiy




On 25/04/2019 05:14, Heiner Kallweit wrote:
> Use new helper pci_dev_id() to simplify the code.
> 
> Signed-off-by: Heiner Kallweit 



Reviewed-by: Alexey Kardashevskiy 


> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index dc23d9d2a..495550432 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -1213,9 +1213,8 @@ int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, 
> unsigned int lparid,
>* Currently we only support radix and non-zero LPCR only makes sense
>* for hash tables so skiboot expects the LPCR parameter to be a zero.
>*/
> - ret = opal_npu_map_lpar(nphb->opal_id,
> - PCI_DEVID(gpdev->bus->number, gpdev->devfn), lparid,
> - 0 /* LPCR bits */);
> + ret = opal_npu_map_lpar(nphb->opal_id, pci_dev_id(gpdev), lparid,
> + 0 /* LPCR bits */);
>   if (ret) {
>   dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
>   return ret;
> @@ -1224,7 +1223,7 @@ int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, 
> unsigned int lparid,
>   dev_dbg(&gpdev->dev, "init context opalid=%llu msr=%lx\n",
>   nphb->opal_id, msr);
>   ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
> - PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> + pci_dev_id(gpdev));
>   if (ret < 0)
>   dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
>   else
> @@ -1258,7 +1257,7 @@ int pnv_npu2_unmap_lpar_dev(struct pci_dev *gpdev)
>   dev_dbg(&gpdev->dev, "destroy context opalid=%llu\n",
>   nphb->opal_id);
>   ret = opal_npu_destroy_context(nphb->opal_id, 0/*__unused*/,
> - PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> +pci_dev_id(gpdev));
>   if (ret < 0) {
>   dev_err(&gpdev->dev, "Failed to destroy context: %d\n", ret);
>   return ret;
> @@ -1266,9 +1265,8 @@ int pnv_npu2_unmap_lpar_dev(struct pci_dev *gpdev)
>  
>   /* Set LPID to 0 anyway, just to be safe */
>   dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu lparid=0\n", nphb->opal_id);
> - ret = opal_npu_map_lpar(nphb->opal_id,
> - PCI_DEVID(gpdev->bus->number, gpdev->devfn), 0 /*LPID*/,
> - 0 /* LPCR bits */);
> + ret = opal_npu_map_lpar(nphb->opal_id, pci_dev_id(gpdev), 0 /*LPID*/,
> + 0 /* LPCR bits */);
>   if (ret)
>   dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
>  
> 

-- 
Alexey
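
As an illustration of what the helper folds away: each replaced call site
combined the bus number and devfn into one 16-bit ID by hand. A userspace
sketch of that packing is below. The toy_ structures and names are
stand-ins invented for this example and carry only the two fields used
here; the layout assumed is bus number in the high byte and devfn in the
low byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel structures; only the fields used here. */
struct toy_pci_bus {
	uint8_t number;
};

struct toy_pci_dev {
	struct toy_pci_bus *bus;
	unsigned int devfn;	/* (slot << 3) | function */
};

/* Bus number in bits 15:8, devfn in bits 7:0. */
#define TOY_PCI_DEVID(bus, devfn) \
	((uint16_t)((((uint16_t)(bus)) << 8) | (devfn)))

/* One helper instead of spelling out bus->number and devfn at each call. */
uint16_t toy_pci_dev_id(const struct toy_pci_dev *dev)
{
	assert(dev != NULL && dev->bus != NULL);
	return TOY_PCI_DEVID(dev->bus->number, dev->devfn);
}
```

For bus 0x3a, slot 0x1f, function 2, the packed ID is 0x3afa.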

Re: [PATCH 1/2] clk: imx7ulp: update nic1_bus_clk parent info

2019-04-25 Thread Shawn Guo

On Thu, Apr 25, 2019 at 05:03:31PM -0700, Stephen Boyd wrote:
> Quoting Anson Huang (2019-04-24 22:19:07)
> > Since the i.MX7ULP B0 chip, nic1_bus_clk's parent has changed to
> > come directly from nic0_clk; update it accordingly.
> > 
> > Signed-off-by: Anson Huang 
> 
> Looks ok. Shawn, will you pick it up?

Stephen,

I would prefer that you directly pick up any i.MX clock patches that
look good, now that I have already sent you the PR.  I will start
collecting again for the next cycle around -rc1.

Shawn

Re: [PATCH v2 03/12] arm64: dts: tegra210: set thermtrip

2019-04-25 Thread Wei Ni

Hi Thierry,
Eduardo has picked this series into his branch, except for the dts patches.
Please check
"git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git"
in the linus branch. They will be merged in the next major kernel release.

Could you please take these three dts changes?
Here is the list:
[PATCH v2 03/12] arm64: dts: tegra210: set thermtrip
[PATCH v2 06/12] arm64: dts: tegra210: set gpu hw throttle level
[PATCH v2 10/12] arm64: dts: tegra210: set EDP interrupt line

Thanks.
Wei.

On 21/2/2019 6:18 PM, Wei Ni wrote:
> Set the "nvidia,thermtrips" property, which is used to set the
> HW shutdown temperatures.
> 
> Signed-off-by: Wei Ni 
> ---
>  arch/arm64/boot/dts/nvidia/tegra210.dtsi | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi 
> b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> index 6574396d2257..582d56820bbb 100644
> --- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> +++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> @@ -1410,6 +1410,9 @@
>   reset-names = "soctherm";
>   #thermal-sensor-cells = <1>;
>  
> + nvidia,thermtrips =  +  TEGRA124_SOCTHERM_SENSOR_GPU 103000>;
> +
>   throttle-cfgs {
>   throttle_heavy: heavy {
>   nvidia,priority = <100>;
> @@ -1429,8 +1432,8 @@
>   <&soctherm TEGRA124_SOCTHERM_SENSOR_CPU>;
>  
>   trips {
> - cpu-shutdown-trip {
> - temperature = <102500>;
> + cpu-critical-trip {
> + temperature = <102000>;
>   hysteresis = <0>;
>   type = "critical";
>   };
> @@ -1457,7 +1460,7 @@
>   <&soctherm TEGRA124_SOCTHERM_SENSOR_MEM>;
>  
>   trips {
> - mem-shutdown-trip {
> + mem-critical-trip {
>   temperature = <103000>;
>   hysteresis = <0>;
>   type = "critical";
> @@ -1479,8 +1482,8 @@
>   <&soctherm TEGRA124_SOCTHERM_SENSOR_GPU>;
>  
>   trips {
> - gpu-shutdown-trip {
> - temperature = <103000>;
> + gpu-critical-trip {
> + temperature = <102500>;
>   hysteresis = <0>;
>   type = "critical";
>   };
> @@ -1507,7 +1510,7 @@
>   <&soctherm TEGRA124_SOCTHERM_SENSOR_PLLX>;
>  
>   trips {
> - pllx-shutdown-trip {
> + pllx-critical-trip {
>   temperature = <103000>;
>   hysteresis = <0>;
>   type = "critical";
>

Re: [RFC PATCH v5 4/4] x86/acrn: Add hypercall for ACRN guest

2019-04-25 Thread Zhao, Yakui





On 2019-04-25 19:00, Borislav Petkov wrote:

On Thu, Apr 25, 2019 at 06:16:02PM +0800, Zhao, Yakui wrote:

The parameter register for the VMCALL is predefined in ACRN hypervisor. Now
the R8 is used to pass the hcall_id.
It seems that there is no special constraint for R8~R15.
So the explicit register variable is used so that the R8 can be passed.


If you're going to use the constraint "D" for param1, you can just as
well do

"=a" (result)

everywhere since you have the letter constraint for %rax instead of
declaring it with "register".

Also, you can completely get rid of those "register" declarations
and let gcc have all the freedom to pass in hcall_id and the other
parameters:

Thanks Borislav for providing the code.

It seems that the explicit register variable is seldom used in the
kernel, although it is supported by GCC and makes the code look simpler.
And it seems that the explicit register variable is not supported by
Clang.

So the explicit register variable will be removed. I will follow the asm
code from Borislav. Of course one minor change is that "movq" is
used instead of "mov".


Is this ok?

Thanks



unsigned long result;

 asm volatile("mov %[hcall_id], %%r8\n\t"
  "vmcall\n\t"
  : "=a" (result)
  : [hcall_id] "g" (hcall_id)
  : "r8");

 return result;

and %r8 will be in the clobber list so gcc will reload it if needed.

gcc turns it into

1040 :
 1040:   4c 8b 05 e1 2f 00 00    mov    0x2fe1(%rip),%r8        # 4028
 1047:   0f 01 c1                vmcall
 104a:   c3                      retq
 104b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

here.

[PATCH] ASoC: fsl_sai: Add missing return 0 in remove()

2019-04-25 Thread Nicolin Chen

Build warning being reported:
sound/soc/fsl/fsl_sai.c: In function 'fsl_sai_remove':
sound/soc/fsl/fsl_sai.c:921:1: warning: no return statement in
function returning non-void [-Wreturn-type]

So this patch just adds a "return 0" to fix it.

Fixes: 812ad463e089 ("ASoC: fsl_sai: Add support for runtime pm")
Reported-by: Stephen Rothwell 
Signed-off-by: Nicolin Chen 
---
 sound/soc/fsl/fsl_sai.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 26c27dc..8593269 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -918,6 +918,8 @@ static int fsl_sai_probe(struct platform_device *pdev)
 static int fsl_sai_remove(struct platform_device *pdev)
 {
pm_runtime_disable(&pdev->dev);
+
+   return 0;
 }
 
 static const struct of_device_id fsl_sai_ids[] = {
-- 
2.7.4

[PATCH v2] KVM: x86: Add Intel CPUID.1F cpuid emulation support

2019-04-25 Thread Like Xu

Some new systems have multiple software-visible die within each package.
Add support to expose Intel V2 Extended Topology Enumeration Leaf CPUID.1F.

Co-developed-by: Xiaoyao Li 
Signed-off-by: Xiaoyao Li 
Signed-off-by: Like Xu 
---

==changelog==
v2:
- Apply cpuid.1f check rule on Intel SDM page 3-222 Vol.2A
- Add comment to handle 0x1f anf 0xb in common code
- Reduce check time in a descending-break style

v1: https://lkml.org/lkml/2019/4/22/28

 arch/x86/kvm/cpuid.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd39516..f9b529e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -425,6 +425,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
switch (function) {
case 0:
+   /* Check if the cpuid leaf 0x1f is actually implemented */
+   if (entry->eax >= 0x1f && (cpuid_ebx(0x1f) & 0x)) {
+   entry->eax = 0x1f;
+   break;
+   }
entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
break;
case 1:
@@ -544,7 +549,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
entry->edx = edx.full;
break;
}
-   /* function 0xb has additional index. */
+   /*
+* Intel documentation states that 0x1f and 0xb have
+* identical formats and thus can be handled by common code.
+* (Intel SDM Vol. 2A - Instruction Set Reference - CPUID)
+*/
+   case 0x1f:
case 0xb: {
int i, level_type;
 
-- 
1.8.3.1
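
The first hunk decides which maximum basic leaf the guest is told about.
A standalone sketch of that decision is below, written as a pure function
so that it can be exercised outside KVM. The function and parameter names
are invented for this example. The mask in the posted diff is cut off, so
the 0xffff used here is an assumption, based on the SDM rule that leaf
0x1f is present when the maximum leaf is at least 0x1f and EBX[15:0] of
that leaf is non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * host_max_leaf: EAX of CPUID leaf 0 on the host
 * leaf_1f_ebx:   EBX of CPUID leaf 0x1f (ignored if the leaf is absent)
 * intel_pt:      whether Intel PT is exposed (raises the cap to 0x14)
 */
uint32_t toy_max_basic_leaf(uint32_t host_max_leaf, uint32_t leaf_1f_ebx,
			    bool intel_pt)
{
	uint32_t cap = intel_pt ? 0x14 : 0xd;

	/* Assumed mask: low 16 bits of EBX non-zero means 0x1f is real. */
	if (host_max_leaf >= 0x1f && (leaf_1f_ebx & 0xffff) != 0)
		return 0x1f;

	/* Otherwise keep the previous cap. */
	return host_max_leaf < cap ? host_max_leaf : cap;
}
```

The early return is what the changelog calls the descending-break style:
the widest leaf is tested first and the older cap applies only if that
test fails.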

[PATCH v3] sound: isa: gus: fix misuse of %x

2019-04-25 Thread Fuqian Huang

Pointers should be printed with %p or %px rather than
cast to long type and printed with %lx.
Drop the address printing.

Signed-off-by: Fuqian Huang 
---
 sound/isa/gus/gus_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/isa/gus/gus_mem.c b/sound/isa/gus/gus_mem.c
index 4ac76f46dd76..d708ae1525e4 100644
--- a/sound/isa/gus/gus_mem.c
+++ b/sound/isa/gus/gus_mem.c
@@ -306,7 +306,7 @@ static void snd_gf1_mem_info_read(struct snd_info_entry 
*entry,
used = 0;
for (block = alloc->first, i = 0; block; block = block->next, i++) {
used += block->size;
-   snd_iprintf(buffer, "Block %i at 0x%lx onboard 0x%x size %i 
(0x%x):\n", i, (long) block, block->ptr, block->size, block->size);
+   snd_iprintf(buffer, "Block %i onboard 0x%x size %i (0x%x):\n", 
i, block->ptr, block->size, block->size);
if (block->share ||
block->share_id[0] || block->share_id[1] ||
block->share_id[2] || block->share_id[3])
-- 
2.11.0

Re: linux-next: build warning after merge of the sound-asoc tree

2019-04-25 Thread Nicolin Chen

On Fri, Apr 26, 2019 at 01:05:49PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the sound-asoc tree, today's linux-next build (arm
> multi_v7_defconfig) produced this warning:
> 
> sound/soc/fsl/fsl_sai.c: In function 'fsl_sai_remove':
> sound/soc/fsl/fsl_sai.c:921:1: warning: no return statement in function 
> returning non-void [-Wreturn-type]
>  }
>  ^
> 
> Introduced by commit
> 
>   812ad463e089 ("ASoC: fsl_sai: Add support for runtime pm")

Thanks. I am submitting a fix.

linux-next: build warning after merge of the sound-asoc tree

2019-04-25 Thread Stephen Rothwell

Hi all,

After merging the sound-asoc tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

sound/soc/fsl/fsl_sai.c: In function 'fsl_sai_remove':
sound/soc/fsl/fsl_sai.c:921:1: warning: no return statement in function 
returning non-void [-Wreturn-type]
 }
 ^

Introduced by commit

  812ad463e089 ("ASoC: fsl_sai: Add support for runtime pm")

-- 
Cheers,
Stephen Rothwell



Re: Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage

2019-04-25 Thread huangpei




> -----Original Message-----
> From: "Peter Zijlstra" 
> Sent: 2019-04-25 21:31:05 (Thursday)
> To: huang...@loongson.cn
> Cc: "Paul Burton" , "st...@rowland.harvard.edu" 
> , "aki...@gmail.com" , 
> "andrea.pa...@amarulasolutions.com" , 
> "boqun.f...@gmail.com" , "dlus...@nvidia.com" 
> , "dhowe...@redhat.com" , 
> "j.algl...@ucl.ac.uk" , "luc.maran...@inria.fr" 
> , "npig...@gmail.com" , 
> "paul...@linux.ibm.com" , "will.dea...@arm.com" 
> , "linux-kernel@vger.kernel.org" 
> , "torva...@linux-foundation.org" 
> , "Huacai Chen" 
> Subject: Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() 
> wreckage
> 
> On Thu, Apr 25, 2019 at 08:51:17PM +0800, huang...@loongson.cn wrote:
> 
> > > So basically the initial value of @v is set to 1.
> > > 
> > > Then CPU-1 does atomic_add_unless(v, 1, 0)
> > >  CPU-2 does atomic_set(v, 0)
> > > 
> > > If CPU1 goes first, it will see 1, which is not 0 and thus add 1 to 1
> > > and obtains 2. Then CPU2 goes and writes 0, so the exist clause sees
> > > v==0 and doesn't observe 2.
> > > 
> > > The other way around, CPU-2 goes first, writes a 0, then CPU-1 goes and
> > > observes the 0, finds it matches 0 and doesn't add.  Again, the exist
> > > clause will find 0 doesn't match 2.
> > > 
> > > This all goes unstuck if interleaved like:
> > > 
> > > 
> > >     CPU-1                   CPU-2
> > > 
> > >     xor  t0, t0
> > > 1:  ll   t0, v
> > >     bez  t0, 2f
> > >                             sw   t0, v
> > >     add  t0, t1
> > >     sc   t0, v
> > >     beqz t0, 1b
> > > 
> > > (sorry if I got the MIPS asm wrong; it's not something I normally write)
> > > 
> > > And the store-word from CPU-2 doesn't make the SC from CPU-1 fail.
> > > 
> > 
> > Loongson's LL/SC bug DOES NOT fail this litmus test (we will not get V=2);
> > 
> > only a speculative memory access from CPU-1 can "blind" CPU-1 (here "blind"
> > means doing the ll/sc wrong); this speculative memory access can be observed
> > correctly by CPU-2. In this case, the sw from CPU-2 can get I, which can be
> > observed by CPU-1 and clears the LLBit, so the sc fails.
> 
> I'm not following, suppose CPU-1 happens as a speculation (imagine
> whatever code is required to make that happen before). CPU-2 sw will
> cause I on CPU-1's ll but, as in the previous email, CPU-1 will continue
> as if it still has E and complete the SC.
> 
> That is; I'm just not seeing why this case would be different from two
> competing LL/SCs.
> 

I get your point. I kept my eye on the sw from CPU-2, but forgot the
speculative memory access from CPU-1.

There is no difference between this one and the former case.

=
                V = 1

    CPU-1                                   CPU-2

    xor    t0, t0
1:  ll     t0, V
    beqz   t0, 2f

    /*
     * If a speculative memory access kicks
     * the cache line of V out, it can blind
     * CPU-1: CPU-1 believes it still holds
     * E on V and can NOT see that the sw
     * from CPU-2 actually invalidated V,
     * which should clear the LLBit of
     * CPU-1 but does not.
     */
                                            sw     t0, V  // just after sw, V = 0
    addiu  t0, t0, 1

    sc     t0, V
    /* oops, sc writes t0 (2) into V with the LLBit still set */

    /* get V=2 */
    beqz   t0, 1b
    nop
2:


   
If the speculative memory access *does not* kick out the cache line of V,
CPU-1 can see the sw from CPU-2 and clears the LLBit, which causes the sc
to fail and retry. That's OK.
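
For reference, the operation under test can be written portably as a
compare-and-swap loop. The sketch below uses C11 atomics in userspace and
is not the MIPS kernel implementation; add_unless() is a name chosen for
this example. On LL/SC machines the compare-exchange is typically built
from an ll/sc pair like the one in the listing above, so the same rule
applies: a store to V by another CPU between the load and the conditional
store must make the update fail and retry.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add a to *v unless *v equals u.
 * Returns true if the add was performed.
 */
bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	do {
		if (old == u)
			return false;
		/* On failure, old is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, old + a));

	return true;
}
```

With V starting at 1, add_unless(&V, 1, 0) racing against a plain store
of 0 may end with V == 0 in either order, but never with V == 2 after the
store; that forbidden outcome is what the litmus test looks for.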
 



[PATCH 1/2] dt-binding: Tegra194 pinctrl support

2019-04-25 Thread Krishna Yarlagadda

Add new compatible string and other fields used in pinctrl
driver for Tegra194 in nvidia,tegra210-pinmux.txt

Signed-off-by: Krishna Yarlagadda 
---
 .../bindings/pinctrl/nvidia,tegra210-pinmux.txt| 43 +++---
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt 
b/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt
index 85f2114..c4e802d 100644
--- a/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt
+++ b/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt
@@ -1,7 +1,7 @@
-NVIDIA Tegra210 pinmux controller
+NVIDIA Tegra210/194 pinmux controller
 
 Required properties:
-- compatible: "nvidia,tegra210-pinmux"
+- compatible: "nvidia,tegra210-pinmux" or "nvidia,tegra194-pinmux"
 - reg: Should contain a list of base address and size pairs for:
   - first entry: The APB_MISC_GP_*_PADCTRL registers (pad control)
   - second entry: The PINMUX_AUX_* registers (pinmux)
@@ -83,6 +83,10 @@ Valid values for pin and group names (nvidia,pin) are:
 These correspond to Tegra PINMUX_AUX_* (pinmux) registers. Any property
 that exists in those registers may be set for the following pin names.
 
+  Tegra194:
+pex_l5_clkreq_n_pgg0, pex_l5_rst_n_pgg1
+
+  Tegra210:
 In Tegra210, many pins also have a dedicated APB_MISC_GP_*_PADCTRL
 register. Where that is true, and property that exists in that register
 may also be set on the following pin names.
@@ -127,12 +131,15 @@ Valid values for pin and group names (nvidia,pin) are:
 registers. Note that where one of these registers controls a single pin
 for which a PINMUX_AUX_* exists, see the list above for the pin name to
 use when configuring the pinmux.
-
+  Tegra210:
 pa6, pcc7, pe6, pe7, ph6, pk0, pk1, pk2, pk3, pk4, pk5, pk6, pk7, pl0, pl1,
 pz0, pz1, pz2, pz3, pz4, pz5, sdmmc1, sdmmc2, sdmmc3, sdmmc4
+  Tegra194:
+pex_l5_clkreq_n_pgg0, pex_l5_rst_n_pgg1
 
 Valid values for nvidia,functions are:
 
+  Tegra210:
 aud, bcl, blink, ccla, cec, cldvfs, clk, core, cpu, displaya, displayb,
 dmic1, dmic2, dmic3, dp, dtv, extperiph3, i2c1, i2c2, i2c3, i2cpmu, i2cvi,
 i2s1, i2s2, i2s3, i2s4a, i2s4b, i2s5a, i2s5b, iqc0, iqc1, jtag, pe, pe0,
@@ -140,9 +147,12 @@ Valid values for nvidia,functions are:
 sdmmc1, sdmmc3, shutdown, soc, sor0, sor1, spdif, spi1, spi2, spi3, spi4,
 sys, touch, uart, uarta, uartb, uartc, uartd, usb, vgp1, vgp2, vgp3, vgp4,
 vgp5, vgp6, vimclk, vimclk2
+  Tegra194:
+pe5
 
-Example:
+Examples:
 
+  Tegra210:
pinmux: pinmux@7800 {
compatible = "nvidia,tegra210-pinmux";
reg = <0x0 0x78d4 0x0 0x2a8>, /* Pad control registers */
@@ -163,4 +173,27 @@ Example:
};
};
};
-};
+
+  Tegra194:
+   tegra_pinctrl: pinmux: pinmux@243 {
+   compatible = "nvidia,tegra194-pinmux";
+   reg = <0x243 0x17000
+  0xc30 0x4000>;
+   #gpio-range-cells = <2>;
+   pex_rst_c5_out_state: pex_rst_c5_out {
+   pex_rst {
+   nvidia,pins = "pex_l5_rst_n_pgg1";
+   nvidia,schmitt = ;
+   nvidia,lpdr = ;
+   nvidia,enable-input = 
;
+   nvidia,io-high-voltage = 
;
+   nvidia,tristate = ;
+   nvidia,pull = ;
+   };
+   };
+   };
+   pinmuxtest@0 {
+   compatible = "nvidia,tegra194-pinmux-test";
+   pinctrl-names = "pex_rst";
+   pinctrl-0 = <&pex_rst_c5_out_state>;
+   };
-- 
2.7.4

[PATCH 2/2] pinctrl: tegra: Add Tegra194 pinmux driver

2019-04-25 Thread Krishna Yarlagadda

Tegra194 has PCIE L5 rst and clkreq pins which need to be controlled
dynamically at runtime. This driver supports changing the pinmux for
these pins. The pinmux for the rest of the pins is set statically by the
bootloader and will not be changed by this driver.

Signed-off-by: Krishna Yarlagadda 
Signed-off-by: Suresh Mangipudi 
---
 drivers/pinctrl/tegra/Kconfig|   4 +
 drivers/pinctrl/tegra/Makefile   |   1 +
 drivers/pinctrl/tegra/pinctrl-tegra.c|   8 +-
 drivers/pinctrl/tegra/pinctrl-tegra.h|   8 +-
 drivers/pinctrl/tegra/pinctrl-tegra194.c | 175 +++
 drivers/soc/tegra/Kconfig|   1 +
 6 files changed, 189 insertions(+), 8 deletions(-)
 create mode 100644 drivers/pinctrl/tegra/pinctrl-tegra194.c

diff --git a/drivers/pinctrl/tegra/Kconfig b/drivers/pinctrl/tegra/Kconfig
index 24e20cc..6f79f1f 100644
--- a/drivers/pinctrl/tegra/Kconfig
+++ b/drivers/pinctrl/tegra/Kconfig
@@ -23,6 +23,10 @@ config PINCTRL_TEGRA210
bool
select PINCTRL_TEGRA
 
+config PINCTRL_TEGRA194
+   bool
+   select PINCTRL_TEGRA
+
 config PINCTRL_TEGRA_XUSB
def_bool y if ARCH_TEGRA
select GENERIC_PHY
diff --git a/drivers/pinctrl/tegra/Makefile b/drivers/pinctrl/tegra/Makefile
index bbcb043..ead4e10 100644
--- a/drivers/pinctrl/tegra/Makefile
+++ b/drivers/pinctrl/tegra/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_PINCTRL_TEGRA30)   += pinctrl-tegra30.o
 obj-$(CONFIG_PINCTRL_TEGRA114) += pinctrl-tegra114.o
 obj-$(CONFIG_PINCTRL_TEGRA124) += pinctrl-tegra124.o
 obj-$(CONFIG_PINCTRL_TEGRA210) += pinctrl-tegra210.o
+obj-$(CONFIG_PINCTRL_TEGRA194) += pinctrl-tegra194.o
 obj-$(CONFIG_PINCTRL_TEGRA_XUSB)   += pinctrl-tegra-xusb.o
diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.c b/drivers/pinctrl/tegra/pinctrl-tegra.c
index a5008c0..76e88c4 100644
--- a/drivers/pinctrl/tegra/pinctrl-tegra.c
+++ b/drivers/pinctrl/tegra/pinctrl-tegra.c
@@ -292,7 +292,7 @@ static int tegra_pinconf_reg(struct tegra_pmx *pmx,
 const struct tegra_pingroup *g,
 enum tegra_pinconf_param param,
 bool report_err,
-s8 *bank, s16 *reg, s8 *bit, s8 *width)
+s8 *bank, s32 *reg, s8 *bit, s8 *width)
 {
switch (param) {
case TEGRA_PINCONF_PARAM_PULL:
@@ -451,7 +451,7 @@ static int tegra_pinconf_group_get(struct pinctrl_dev *pctldev,
const struct tegra_pingroup *g;
int ret;
s8 bank, bit, width;
-   s16 reg;
+   s32 reg;
u32 val, mask;
 
g = &pmx->soc->groups[group];
@@ -480,7 +480,7 @@ static int tegra_pinconf_group_set(struct pinctrl_dev *pctldev,
const struct tegra_pingroup *g;
int ret, i;
s8 bank, bit, width;
-   s16 reg;
+   s32 reg;
u32 val, mask;
 
g = &pmx->soc->groups[group];
@@ -548,7 +548,7 @@ static void tegra_pinconf_group_dbg_show(struct pinctrl_dev *pctldev,
const struct tegra_pingroup *g;
int i, ret;
s8 bank, bit, width;
-   s16 reg;
+   s32 reg;
u32 val;
 
g = &pmx->soc->groups[group];
diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.h b/drivers/pinctrl/tegra/pinctrl-tegra.h
index 44c7194..82cd947 100644
--- a/drivers/pinctrl/tegra/pinctrl-tegra.h
+++ b/drivers/pinctrl/tegra/pinctrl-tegra.h
@@ -143,10 +143,10 @@ struct tegra_pingroup {
const unsigned *pins;
u8 npins;
u8 funcs[4];
-   s16 mux_reg;
-   s16 pupd_reg;
-   s16 tri_reg;
-   s16 drv_reg;
+   s32 mux_reg;
+   s32 pupd_reg;
+   s32 tri_reg;
+   s32 drv_reg;
u32 mux_bank:2;
u32 pupd_bank:2;
u32 tri_bank:2;
diff --git a/drivers/pinctrl/tegra/pinctrl-tegra194.c b/drivers/pinctrl/tegra/pinctrl-tegra194.c
new file mode 100644
index 000..9172a8c
--- /dev/null
+++ b/drivers/pinctrl/tegra/pinctrl-tegra194.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Pinctrl data for the NVIDIA Tegra194 pinmux
+ *
+ * Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pinctrl-tegra.h"
+
+#define _GPIO(offset)  (offset)
+#define NUM_GPIOS  (TEGRA_PIN_PEX_L5_RST_N_PGG1 + 1)
+
+/* Define a unique ID for each pin */
+enum pin_id {
+   TEGRA_PIN_PEX_L5_CLKREQ_N_PGG0 = _GPIO(256),
+   TEGRA_PIN_PEX_L5_RST_N_PGG1 = _GPIO(257)

Re: [PATCH] bcache: avoid clang -Wunintialized warning

2019-04-25 Thread Coly Li

On 2019/4/26 2:08 上午, Nathan Chancellor wrote:
> On Fri, Mar 22, 2019 at 03:35:00PM +0100, Arnd Bergmann wrote:
>> clang has identified a code path in which it thinks a
>> variable may be unused:
>>
>> drivers/md/bcache/alloc.c:333:4: error: variable 'bucket' is used 
>> uninitialized whenever 'if' condition is false
>>   [-Werror,-Wsometimes-uninitialized]
>> fifo_pop(&ca->free_inc, bucket);
>> ^~~
>> drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
>>  #define fifo_pop(fifo, i)   fifo_pop_front(fifo, (i))
>> ^
>> drivers/md/bcache/util.h:189:6: note: expanded from macro 'fifo_pop_front'
>> if (_r) {   \
>> ^~
>> drivers/md/bcache/alloc.c:343:46: note: uninitialized use occurs here
>> allocator_wait(ca, bch_allocator_push(ca, bucket));
>>   ^~
>> drivers/md/bcache/alloc.c:287:7: note: expanded from macro 'allocator_wait'
>> if (cond)   \
>> ^~~~
>> drivers/md/bcache/alloc.c:333:4: note: remove the 'if' if its condition is 
>> always true
>> fifo_pop(&ca->free_inc, bucket);
>> ^
>> drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
>>  #define fifo_pop(fifo, i)   fifo_pop_front(fifo, (i))
>> ^
>> drivers/md/bcache/util.h:189:2: note: expanded from macro 'fifo_pop_front'
>> if (_r) {   \
>> ^
>> drivers/md/bcache/alloc.c:331:15: note: initialize the variable 'bucket' to 
>> silence this warning
>> long bucket;
>>^
>>
>> This cannot happen in practice because we only enter the loop
>> if there is at least one element in the list.
>>
>> Slightly rearranging the code makes this clearer to both the
>> reader and the compiler, which avoids the warning.
>>
>> Signed-off-by: Arnd Bergmann 
>> ---
>>  drivers/md/bcache/alloc.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
>> index 5002838ea476..f8986effcb50 100644
>> --- a/drivers/md/bcache/alloc.c
>> +++ b/drivers/md/bcache/alloc.c
>> @@ -327,10 +327,11 @@ static int bch_allocator_thread(void *arg)
>>   * possibly issue discards to them, then we add the bucket to
>>   * the free list:
>>   */
>> -while (!fifo_empty(&ca->free_inc)) {
>> +while (1) {
>>  long bucket;
>>  
>> -fifo_pop(&ca->free_inc, bucket);
>> +if (!fifo_pop(&ca->free_inc, bucket))
>> +break;
>>  
>>  if (ca->discard) {
>>  mutex_unlock(&ca->set->bucket_lock);
>> -- 
>> 2.20.0
>>
> 
> Hi all,
> 
> Could someone please review/pick this up? This is one of two remaining
> -Wsometimes-uninitialized warnings among arm, arm64, and x86_64
> all{yes,mod}config and I'd like to get it turned on as soon as possible
> to catch more bugs.

Hi Nathan,

It is in Jens' block tree for-next branch already, for Linux v5.2 merge
window.
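
For readers following the thread, the loop shape the quoted patch moves to
can be sketched in plain C. This is only an illustration under stated
assumptions: struct long_fifo, long_fifo_push(), long_fifo_pop() and
drain_sum() are hypothetical stand-ins, not bcache code (bcache's fifo_pop
is a macro, as the quoted clang notes show). The point is that the pop
reports success, so the variable is only read on the path where it was
written.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 8

/* Hypothetical fixed-size FIFO of longs; not the bcache FIFO type. */
struct long_fifo {
	long data[FIFO_CAP];
	size_t front;	/* index of the oldest element */
	size_t used;	/* number of stored elements */
};

/* Append v; returns false when the FIFO is full. */
static bool long_fifo_push(struct long_fifo *f, long v)
{
	if (f->used == FIFO_CAP)
		return false;
	f->data[(f->front + f->used) % FIFO_CAP] = v;
	f->used++;
	return true;
}

/* Remove the oldest element into *out; returns false when empty. */
static bool long_fifo_pop(struct long_fifo *f, long *out)
{
	if (f->used == 0)
		return false;
	*out = f->data[f->front];
	f->front = (f->front + 1) % FIFO_CAP;
	f->used--;
	return true;
}

/*
 * Same shape as the patched loop: the pop result is the loop exit
 * condition, so 'bucket' is never read unless the pop wrote it.
 */
static long drain_sum(struct long_fifo *f)
{
	long sum = 0;

	while (1) {
		long bucket;

		if (!long_fifo_pop(f, &bucket))
			break;
		sum += bucket;
	}
	return sum;
}
```

With a separate emptiness check followed by an unchecked pop, the compiler
cannot see that the two conditions agree; folding them into one test removes
the path it warns about.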

Thanks.

-- 

Coly Li

RE: [RFC PATCH 0/5] New fallback workflow for heterogeneous memory system

2019-04-25 Thread Du, Fan

>-Original Message-
>From: Dan Williams [mailto:dan.j.willi...@intel.com]
>Sent: Thursday, April 25, 2019 11:43 PM
>To: Du, Fan 
>Cc: Michal Hocko ; a...@linux-foundation.org; Wu,
>Fengguang ; Hansen, Dave
>; xishi.qiuxi...@alibaba-inc.com; Huang, Ying
>; linux...@kvack.org; linux-kernel@vger.kernel.org
>Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous
>memory system
>
>On Thu, Apr 25, 2019 at 1:05 AM Du, Fan  wrote:
>>
>>
>>
>> >-Original Message-
>> >From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On
>> >Behalf Of Michal Hocko
>> >Sent: Thursday, April 25, 2019 3:54 PM
>> >To: Du, Fan 
>> >Cc: a...@linux-foundation.org; Wu, Fengguang
>;
>> >Williams, Dan J ; Hansen, Dave
>> >; xishi.qiuxi...@alibaba-inc.com; Huang, Ying
>> >; linux...@kvack.org;
>linux-kernel@vger.kernel.org
>> >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous
>> >memory system
>> >
>> >On Thu 25-04-19 07:41:40, Du, Fan wrote:
>> >>
>> >>
>> >> >-Original Message-
>> >> >From: Michal Hocko [mailto:mho...@kernel.org]
>> >> >Sent: Thursday, April 25, 2019 2:37 PM
>> >> >To: Du, Fan 
>> >> >Cc: a...@linux-foundation.org; Wu, Fengguang
>> >;
>> >> >Williams, Dan J ; Hansen, Dave
>> >> >; xishi.qiuxi...@alibaba-inc.com; Huang, Ying
>> >> >; linux...@kvack.org;
>> >linux-kernel@vger.kernel.org
>> >> >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous
>> >> >memory system
>> >> >
>> >> >On Thu 25-04-19 09:21:30, Fan Du wrote:
>> >> >[...]
>> >> >> However PMEM has different characteristics from DRAM,
>> >> >> the more reasonable or desirable fallback style would be:
>> >> >> DRAM node 0 -> DRAM node 1 -> PMEM node 2 -> PMEM node 3.
>> >> >> When DRAM is exhausted, try PMEM then.
>> >> >
>> >> >Why and who does care? NUMA is fundamentally about memory nodes
>> >with
>> >> >different access characteristics so why is PMEM any special?
>> >>
>> >> Michal, thanks for your comments!
>> >>
>> >> The "different" lies in the local or remote access, usually the underlying
>> >> memory is the same type, i.e. DRAM.
>> >>
>> >> By "special", PMEM is usually in gigantic capacity than DRAM per dimm,
>> >> while with different read/write access latency than DRAM.
>> >
>> >You are describing a NUMA in general here. Yes access to different NUMA
>> >nodes has a different read/write latency. But that doesn't make PMEM
>> >really special from a regular DRAM.
>>
>> Not the numa distance b/w cpu and PMEM node make PMEM different
>than
>> DRAM. The difference lies in the physical layer. The access latency
>characteristics
>> comes from media level.
>
>No, there is no such thing as a "PMEM node". I've pushed back on this
>broken concept in the past [1] [2]. Consider that PMEM could be as
>fast as DRAM for technologies like NVDIMM-N or in emulation
>environments. These attempts to look at persistence as an attribute of
>performance are entirely missing the point that the system can have
>multiple varied memory types and the platform firmware needs to
>enumerate these performance properties in the HMAT on ACPI platforms.
>Any scheme that only considers a binary DRAM and not-DRAM property is
>immediately invalidated the moment the OS needs to consider a 3rd or
>4th memory type, or a more varied connection topology.

Dan, Thanks for your comments!

I have understood your point since your earlier posts.
Below is what I have in mind, speaking as a [standalone personal contributor] only:
a. I fully recognize what HMAT is designed for.
b. I understand your point that the "type" distinction is only temporary, and
  I think you are right about it.

A generic approach is indeed required; however, I want to elaborate on the
problem I'm trying to solve for the customer, not on how we or other people
solve it one way or another.

Customers require full utilization of system memory, whether it is DRAM,
1st generation PMEM, or a future xth generation PMEM which beats DRAM.
Customers also require explicit [coarse grained] control over memory
allocation for different latency/bandwidth.

Maybe it's more worthwhile to think about what is essentially needed to solve
the problem, and to make sure it scales well enough for some period.

a. Build the fallback list for a heterogeneous system.
  I prefer to build it per HMAT, because HMAT exposes the latency/bandwidth
  from the local node's perspective, and it is already standardized in the
  ACPI spec. NUMA node distance from SLIT is no longer accurate enough to be
  helpful on a heterogeneous memory system.

b. Provide an explicit page allocation option for requests for pages that are
  frequently read.
  This requirement is well justified as well. All scenarios, in kernel or at
  user level, that don't care about write latency should leverage this option
  to achieve overall optimal performance.

c. NUMA balancing for a heterogeneous system.
  I'm aware of this topic, but it's not what I have in mind (a. and b.) right now.
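
To make item a. concrete, here is a minimal user-space sketch of ordering
nodes by a per-node latency figure, which is roughly what building the
fallback list per HMAT would mean. Everything in it is an assumption for
illustration: struct mem_node, cmp_latency() and build_fallback_order() are
hypothetical names, not kernel interfaces, and the latency numbers a caller
feeds in would stand for whatever HMAT reports from the local node's
perspective.

```c
#include <stdlib.h>

/* Hypothetical per-node record; read_latency_ns stands in for an HMAT value. */
struct mem_node {
	int id;
	unsigned int read_latency_ns;
};

/* Order by ascending read latency; break ties by node id for stability. */
static int cmp_latency(const void *a, const void *b)
{
	const struct mem_node *x = a;
	const struct mem_node *y = b;

	if (x->read_latency_ns != y->read_latency_ns)
		return x->read_latency_ns < y->read_latency_ns ? -1 : 1;
	return (x->id > y->id) - (x->id < y->id);
}

/* Sort nodes in place so that nodes[0] is tried first on allocation. */
static void build_fallback_order(struct mem_node *nodes, size_t n)
{
	qsort(nodes, n, sizeof(*nodes), cmp_latency);
}
```

If the two DRAM nodes report lower latency than the two PMEM nodes, the
sorted order is DRAM node 0 -> DRAM node 1 -> PMEM node 2 -> PMEM node 3,
without any "DRAM or not-DRAM" flag in the data structure.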

>[1]:
>https://lore.kernel.org/lkml/CAPcyv4heiUbZvP7Ewoy-Hy=-mPrdjCj
