Re: [PATCH 19/28] locking/lockdep: Optimize irq usage check when marking lock usage bit
Thanks for review.

On Fri, 26 Apr 2019 at 03:32, Peter Zijlstra wrote:
>
> On Wed, Apr 24, 2019 at 06:19:25PM +0800, Yuyang Du wrote:
>
> After only a quick read of these next patches; this is the one that
> worries me most.
>
> You did mention Frederic's patches, but I'm not entirely sure you're
> aware why he's doing them. He's preparing to split the softirq state
> into one state per softirq vector.
>
> See here:
>
>   https://lkml.kernel.org/r/20190228171242.32144-14-frede...@kernel.org
>   https://lkml.kernel.org/r/20190228171242.32144-15-frede...@kernel.org
>
> IOW he's going to massively explode this storage.

If I understand correctly, he is not going to.

First of all, we can divide the whole usage thing into tracking and checking. Frederic's fine-grained soft vector state is applied to usage tracking, i.e., in which specific vectors a lock is used or enabled.

But for usage checking, which vectors they are does not really matter. So the current size of the arrays and bitmaps is good enough. Right?
[PATCH v4] cpufreq: qoriq: add support for lx2160a
Enable support of NXP SoC lx2160a to handle the lx2160a SoC. Signed-off-by: Tang Yuantian Signed-off-by: Yogesh Gaur Signed-off-by: Vabhav Sharma Acked-by: Scott Wood Acked-by: Stephen Boyd Acked-by: Viresh Kumar --- Changes for v4: - Incorporated review comments from Stephen Boyd Changes for v3: - Incorporated review comments of Rafael J. Wysocki - Updated commit message Changes for v2: - Subject line updated drivers/cpufreq/qoriq-cpufreq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c index 4295e54..81f0288 100644 --- a/drivers/cpufreq/qoriq-cpufreq.c +++ b/drivers/cpufreq/qoriq-cpufreq.c @@ -284,6 +284,7 @@ static const struct of_device_id node_matches[] __initconst = { { .compatible = "fsl,ls1046a-clockgen", }, { .compatible = "fsl,ls1088a-clockgen", }, { .compatible = "fsl,ls2080a-clockgen", }, + { .compatible = "fsl,lx2160a-clockgen", }, { .compatible = "fsl,p4080-clockgen", }, { .compatible = "fsl,qoriq-clockgen-1.0", }, { .compatible = "fsl,qoriq-clockgen-2.0", }, -- 2.7.4
[PATCH v4] clk: qoriq: add support for lx2160a
Add clockgen support and configuration for NXP SoC lx2160a with compatible property as "fsl,lx2160a-clockgen". Signed-off-by: Tang Yuantian Signed-off-by: Yogesh Gaur Signed-off-by: Vabhav Sharma Acked-by: Scott Wood Acked-by: Stephen Boyd Acked-by: Viresh Kumar --- Changes for v4: - Incorporated review comments from Stephen Boyd Changes for v3: - Incorporated review comments of Rafael J. Wysocki - Updated commit message Changes for v2: - Subject line updated drivers/clk/clk-qoriq.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c index 3d51d7c..1a15201 100644 --- a/drivers/clk/clk-qoriq.c +++ b/drivers/clk/clk-qoriq.c @@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = { .flags = CG_VER3 | CG_LITTLE_ENDIAN, }, { + .compat = "fsl,lx2160a-clockgen", + .cmux_groups = { + &clockgen2_cmux_cga12, &clockgen2_cmux_cgb + }, + .cmux_to_group = { + 0, 0, 0, 0, 1, 1, 1, 1, -1 + }, + .pll_mask = 0x37, + .flags = CG_VER3 | CG_LITTLE_ENDIAN, + }, + { .compat = "fsl,p2041-clockgen", .guts_compat = "fsl,qoriq-device-config-1.0", .init_periph = p2041_init_periph, @@ -1427,6 +1438,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, "fsl,ls1043a-clockgen", clockgen_init); CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init); CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init); CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init); +CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init); CLK_OF_DECLARE(qoriq_clockgen_p2041, "fsl,p2041-clockgen", clockgen_init); CLK_OF_DECLARE(qoriq_clockgen_p3041, "fsl,p3041-clockgen", clockgen_init); CLK_OF_DECLARE(qoriq_clockgen_p4080, "fsl,p4080-clockgen", clockgen_init); -- 2.7.4
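As a reading aid for the new chipinfo entry, here is a minimal userspace sketch of how a table like cmux_to_group can be walked. It assumes that the list ends at the first negative entry and that each value is an index into cmux_groups. The helper names (count_cmuxes, cmux_group_of, mask_weight) and NUM_CMUX are hypothetical and are not taken from clk-qoriq.c.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CMUX 8	/* assumption: capacity chosen only for this sketch */

/* Values copied from the lx2160a entry in the patch. */
static const signed char lx2160a_cmux_to_group[NUM_CMUX + 1] = {
	0, 0, 0, 0, 1, 1, 1, 1, -1
};

/* Count the cmux entries before the first negative terminator. */
static size_t count_cmuxes(const signed char *map, size_t len)
{
	size_t n = 0;

	while (n < len && map[n] >= 0)
		n++;
	return n;
}

/* Group index for a given cmux, or -1 at or past the terminator. */
static int cmux_group_of(const signed char *map, size_t len, size_t cmux)
{
	assert(map != NULL);
	return cmux < count_cmuxes(map, len) ? map[cmux] : -1;
}

/* Number of bits set in a mask such as pll_mask. */
static unsigned int mask_weight(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}
```

With the values from the patch, the first four cmuxes land in group 0 and the next four in group 1, and pll_mask = 0x37 has five bits set (bits 0, 1, 2, 4 and 5). Which PLL each bit stands for is defined by the driver and is not modelled here.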
[PATCH] clk: imx: correct pfdv2 gate_bit/vld_bit operations
The pfdv2 gate_bit/vld_bit handling is incorrect. Both fields are defined as u8 and are meant to hold bit offsets, but gate_bit is actually assigned a mask, which can be up to 32 bits wide and therefore overflows the u8. vld_bit is then derived as a bit offset from that incorrect gate_bit value. The result is an incorrect pfd clock gate status in the clock tree. Fix this by storing proper bit offsets in both fields and building the masks where they are used.

Fixes: 9fcb6be3b6c9 ("clk: imx: add pfdv2 support")
Signed-off-by: Anson Huang
---
drivers/clk/imx/clk-pfdv2.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/imx/clk-pfdv2.c b/drivers/clk/imx/clk-pfdv2.c
index 7e9134b..fb567dc 100644
--- a/drivers/clk/imx/clk-pfdv2.c
+++ b/drivers/clk/imx/clk-pfdv2.c
@@ -43,7 +43,7 @@ static int clk_pfdv2_wait(struct clk_pfdv2 *pfd) { u32 val; - return readl_poll_timeout(pfd->reg, val, val & pfd->vld_bit, + return readl_poll_timeout(pfd->reg, val, val & (1 << pfd->vld_bit), 0, LOCK_TIMEOUT_US); }
@@ -55,7 +55,7 @@ static int clk_pfdv2_enable(struct clk_hw *hw) spin_lock_irqsave(&pfd_lock, flags); val = readl_relaxed(pfd->reg); - val &= ~pfd->gate_bit; + val &= ~(1 << pfd->gate_bit); writel_relaxed(val, pfd->reg); spin_unlock_irqrestore(&pfd_lock, flags);
@@ -70,7 +70,7 @@ static void clk_pfdv2_disable(struct clk_hw *hw) spin_lock_irqsave(&pfd_lock, flags); val = readl_relaxed(pfd->reg); - val |= pfd->gate_bit; + val |= (1 << pfd->gate_bit); writel_relaxed(val, pfd->reg); spin_unlock_irqrestore(&pfd_lock, flags); }
@@ -123,7 +123,7 @@ static int clk_pfdv2_is_enabled(struct clk_hw *hw) { struct clk_pfdv2 *pfd = to_clk_pfdv2(hw); - if (readl_relaxed(pfd->reg) & pfd->gate_bit) + if (readl_relaxed(pfd->reg) & (1 << pfd->gate_bit)) return 0; return 1;
@@ -180,7 +180,7 @@ struct clk_hw *imx_clk_pfdv2(const char *name, const char *parent_name, return ERR_PTR(-ENOMEM); pfd->reg = reg; - pfd->gate_bit = 1 << ((idx + 1) * 8 - 1); + pfd->gate_bit = (idx + 1) * 8 - 1; pfd->vld_bit = pfd->gate_bit - 1; pfd->frac_off = idx * 8;
-- 
2.7.4
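The truncation described in the commit message can be reproduced in plain userspace C. This is a minimal sketch, not the driver code: uint8_t stands in for the kernel's u8, the helper names are hypothetical, and the idx < 4 bound is an assumption read off the (idx + 1) * 8 - 1 formula (four 8-bit fields in a 32-bit register).

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: a 32-bit mask squeezed into a u8 field, so it truncates. */
static uint8_t old_gate_field(unsigned int idx)
{
	assert(idx < 4);	/* assumption: four fields per register */
	return (uint8_t)(UINT32_C(1) << ((idx + 1) * 8 - 1));
}

/* New scheme: store the bit offset, which always fits in a u8. */
static uint8_t new_gate_bit(unsigned int idx)
{
	assert(idx < 4);
	return (uint8_t)((idx + 1) * 8 - 1);
}

/* The valid bit sits directly below the gate bit, as in the patch. */
static uint8_t new_vld_bit(unsigned int idx)
{
	return (uint8_t)(new_gate_bit(idx) - 1);
}

/* Build the mask at the point of use, as the patched accessors do. */
static uint32_t bit_mask(uint8_t bit)
{
	return UINT32_C(1) << bit;
}
```

For idx 0 the old value happens to survive (0x80 fits in eight bits), but for idx 1 the mask 0x8000 truncates to 0, so the gate bit is never touched. The sketch uses an unsigned constant for the shift to keep the bit-31 case well defined in standalone C.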
Re: [PATCH] dt-bindings: Add silabs,si5341
On 26-04-19 01:04, Stephen Boyd wrote: > Quoting Mike Looijmans (2019-04-24 02:02:16) >> Adds the devicetree bindings for the si5341 driver that supports the >> Si5341 and Si5340 chips. >> >> Signed-off-by: Mike Looijmans >> --- >> .../bindings/clock/silabs,si5341.txt | 141 ++ >> 1 file changed, 141 insertions(+) >> create mode 100644 >> Documentation/devicetree/bindings/clock/silabs,si5341.txt >> >> diff --git a/Documentation/devicetree/bindings/clock/silabs,si5341.txt >> b/Documentation/devicetree/bindings/clock/silabs,si5341.txt >> new file mode 100644 >> index ..1a00dd83100f >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/clock/silabs,si5341.txt >> @@ -0,0 +1,141 @@ >> +Binding for Silicon Labs Si5341 and Si5340 programmable i2c clock generator. >> + >> +Reference >> +[1] Si5341 Data Sheet >> + >> https://www.silabs.com/documents/public/reference-manuals/Si5341-40-D-RM.pdf > > Thanks! I also had to look up the pinout in the datasheet, not the > reference manual above. Now you mention it, this is the "reference manual", not the datasheet. I'll add a reference to that as well. >> + >> +The Si5341 and Si5340 are programmable i2c clock generators with up to 10 >> output >> +clocks. The chip contains a PLL that sources 5 (or 4) multisynth clocks, >> which >> +in turn can be directed to any of the 10 (or 4) outputs through a divider. >> +The internal structure of the clock generators can be found in [1]. >> + >> +The driver can be used in "as is" mode, reading the current settings from >> the >> +chip at boot, in case you have a (pre-)programmed device. If the PLL is not >> +configured when the driver probes, it assumes the driver must fully >> initialize >> +it. >> + >> +The device type, speed grade and revision are determined runtime by probing. >> + >> +The driver currently only supports XTAL input mode, and does not support any >> +fancy input configurations. They can still be programmed into the chip and >> +the driver will leave them "as is". 
>> + >> +==I2C device node== >> + >> +Required properties: >> +- compatible: shall be one of the following: "silabs,si5341", >> "silabs,si5340" >> +- reg: i2c device address, usually 0x74 >> +- #clock-cells: from common clock binding; shall be set to 1. >> +- clocks: from common clock binding; list of parent clock >> + handles, shall be xtal reference clock. Usually a fixed clock. > > Is there only one possible clk parent? Looks like there's an optional > xtal on the XA/XB pins and then up to three more input clks on IN0/1/2. > So shouldn't this list all of those and then indicate that at least one > should be specified at all times? > >> +- clock-names: Shall be "xtal". > > This should include the other clk inputs? Some day maybe. That's what I meant when I wrote "does not support any fancy input configurations". The input config is horrendously complex. We have never used anything but just the xtal input, and I think that goes for 99.9% of the use cases for this chip. I already went way over budget with this one, my first intention was to write a driver that takes a firmware blob from the "clockbuilder" software, but while writing it I discovered that the whole damn thing could easily be controlled completely without it. > >> +- #address-cells: shall be set to 1. >> +- #size-cells: shall be set to 0. > > I'd expect to see all the input voltage supplies here too. > > vdd-supply > vdda-supply > vdds-supply > vdd0-supply > vdd1-supply > vdd2-supply > vdd3-supply > vdd4-supply > vdd5-supply > vdd6-supply > vdd7-supply > vdd8-supply > vdd9-supply I'll look into it. Might be useful for some register settings. >> + >> +Optional properties: >> +- silabs,pll-m-num, silabs,pll-m-den: Numerator and denominator for PLL >> + feedback divider. Must be such that the PLL output is in the valid range. >> For >> + example, to create 14GHz from a 48MHz xtal, use m-num=14000 and m-den=48. >> Only >> + the fraction matters, using 3500 and 12 will deliver the exact same >> result. 
>> + If these are not specified, and the PLL is not yet programmed when the >> driver >> + probes, the PLL will be set to 14GHz. > > Can this be done via assigned-clock-rates? Possibly with a table in the > clk driver to tell us how to generate those rates. The PLL frequency choice determines who'll get jitter and who won't. It's ridiculously accurate too. For example, if you need a 26 MHz and a 100 MHz output, there's no solution for the PLL that makes both clocks an integer divider (SI is vague about it, but apparently integer dividers have less jitter on output). Only the enduser can say which clock will get the better quality. > >> +- silabs,reprogram: When present, the driver will always assume the device >> must >> + be initialized, and always performs the soft-reset routine. Since this >> will >> + temporarily stop
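The m-num/m-den arithmetic quoted from the binding can be checked with a few lines of C. This is only a sketch under the assumption, taken from the binding's own example, that the PLL output is the xtal frequency multiplied by m-num/m-den; the function names are hypothetical and nothing here talks to the chip.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* PLL output in Hz for a reference clock scaled by num/den (truncating). */
static uint64_t pll_output_hz(uint64_t xtal_hz, uint64_t num, uint64_t den)
{
	assert(den != 0);
	return xtal_hz * num / den;
}

/* Reduce num/den to lowest terms; only the ratio matters. */
static void reduce_fraction(uint64_t *num, uint64_t *den)
{
	uint64_t g = gcd_u64(*num, *den);

	if (g) {
		*num /= g;
		*den /= g;
	}
}
```

Both 14000/48 and 3500/12 reduce to 875/3, and with a 48 MHz xtal either pair gives 14,000,000,000 Hz, which matches the statement that only the fraction matters.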
RE: [EXT] Re: [PATCH v3] clk: qoriq: add support for lx2160a
> -----Original Message-----
> From: Stephen Boyd
> Sent: Thursday, April 25, 2019 11:52 PM
> To: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-p...@vger.kernel.org; Vabhav Sharma
> Cc: mturque...@baylibre.com; r...@rjwysocki.net; viresh.ku...@linaro.org; Yogesh Narayan Gaur; Andy Tang; Vabhav Sharma
> Subject: [EXT] Re: [PATCH v3] clk: qoriq: add support for lx2160a
>
> Quoting Vabhav Sharma (2019-04-25 06:57:05)
> > From: Yogesh Gaur
> >
> > Add clockgen support and configuration for NXP SoC lx2160a in qoriq
> > clock driver with compatible property as "fsl,lx2160a-clockgen".
> >
> > qoriq-cpufreq driver is based on qoriq clock driver, enable support of
> > NXP SoC lx2160a in qoriq cpufreq driver to handle the lx2160a SoC.
> >
> > Signed-off-by: Tang Yuantian
> > Signed-off-by: Yogesh Gaur
> > Signed-off-by: Vabhav Sharma
> > Acked-by: Scott Wood
> > Acked-by: Stephen Boyd
> > Acked-by: Viresh Kumar
> > ---
> > Changes for v3:
> > - Incorporated review comments of Rafael J. Wysocki
> > - Updated commit message
>
> If you can split it into clk and cpufreq that would be preferred. Then I can
> take the clk part and PM tree can take the cpufreq part. Otherwise, you have
> sent other patches to drivers/clk/clk-qoriq.c and I'm worried there will be
> cross tree conflicts if I take those other patches this cycle.

Agreed, sure. I will split the patch and send the two parts to the clk and PM trees.
Re: [PATCH 20/28] locking/lockdep: Refactorize check_noncircular and check_redundant
Thanks for review.

On Fri, 26 Apr 2019 at 03:48, Peter Zijlstra wrote:
>
> On Wed, Apr 24, 2019 at 06:19:26PM +0800, Yuyang Du wrote:
> > These two functions now handle different check results themselves. A new
> > check_path function is added to check whether there is a path in the
> > dependency graph. No functional change.
>
> This looks good, however I completely forgot we still had the redundant
> thing.
>
> It was added for cross-release (which has since been reverted) which
> would generate a lot of redundant links (IIRC) but having it makes the
> reports more convoluted -- basically, if we had an A-B-C relation, then
> A-C will not be added to the graph because it is already covered. This
> then means any report will include B, even though a shorter cycle might
> have been possible.
>
> Maybe we should make the whole redundant check depend on LOCKDEP_SMALL
> for now.

Sure. I can do that.
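The A-B-C remark can be illustrated with a toy dependency graph. The sketch below is not lockdep's implementation: it uses a small adjacency matrix and a recursive depth-first search, and every name in it (dep_graph, has_path, add_dep_unless_redundant, MAX_NODES) is hypothetical. It only shows why a direct A-C link is dropped once A-B and B-C exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 8	/* assumption: tiny graph for illustration */

struct dep_graph {
	bool edge[MAX_NODES][MAX_NODES];
};

/* Depth-first search: is dst reachable from src? */
static bool has_path(const struct dep_graph *g, int src, int dst, bool *seen)
{
	if (src == dst)
		return true;
	seen[src] = true;
	for (int next = 0; next < MAX_NODES; next++)
		if (g->edge[src][next] && !seen[next] &&
		    has_path(g, next, dst, seen))
			return true;
	return false;
}

/* Add src -> dst unless dst is already reachable; true if the edge was added. */
static bool add_dep_unless_redundant(struct dep_graph *g, int src, int dst)
{
	bool seen[MAX_NODES];

	assert(src >= 0 && src < MAX_NODES && dst >= 0 && dst < MAX_NODES);
	memset(seen, 0, sizeof(seen));
	if (has_path(g, src, dst, seen))
		return false;
	g->edge[src][dst] = true;
	return true;
}
```

After adding A-B and B-C, an attempt to add A-C is rejected as redundant, so any later path from A to C is reported through B. That is the longer-report effect described in the quoted text.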
[PATCH v4 4/6] usb: roles: add API to get usb_role_switch by node
Add fwnode_usb_role_switch_get() to make easier to get usb_role_switch by fwnode which register it. It's useful when there is not device_connection registered between two drivers and only knows the fwnode which register usb_role_switch. Signed-off-by: Chunfeng Yun --- v4 changes: 1. use switch_fwnode_match() to find fwnode suggested by Heikki 2. this patch now depends on [1] [1] [v6,08/13] usb: roles: Introduce stubs for the exiting functions in role.h https://patchwork.kernel.org/patch/10909971/ v3 changes: 1. use fwnodes instead of node suggested by Andy 2. rebuild the API suggested by Heikki v2 no changes --- drivers/usb/roles/class.c | 25 + include/linux/usb/role.h | 8 2 files changed, 33 insertions(+) diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c index f45d8df5cfb8..994fcb979795 100644 --- a/drivers/usb/roles/class.c +++ b/drivers/usb/roles/class.c @@ -12,6 +12,7 @@ #include #include #include +#include #include static struct class *role_class; @@ -135,6 +136,30 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev) } EXPORT_SYMBOL_GPL(usb_role_switch_get); +/** + * fwnode_usb_role_switch_get - Find USB role switch by it's parent fwnode + * @fwnode: The fwnode that register USB role switch + * + * Finds and returns role switch registered by @fwnode. The reference count + * for the found switch is incremented. 
+ */ +struct usb_role_switch * +fwnode_usb_role_switch_get(struct fwnode_handle *fwnode) +{ + struct usb_role_switch *sw; + struct device *dev; + + dev = class_find_device(role_class, NULL, fwnode, switch_fwnode_match); + if (!dev) + return ERR_PTR(-EPROBE_DEFER); + + sw = to_role_switch(dev); + WARN_ON(!try_module_get(sw->dev.parent->driver->owner)); + + return sw; +} +EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get); + /** * usb_role_switch_put - Release handle to a switch * @sw: USB Role Switch diff --git a/include/linux/usb/role.h b/include/linux/usb/role.h index da2b9641b877..35d460f9ec40 100644 --- a/include/linux/usb/role.h +++ b/include/linux/usb/role.h @@ -48,6 +48,8 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role); enum usb_role usb_role_switch_get_role(struct usb_role_switch *sw); struct usb_role_switch *usb_role_switch_get(struct device *dev); void usb_role_switch_put(struct usb_role_switch *sw); +struct usb_role_switch * +fwnode_usb_role_switch_get(struct fwnode_handle *fwnode); struct usb_role_switch * usb_role_switch_register(struct device *parent, @@ -72,6 +74,12 @@ static inline struct usb_role_switch *usb_role_switch_get(struct device *dev) static inline void usb_role_switch_put(struct usb_role_switch *sw) { } +static inline struct usb_role_switch * +fwnode_usb_role_switch_get(struct fwnode_handle *fwnode) +{ + return ERR_PTR(-ENODEV); +} + static inline struct usb_role_switch * usb_role_switch_register(struct device *parent, const struct usb_role_switch_desc *desc) -- 2.21.0
[PATCH v4 5/6] usb: roles: add USB Type-B GPIO connector driver
Due to the requirements of the usb-connector.txt binding, the old way of using extcon to support USB Dual-Role switching is now deprecated when a Type-B connector is used. This patch introduces a driver for the Type-B connector, which typically uses an input GPIO to detect the USB ID pin, and tries to replace the function provided by the extcon-usb-gpio driver.

Signed-off-by: Chunfeng Yun
---
v4 changes:
  1. remove linux/gpio.h suggested by Linus
  2. put node when error happens

v3 changes:
  1. treat type-B connector as a virtual device;
  2. change file name again

v2 changes:
  1. file name is changed
  2. use new compatible
---
drivers/usb/roles/Kconfig | 11 + drivers/usb/roles/Makefile | 1 + drivers/usb/roles/typeb-conn-gpio.c | 305 3 files changed, 317 insertions(+) create mode 100644 drivers/usb/roles/typeb-conn-gpio.c

diff --git a/drivers/usb/roles/Kconfig b/drivers/usb/roles/Kconfig
index f8b31aa67526..d1156e18a81a 100644
--- a/drivers/usb/roles/Kconfig
+++ b/drivers/usb/roles/Kconfig
@@ -26,4 +26,15 @@ config USB_ROLES_INTEL_XHCI To compile the driver as a module, choose M here: the module will be called intel-xhci-usb-role-switch. +config TYPEB_CONN_GPIO + tristate "USB Type-B GPIO Connector" + depends on GPIOLIB + help + The driver supports USB role switch between host and device via GPIO + based USB cable detection, used typically if an input GPIO is used + to detect USB ID pin. 
+ + To compile the driver as a module, choose M here: the module will + be called typeb-conn-gpio.ko + endif # USB_ROLE_SWITCH diff --git a/drivers/usb/roles/Makefile b/drivers/usb/roles/Makefile index 757a7d2797eb..5d5620d9d113 100644 --- a/drivers/usb/roles/Makefile +++ b/drivers/usb/roles/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_USB_ROLE_SWITCH) += roles.o roles-y:= class.o obj-$(CONFIG_USB_ROLES_INTEL_XHCI) += intel-xhci-usb-role-switch.o +obj-$(CONFIG_TYPEB_CONN_GPIO) += typeb-conn-gpio.o diff --git a/drivers/usb/roles/typeb-conn-gpio.c b/drivers/usb/roles/typeb-conn-gpio.c new file mode 100644 index ..097d2ca12a12 --- /dev/null +++ b/drivers/usb/roles/typeb-conn-gpio.c @@ -0,0 +1,305 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * USB Type-B GPIO Connector Driver + * + * Copyright (C) 2019 MediaTek Inc. + * + * Author: Chunfeng Yun + * + * Some code borrowed from drivers/extcon/extcon-usb-gpio.c + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define USB_GPIO_DEB_MS20 /* ms */ +#define USB_GPIO_DEB_US((USB_GPIO_DEB_MS) * 1000) /* us */ + +#define USB_CONN_IRQF \ + (IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING | IRQF_ONESHOT) + +struct usb_conn_info { + struct device *dev; + struct usb_role_switch *role_sw; + enum usb_role last_role; + struct regulator *vbus; + struct delayed_work dw_det; + unsigned long debounce_jiffies; + + struct gpio_desc *id_gpiod; + struct gpio_desc *vbus_gpiod; + int id_irq; + int vbus_irq; +}; + +/** + * "DEVICE" = VBUS and "HOST" = !ID, so we have: + * Both "DEVICE" and "HOST" can't be set as active at the same time + * so if "HOST" is active (i.e. ID is 0) we keep "DEVICE" inactive + * even if VBUS is on. 
+ * + * Role | ID | VBUS + * + * [1] DEVICE| H | H + * [2] NONE | H | L + * [3] HOST | L | H + * [4] HOST | L | L + * + * In case we have only one of these signals: + * - VBUS only - we want to distinguish between [1] and [2], so ID is always 1 + * - ID only - we want to distinguish between [1] and [4], so VBUS = ID + */ +static void usb_conn_detect_cable(struct work_struct *work) +{ + struct usb_conn_info *info; + enum usb_role role; + int id, vbus, ret; + + info = container_of(to_delayed_work(work), + struct usb_conn_info, dw_det); + + /* check ID and VBUS */ + id = info->id_gpiod ? + gpiod_get_value_cansleep(info->id_gpiod) : 1; + vbus = info->vbus_gpiod ? + gpiod_get_value_cansleep(info->vbus_gpiod) : id; + + if (!id) + role = USB_ROLE_HOST; + else if (vbus) + role = USB_ROLE_DEVICE; + else + role = USB_ROLE_NONE; + + dev_dbg(info->dev, "role %d/%d, gpios: id %d, vbus %d\n", + info->last_role, role, id, vbus); + + if (info->last_role == role) { + dev_warn(info->dev, "repeated role: %d\n", role); + return; + } + + if (info->last_role == USB_ROLE_HOST) + regulator_disable(info->vbus); + + ret = usb_role_switch_set_role(info->role_sw, role); + if (ret) + dev_err(in
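The role table in the driver comment, together with the fallback rules for a missing GPIO, can be written as a small pure function. This is a userspace sketch that mirrors the decision logic visible in usb_conn_detect_cable(); the enum and function names are hypothetical, and the assertion that at least one GPIO exists is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum conn_role { CONN_ROLE_NONE, CONN_ROLE_HOST, CONN_ROLE_DEVICE };

/*
 * Resolve the role from the ID and VBUS levels (true means high).
 * has_id / has_vbus say whether the corresponding GPIO exists.
 * A missing ID reads as high; a missing VBUS follows ID.
 */
static enum conn_role resolve_role(bool has_id, bool id_level,
				   bool has_vbus, bool vbus_level)
{
	bool id, vbus;

	assert(has_id || has_vbus);	/* assumption: at least one signal */

	id = has_id ? id_level : true;
	vbus = has_vbus ? vbus_level : id;

	if (!id)
		return CONN_ROLE_HOST;		/* rows [3] and [4] */
	if (vbus)
		return CONN_ROLE_DEVICE;	/* row [1] */
	return CONN_ROLE_NONE;			/* row [2] */
}
```

With only the ID GPIO present the result is either DEVICE or HOST, and with only VBUS it is either DEVICE or NONE, which matches the two single-signal cases listed in the comment.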
Re: [PATCH 22/28] locking/lockdep: Adjust new bit cases in mark_lock
Thanks for review. On Fri, 26 Apr 2019 at 03:52, Peter Zijlstra wrote: > > + if (new_bit >= LOCK_USAGE_STATES) { > > + WARN_ON(1); > > Does that want to be DEBUG_LOCKS_WARN_ON() ? Indeed, it was.
[v4 PATCH 0/6] add USB Type-B GPIO connector driver
With the introduction of the USB connector and the requirements of the usb-connector.txt binding, the old way of using extcon to support USB Dual-Role switching is now deprecated. Meanwhile, there is no common driver available for a Type-B connector, which typically uses an input GPIO to detect the USB ID pin. This patch series introduces a Type-B GPIO connector driver that tries to replace the function provided by the extcon-usb-gpio driver.

v4 changes:
  1. use switch_fwnode_match() to find fwnode suggested by Heikki
  2. assign fwnode member of usb_role_switch struct suggested by Heikki
  3. make [4/6] depend on [2]
  4. remove linux/gpio.h suggested by Linus
  5. put node when error happens

  [4/6] usb: roles: add API to get usb_role_switch by node
  [2] [v6,08/13] usb: roles: Introduce stubs for the exiting functions in role.h
      https://patchwork.kernel.org/patch/10909971/

v3 changes:
  1. add GPIO direction, and use fixed-regulator for GPIO controlled
     VBUS regulator suggested by Rob;
  2. rebuild fwnode_usb_role_switch_get() suggested by Andy and Heikki
  3. treat the type-B connector as a virtual device;
  4. change file name of driver again
  5. select USB_ROLE_SWITCH in mtu3/Kconfig suggested by Heikki
  6. rename ssusb_mode_manual_switch() to ssusb_mode_switch()

v2 changes:
  1. make binding clear, and add an extra compatible suggested by Hans

Chunfeng Yun (6):
  dt-bindings: connector: add optional properties for Type-B
  dt-bindings: usb: add binding for Type-B GPIO connector driver
  dt-bindings: usb: mtu3: add properties about USB Role Switch
  usb: roles: add API to get usb_role_switch by node
  usb: roles: add USB Type-B GPIO connector driver
  usb: mtu3: register a USB Role Switch for dual role mode

 .../bindings/connector/usb-connector.txt      |  14 +
 .../devicetree/bindings/usb/mediatek,mtu3.txt |  10 +-
 .../bindings/usb/typeb-conn-gpio.txt          |  49 +++
 drivers/usb/mtu3/Kconfig                      |   1 +
 drivers/usb/mtu3/mtu3.h                       |   5 +
 drivers/usb/mtu3/mtu3_debugfs.c               |   4 +-
 drivers/usb/mtu3/mtu3_dr.c                    |  48 ++-
 drivers/usb/mtu3/mtu3_dr.h                    |   6 +-
 drivers/usb/mtu3/mtu3_plat.c                  |   3 +-
 drivers/usb/roles/Kconfig                     |  11 +
 drivers/usb/roles/Makefile                    |   1 +
 drivers/usb/roles/class.c                     |  25 ++
 drivers/usb/roles/typeb-conn-gpio.c           | 305 ++
 include/linux/usb/role.h                      |   8 +
 14 files changed, 481 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/usb/typeb-conn-gpio.txt
 create mode 100644 drivers/usb/roles/typeb-conn-gpio.c

-- 
2.21.0
Re: [PATCH 23/28] locking/lockdep: Update irqsafe lock bitmaps
Thanks for review. On Fri, 26 Apr 2019 at 03:55, Peter Zijlstra wrote: > > + if (!dir) { > > + unsigned long *bitmaps[4] = { > > + lock_classes_hardirq_safe, > > + lock_classes_hardirq_safe_read, > > + lock_classes_softirq_safe, > > + lock_classes_softirq_safe_read > > That again should be something CPP magic using lockdep_states.h. Yes. > Also, that array can be static const, right? It's just an index into the > static bitmaps. Sure. [...] > > +static inline void remove_irqsafe_lock_bitmap(struct lock_class *class) > > +{ > > +#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING) > > + unsigned long usage = class->usage_mask; > > + > > + if (usage & LOCKF_USED_IN_HARDIRQ) > > + __clear_bit(class - lock_classes, lock_classes_hardirq_safe); > > + if (usage & LOCKF_USED_IN_HARDIRQ_READ) > > + __clear_bit(class - lock_classes, > > lock_classes_hardirq_safe_read); > > + if (usage & LOCKF_USED_IN_SOFTIRQ) > > + __clear_bit(class - lock_classes, lock_classes_softirq_safe); > > + if (usage & LOCKF_USED_IN_SOFTIRQ_READ) > > + __clear_bit(class - lock_classes, > > lock_classes_softirq_safe_read); > > More CPP foo required here. Definitely. > Also, do we really need to test, we could > just unconditionally clear the bits. Actually, these tests are used later for another cause: we want to know which safe usage may be changed by zapping this lock.
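The "CPP magic" asked for in the review is commonly done with an X-macro: one list of states drives both the bitmap declarations and the table that indexes them. The sketch below is a standalone illustration with an inline state list and hypothetical sketch_ names; in the kernel the list lives in lockdep_states.h, and the real bitmaps, sizes and helpers differ. It clears bits unconditionally, as the review suggests, whereas the reply explains that the patch keeps the usage tests for a later purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in state list; an assumption for this sketch only. */
#define SKETCH_LOCKDEP_STATES(X) \
	X(hardirq)               \
	X(softirq)

#define SKETCH_NR_CLASSES 64
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))
#define SKETCH_LONGS \
	((SKETCH_NR_CLASSES + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* One bitmap per state, plus one for its read flavour. */
#define DECLARE_BITMAPS(state) \
	static unsigned long sketch_##state##_safe[SKETCH_LONGS]; \
	static unsigned long sketch_##state##_safe_read[SKETCH_LONGS];
SKETCH_LOCKDEP_STATES(DECLARE_BITMAPS)
#undef DECLARE_BITMAPS

/* The table of bitmaps, generated from the same list and kept static const. */
#define BITMAP_ENTRIES(state) sketch_##state##_safe, sketch_##state##_safe_read,
static unsigned long *const sketch_safe_bitmaps[] = {
	SKETCH_LOCKDEP_STATES(BITMAP_ENTRIES)
};
#undef BITMAP_ENTRIES

#define SKETCH_NR_BITMAPS \
	(sizeof(sketch_safe_bitmaps) / sizeof(sketch_safe_bitmaps[0]))

/* Clear one class's bit in every bitmap, without testing usage first. */
static void sketch_clear_class(size_t class_idx)
{
	assert(class_idx < SKETCH_NR_CLASSES);
	for (size_t i = 0; i < SKETCH_NR_BITMAPS; i++)
		sketch_safe_bitmaps[i][class_idx / SKETCH_BITS_PER_LONG] &=
			~(1UL << (class_idx % SKETCH_BITS_PER_LONG));
}
```

Adding a state to the list then adds its two bitmaps and its two table entries in one place, which is the maintenance benefit of generating the code from a single list.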
Re: [PATCH v2 05/11] powerpc/mm: get rid of mm_ctx_slice_mask_xxx()
Christophe Leroy writes: > Now that slice_mask_for_size() is in mmu.h, the mm_ctx_slice_mask_xxx() > are not needed anymore, so drop them. Note that the 8xx ones where > not used anyway. > Reviewed-by: Aneesh Kumar K.V > Signed-off-by: Christophe Leroy > --- > arch/powerpc/include/asm/book3s/64/mmu.h | 32 > > arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 17 --- > 2 files changed, 4 insertions(+), 45 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h > b/arch/powerpc/include/asm/book3s/64/mmu.h > index ad00355f874f..e3d7f1404e20 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h > @@ -179,45 +179,21 @@ static inline void > mm_ctx_set_slb_addr_limit(mm_context_t *ctx, unsigned long li > ctx->hash_context->slb_addr_limit = limit; > } > > -#ifdef CONFIG_PPC_64K_PAGES > -static inline struct slice_mask *mm_ctx_slice_mask_64k(mm_context_t *ctx) > -{ > - return &ctx->hash_context->mask_64k; > -} > -#endif > - > -static inline struct slice_mask *mm_ctx_slice_mask_4k(mm_context_t *ctx) > -{ > - return &ctx->hash_context->mask_4k; > -} > - > -#ifdef CONFIG_HUGETLB_PAGE > -static inline struct slice_mask *mm_ctx_slice_mask_16m(mm_context_t *ctx) > -{ > - return &ctx->hash_context->mask_16m; > -} > - > -static inline struct slice_mask *mm_ctx_slice_mask_16g(mm_context_t *ctx) > -{ > - return &ctx->hash_context->mask_16g; > -} > -#endif > - > static inline struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int > psize) > { > #ifdef CONFIG_PPC_64K_PAGES > if (psize == MMU_PAGE_64K) > - return mm_ctx_slice_mask_64k(&ctx); > + return &ctx->hash_context->mask_64k; > #endif > #ifdef CONFIG_HUGETLB_PAGE > if (psize == MMU_PAGE_16M) > - return mm_ctx_slice_mask_16m(&ctx); > + return &ctx->hash_context->mask_16m; > if (psize == MMU_PAGE_16G) > - return mm_ctx_slice_mask_16g(&ctx); > + return &ctx->hash_context->mask_16g; > #endif > VM_BUG_ON(psize != MMU_PAGE_4K); > > - return mm_ctx_slice_mask_4k(&ctx); 
> + return &ctx->hash_context->mask_4k; > } > > #ifdef CONFIG_PPC_SUBPAGE_PROT > diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h > b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h > index a0f6844a1498..beded4df1f50 100644 > --- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h > +++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h > @@ -255,23 +255,6 @@ static inline void > mm_ctx_set_slb_addr_limit(mm_context_t *ctx, unsigned long li > ctx->slb_addr_limit = limit; > } > > -static inline struct slice_mask *mm_ctx_slice_mask_base(mm_context_t *ctx) > -{ > - return &ctx->mask_base_psize; > -} > - > -#ifdef CONFIG_HUGETLB_PAGE > -static inline struct slice_mask *mm_ctx_slice_mask_512k(mm_context_t *ctx) > -{ > - return &ctx->mask_512k; > -} > - > -static inline struct slice_mask *mm_ctx_slice_mask_8m(mm_context_t *ctx) > -{ > - return &ctx->mask_8m; > -} > -#endif > - > static inline struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int > psize) > { > #ifdef CONFIG_HUGETLB_PAGE > -- > 2.13.3
Re: [PATCH 2/2] HID: input: add mapping for KEY_KBD_LAYOUT_NEXT
On Thu, Apr 25, 2019 at 6:38 PM Dmitry Torokhov wrote: > > HUTRR56 defined a new usage code on consumer page to cycle through > set of keyboard layouts, let's add this mapping. > > Signed-off-by: Dmitry Torokhov > --- Acked-by: Benjamin Tissoires I don't think this will collide with the HID tree, so IMO, you can take this through yours if you want. Cheers, Benjamin > drivers/hid/hid-input.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c > index b607286a0bc8..0579b8d3f912 100644 > --- a/drivers/hid/hid-input.c > +++ b/drivers/hid/hid-input.c > @@ -1051,6 +1051,8 @@ static void hidinput_configure_usage(struct hid_input > *hidinput, struct hid_fiel > case 0x28b: map_key_clear(KEY_FORWARDMAIL); break; > case 0x28c: map_key_clear(KEY_SEND);break; > > + case 0x29d: map_key_clear(KEY_KBD_LAYOUT_NEXT); break; > + > case 0x2c7: map_key_clear(KEY_KBDINPUTASSIST_PREV); > break; > case 0x2c8: map_key_clear(KEY_KBDINPUTASSIST_NEXT); > break; > case 0x2c9: map_key_clear(KEY_KBDINPUTASSIST_PREVGROUP); > break; > -- > 2.21.0.593.g511ec345e18-goog >
Re: [PATCH 26/28] locking/lockdep: Remove __bfs
Thanks for review.

On Fri, 26 Apr 2019 at 04:07, Peter Zijlstra wrote:
>
> On Wed, Apr 24, 2019 at 06:19:32PM +0800, Yuyang Du wrote:
> > Since there is no need for backward dependency searching, remove this
> > extra function layer.
>
> OK, so $subject confused the heck out of me, I thought you were going to
> remove the whole bfs machinery. May I suggest retaining
> __bfs_backwards() in the previous patch (which I'm _waay_ too tired
> to look at now) and calling this patch: "Remove __bfs_backwards()".

Sure thing.
[RFC][PATCH] panic: make panic start/end messages consistent
We don't have consistency:

- we always print the panic header
	pr_emerg("Kernel panic - not syncing:")

- but we don't always print the panic footer
	pr_emerg("---[ end Kernel panic - not syncing:")

For instance, no panic footer (end panic) message will be printed when panic_timeout is set: the kernel will either reboot immediately after console_flush_on_panic() (emergency restart) or after panic_timeout seconds.

Additionally, panic_print_sys_info() runs before the panic footer line, which doesn't look right, since panic_print_sys_info() is just additional debugging info.

Let's make it consistent:

	pr_emerg("Kernel panic - not syncing:")
	dump_stack();
	console_flush_on_panic();
	pr_emerg("---[ end Kernel panic - not syncing:")

	panic_print_sys_info();
	/* the rest */
	/* panic_timeout handling */

Signed-off-by: Sergey Senozhatsky
---
kernel/panic.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 40882dad9f70..6482e4b54f0b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -282,6 +282,7 @@ void panic(const char *fmt, ...) */ debug_locks_off(); console_flush_on_panic(CONSOLE_FLUSH_PENDING); + pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); panic_print_sys_info();
@@ -331,8 +332,6 @@ void panic(const char *fmt, ...) disabled_wait(caller); } #endif - pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); - /* Do not scroll important messages printed above */ suppress_printk = 1; local_irq_enable();
-- 
2.21.0
Re: [PATCH 1/3] mfd: apple-ibridge: Add Apple iBridge MFD driver.
On Fri, Apr 26, 2019 at 7:56 AM Life is hard, and then you die wrote: > > > Hi Benjamin, > > On Thu, Apr 25, 2019 at 11:39:12AM +0200, Benjamin Tissoires wrote: > > On Thu, Apr 25, 2019 at 10:19 AM Life is hard, and then you die > > wrote: > > > > > > Hi Benjamin, > > > > > > Thank you for looking at this. > > > > > > On Wed, Apr 24, 2019 at 04:18:23PM +0200, Benjamin Tissoires wrote: > > > > On Mon, Apr 22, 2019 at 5:13 AM Ronald Tschalär > > > > wrote: > > > > > > > > > > The iBridge device provides access to several devices, including: > > > > > - the Touch Bar > > > > > - the iSight webcam > > > > > - the light sensor > > > > > - the fingerprint sensor > > > > > > > > > > This driver provides the core support for managing the iBridge device > > > > > and the access to the underlying devices. In particular, since the > > > > > functionality for the touch bar and light sensor is exposed via USB > > > > > HID > > > > > interfaces, and the same HID device is used for multiple functions, > > > > > this > > > > > driver provides a multiplexing layer that allows multiple HID drivers > > > > > to > > > > > be registered for a given HID device. This allows the touch bar and > > > > > ALS > > > > > driver to be separated out into their own modules. > > > > > > > > Sorry for coming late to the party, but IMO this series is far too > > > > complex for what you need. > > > > > > > > As I read this and the first comment of drivers/mfd/apple-ibridge.c, > > > > you need to have a HID driver that multiplex 2 other sub drivers > > > > through one USB communication. > > > > For that, you are using MFD, platform driver and you own sauce instead > > > > of creating a bus. > > > > > > Basically correct. To be a bit more precise, there are currently two > > > hid-devices and two drivers (touchbar and als) involved, with > > > connections as follows (pardon the ugly ascii art): > > > > > > hdev1 --- tb-drv > > >/ > > > / > > > / > > > hdev2 --- als-drv > > > > > > i.e. 
the touchbar driver talks to both hdev's, and hdev2's events > > > (reports) are processed by both drivers (though each handles different > > > reports). > > > > > > > So, how about we reuse entirely the HID subsystem which already > > > > provides the capability you need (assuming I am correct above). > > > > hid-logitech-dj already does the same kind of stuff and you could: > > > > - create drivers/hid/hid-ibridge.c that handles USB_ID_PRODUCT_IBRIDGE > > > > - hid-ibridge will then register itself to the hid subsystem with a > > > > call to hid_hw_start(hdev, HID_CONNECT_HIDRAW) and > > > > hid_device_io_start(hdev) to enable the events (so you don't create > > > > useless input nodes for it) > > > > - then you add your 2 new devices by calling hid_allocate_device() and > > > > then hid_add_device(). You can even create a new HID group > > > > APPLE_IBRIDGE and allocate 2 new PIDs for them to distinguish them > > > > from the actual USB device. > > > > - then you have 2 brand new HID devices you can create their driver as > > > > a regular ones. > > > > > > > > hid-ibridge.c would just need to behave like any other hid transport > > > > driver (see logi_dj_ll_driver in drivers/hid/hid-logitech-dj.c) and > > > > you can get rid of at least the MFD and the platform part of your > > > > drivers. > > > > > > > > Does it makes sense or am I missing something obvious in the middle? > > > > > > Yes, I think I understand, and I think this can work. Basically, > > > instead of demux'ing at the hid-driver level as I am doing now (i.e. > > > the iBridge hid-driver forwarding calls to the sub-hid-drivers), we > > > demux at the hid-device level (events forwarded from iBridge hdev to > > > all "virtual" sub-hdev's, and requests from sub-hdev's forwarded to > > > the original hdev via an iBridge ll_driver attached to the > > > sub-hdev's). 
> > > > > > So I would need to create 3 new "virtual" hid-devices (instances) as > > > follows: > > > > > > hdev1 --- vhdev1 --- tb-drv > > > / > > > -- vhdev2 -- > > > / > > > hdev2 --- vhdev3 --- als-drv > > > > > > (vhdev1 is probably not strictly necessary, but makes things more > > > consistent). > > > > Oh, ok. > > > > How about the following: > > > > hdev1 and hdev2 are merged together in hid-apple-ibridge.c, and then > > this driver creates 2 virtual hid drivers that are consistent > > > > like > > > > hdev1---ibridge-drv---vhdev1---tb-drv > > hdev2--/ \--vhdev2---als-drv > > I don't think this will work. The problem is when the sub-drivers need > to send a report or usb-command: how to they specify which hdev the > report/command is destined for? While we could store the original hdev > in each report (the hid_report's device field), that only works for > hid_hw_request(), but not for things like hid_hw_raw_request() or > hid_hw_output_report(). Now, currently I don't use the latter two; but > I do need to send
Re: linux-next: build warning after merge of the char-misc tree
On Fri, Apr 26, 2019 at 03:56:53PM +1000, Stephen Rothwell wrote: > Hi all, > > After merging the char-misc tree, today's linux-next build (x86_64 > allmodconfig) produced this warning: > > drivers/misc/aspeed-p2a-ctrl.c: In function 'aspeed_p2a_mmap': > drivers/misc/aspeed-p2a-ctrl.c:110:2: warning: ISO C90 forbids mixed > declarations and code [-Wdeclaration-after-statement] > pgprot_t prot = vma->vm_page_prot; > ^~~~ > > Introduced by commit > > 01c60dcea9f7 ("drivers/misc: Add Aspeed P2A control driver") Patrick, I thought you fixed all of these already? Can you send a patch again? Can you also make the driver so it can build with CONFIG_COMPILE_TEST enabled so that others can find your problems earlier in the review process? thanks, greg k-h
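For readers unfamiliar with the warning quoted above: it fires when a declaration follows a statement inside a block. A minimal sketch of the usual fix, hoisting the declaration to the top of the block, might look like the following. The structure and function names are made up for illustration and are not the aspeed-p2a-ctrl code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a structure carrying a protection value. */
struct demo_vma {
	unsigned long prot;
	size_t len;
};

/*
 * Declarations come first, statements afterwards, so the function builds
 * cleanly with -Wdeclaration-after-statement. Writing
 * "unsigned long prot = vma->prot;" after the length check would
 * trigger the warning.
 */
static long demo_mmap(const struct demo_vma *vma, size_t max_len)
{
	unsigned long prot;

	if (vma->len > max_len)
		return -1;

	prot = vma->prot;	/* assignment only, the declaration is above */
	return (long)(prot | 0x1UL);
}
```

The behaviour is unchanged by the move; only the position of the declaration differs.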
Re: [PATCH] tty: Don't force RISCV SBI console as preferred console
On Fri, Apr 26, 2019 at 10:11 AM Atish Patra wrote: > > On 4/25/19 6:35 AM, Anup Patel wrote: > > The Linux kernel will auto-disables all boot consoles whenever it > > gets a preferred real console. > > > > Currently on RISC-V systems, if we have a real console which is not > > RISCV SBI console then boot consoles (such as earlycon=sbi) are not > > auto-disabled when a real console (ttyS0 or ttySIF0) is available. > > This results in duplicate prints at boot-time after kernel starts > > using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel > > parameter was passed by bootloader. > > > > The reason for above issue is that RISCV SBI console always adds > > itself as preferred console which is causing other real consoles > > to be not used as preferred console. > > > > Do we even need HVC_SBI console to be enabled by default? Disabling > CONFIG_HVC_RISCV_SBI seems to be fine while running in QEMU. Actually, HVC_SBI console is useful on boards (such as SiFive Unleashed) lacking upstream serial driver. It allows us to boot upstream kernel to prompt on such boards with just timer driver (and probably irqchip driver). Also, we should be able to use same kernel image on QEMU and SiFive Unleashed board so disabling CONFIG_HVC_RISCV_SBI for QEMU is a temporary solution. > > If we don't need it, I suggest we should remove the config option from > defconfig in addition to this patch. Like mentioned above, HVC_SBI is useful for newer SOCs and boards where serial driver is not yet up-streamed. Regards, Anup > > Regards, > Atish > > Ideally "console=" kernel parameter passed by bootloaders should > > be the one selecting a preferred real console. > > > > This patch fixes above issue by not forcing RISCV SBI console as > > preferred console. 
> > > > Fixes: afa6b1ccfad5 ("tty: New RISC-V SBI console driver") > > Cc: sta...@vger.kernel.org > > Signed-off-by: Anup Patel > > --- > > drivers/tty/hvc/hvc_riscv_sbi.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/drivers/tty/hvc/hvc_riscv_sbi.c > > b/drivers/tty/hvc/hvc_riscv_sbi.c > > index 75155bde2b88..31f53fa77e4a 100644 > > --- a/drivers/tty/hvc/hvc_riscv_sbi.c > > +++ b/drivers/tty/hvc/hvc_riscv_sbi.c > > @@ -53,7 +53,6 @@ device_initcall(hvc_sbi_init); > > static int __init hvc_sbi_console_init(void) > > { > > hvc_instantiate(0, 0, &hvc_sbi_ops); > > - add_preferred_console("hvc", 0, NULL); > > > > return 0; > > } > > >
Re: [PATCH] tty: Don't force RISCV SBI console as preferred console
On Thu, Apr 25, 2019 at 09:41:21PM -0700, Atish Patra wrote: > Do we even need HVC_SBI console to be enabled by default? Disabling > CONFIG_HVC_RISCV_SBI seems to be fine while running in QEMU. > > If we don't need it, I suggest we should remove the config option from > defconfig in addition to this patch. I think the whole concept of the SBI console is a little dangerous. It means that for one piece of physical hardware (usually the uart) we have two entities (the M-mode firmware and the OS) in control, which tends to rarely end well.
Re: [LKP] [btrfs] 302167c50b: fio.write_bw_MBps -12.4% regression
Hi, Josef, kernel test robot writes: > Greeting, > > FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit: > > > commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end the > transaction for delayed refs in throttle") > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes > > in testcase: fio-basic > on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with > 64G memory > with following parameters: > > runtime: 300s > nr_task: 8t > disk: 1SSD > fs: btrfs > rw: randwrite > bs: 4k > ioengine: sync > test_size: 400g > cpufreq_governor: performance > ucode: 0xb2e > > test-description: Fio is a tool that will spawn a number of threads or > processes doing a particular type of I/O action as specified by the user. > test-url: https://github.com/axboe/fio > > Do you have time to take a look at this regression? Best Regards, Huang, Ying
linux-next: manual merge of the staging tree with the v4l-dvb tree
Hi all, Today's linux-next merge of the staging tree got conflicts in: drivers/staging/media/zoran/Kconfig drivers/staging/media/zoran/videocodec.c drivers/staging/media/zoran/videocodec.h drivers/staging/media/zoran/zoran.h drivers/staging/media/zoran/zoran_card.c drivers/staging/media/zoran/zoran_card.h drivers/staging/media/zoran/zoran_device.c drivers/staging/media/zoran/zoran_device.h drivers/staging/media/zoran/zoran_driver.c drivers/staging/media/zoran/zoran_procfs.c drivers/staging/media/zoran/zoran_procfs.h drivers/staging/media/zoran/zr36016.c drivers/staging/media/zoran/zr36016.h drivers/staging/media/zoran/zr36050.c drivers/staging/media/zoran/zr36050.h drivers/staging/media/zoran/zr36057.h drivers/staging/media/zoran/zr36060.c drivers/staging/media/zoran/zr36060.h between commit: 8dce4b265a53 ("media: zoran: remove deprecated driver") from the v4l-dvb tree and various commits from the staging tree. I fixed it up (I just removed the files) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
Re: [PATCH 2/2] mmc: sdhci_am654: Fix SLOTTYPE write
On 25/04/19 6:57 PM, Faiz Abbas wrote: > In the call to regmap_update_bits() for SLOTTYPE, the mask and value > fields are exchanged. Fix this. Could you also comment on whether this has any known effect on the driver. > > Signed-off-by: Faiz Abbas > --- > drivers/mmc/host/sdhci_am654.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c > index 866a9082705f..613b151a73c5 100644 > --- a/drivers/mmc/host/sdhci_am654.c > +++ b/drivers/mmc/host/sdhci_am654.c > @@ -205,8 +205,8 @@ static int sdhci_am654_init(struct sdhci_host *host) > if (host->mmc->caps & MMC_CAP_NONREMOVABLE) > ctl_cfg_2 = SLOTTYPE_EMBEDDED; > > - regmap_update_bits(sdhci_am654->base, CTL_CFG_2, ctl_cfg_2, > -SLOTTYPE_MASK); > + regmap_update_bits(sdhci_am654->base, CTL_CFG_2, SLOTTYPE_MASK, > +ctl_cfg_2); > > return sdhci_add_host(host); > } >
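For context on why the argument order matters: regmap_update_bits(map, reg, mask, val) performs a read-modify-write in which only the bits set in mask are replaced by the corresponding bits of val. The following is a minimal user-space sketch of that semantic, not the regmap implementation. update_bits() and the DEMO_* constants are hypothetical stand-ins, since the real SLOTTYPE_MASK and SLOTTYPE_EMBEDDED values are not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from sdhci_am654.c. */
#define DEMO_SLOTTYPE_MASK     0x3u
#define DEMO_SLOTTYPE_EMBEDDED 0x1u

/*
 * Model of a masked read-modify-write: bits selected by mask are
 * replaced with the matching bits of val, all other bits are kept.
 */
static uint32_t update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}

/* Correct order: the mask selects the field, val carries the new setting. */
static uint32_t set_slottype_fixed(uint32_t reg)
{
	return update_bits(reg, DEMO_SLOTTYPE_MASK, DEMO_SLOTTYPE_EMBEDDED);
}

/* Swapped order, as before the patch: only bits set in the value are touched. */
static uint32_t set_slottype_swapped(uint32_t reg)
{
	return update_bits(reg, DEMO_SLOTTYPE_EMBEDDED, DEMO_SLOTTYPE_MASK);
}
```

With these made-up values, a register holding 0xF2 becomes 0xF1 with the corrected order, while the swapped order yields 0xF3. When the value passed in the mask position is 0, the swapped call changes nothing at all.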
[PATCH 1/3] tty: simserial: drop unused iflag macro
Drop the RELEVANT_IFLAG() macro which hasn't been used for over a decade. Cc: Tony Luck Cc: Fenghua Yu Signed-off-by: Johan Hovold --- arch/ia64/hp/sim/simserial.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/ia64/hp/sim/simserial.c b/arch/ia64/hp/sim/simserial.c index 7aeb48a18576..1a338e541334 100644 --- a/arch/ia64/hp/sim/simserial.c +++ b/arch/ia64/hp/sim/simserial.c @@ -324,8 +324,6 @@ static int rs_ioctl(struct tty_struct *tty, unsigned int cmd, unsigned long arg) return -ENOIOCTLCMD; } -#define RELEVANT_IFLAG(iflag) (iflag & (IGNBRK|BRKINT|IGNPAR|PARMRK|INPCK)) - /* * This routine will shutdown a serial port; interrupts are disabled, and * DTR is dropped if the hangup on close termio flag is on. -- 2.21.0
[PATCH 0/3] tty: drop unused iflag macro
I noticed that the RELEVANT_IFLAG() macro was unused in USB serial and turns out there were a few more instances that could be dropped. I have some pending changes that may conflict with the corresponding change to USB serial so I'll take that one separately through my tree, but perhaps the rest could go through Greg's tty tree. Johan Johan Hovold (3): tty: simserial: drop unused iflag macro tty: ipoctal: drop unused iflag macro tty: cpm_uart: drop unused iflag macro arch/ia64/hp/sim/simserial.c| 2 -- drivers/ipack/devices/ipoctal.h | 1 - drivers/tty/serial/cpm_uart/cpm_uart_core.c | 2 -- 3 files changed, 5 deletions(-) -- 2.21.0
[PATCH v2 13/17] powerpc/mm: cleanup HPAGE_SHIFT setup
Only book3s/64 may select default among several HPAGE_SHIFT at runtime. 8xx always defines 512K pages as default FSL_BOOK3E always defines 4M pages as default This patch limits HUGETLB_PAGE_SIZE_VARIABLE to book3s/64 moves the definitions in subarches files. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/hugetlb.h | 2 ++ arch/powerpc/include/asm/page.h | 11 --- arch/powerpc/mm/hugetlbpage-hash64.c | 16 arch/powerpc/mm/hugetlbpage.c| 23 +++ 5 files changed, 30 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 5d8e692d6470..7815eb0cc2a5 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -390,7 +390,7 @@ source "kernel/Kconfig.hz" config HUGETLB_PAGE_SIZE_VARIABLE bool - depends on HUGETLB_PAGE + depends on HUGETLB_PAGE && PPC_BOOK3S_64 default y config MATH_EMULATION diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h index 84598c6b0959..20a101046cff 100644 --- a/arch/powerpc/include/asm/hugetlb.h +++ b/arch/powerpc/include/asm/hugetlb.h @@ -15,6 +15,8 @@ extern bool hugetlb_disabled; +void hugetlbpage_init_default(void); + void flush_dcache_icache_hugepage(struct page *page); int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr, diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h index 6b508420d92b..dbc8c0679480 100644 --- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -28,10 +28,15 @@ #define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) #ifndef __ASSEMBLY__ -#ifdef CONFIG_HUGETLB_PAGE -extern unsigned int HPAGE_SHIFT; -#else +#ifndef CONFIG_HUGETLB_PAGE #define HPAGE_SHIFT PAGE_SHIFT +#elif defined(CONFIG_PPC_BOOK3S_64) +extern unsigned int hpage_shift; +#define HPAGE_SHIFT hpage_shift +#elif defined(CONFIG_PPC_8xx) +#define HPAGE_SHIFT19 /* 512k pages */ +#elif defined(CONFIG_PPC_FSL_BOOK3E) +#define HPAGE_SHIFT22 /* 4M pages */ #endif #define HPAGE_SIZE ((1UL) << 
HPAGE_SHIFT) #define HPAGE_MASK (~(HPAGE_SIZE - 1)) diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c index b0d9209d9a86..7a58204c3688 100644 --- a/arch/powerpc/mm/hugetlbpage-hash64.c +++ b/arch/powerpc/mm/hugetlbpage-hash64.c @@ -15,6 +15,9 @@ #include #include +unsigned int hpage_shift; +EXPORT_SYMBOL(hpage_shift); + extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn, unsigned long pa, unsigned long rlags, unsigned long vflags, int psize, int ssize); @@ -145,3 +148,16 @@ void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr old_pte, pte); set_huge_pte_at(vma->vm_mm, addr, ptep, pte); } + +void hugetlbpage_init_default(void) +{ + /* Set default large page size. Currently, we pick 16M or 1M +* depending on what is available +*/ + if (mmu_psize_defs[MMU_PAGE_16M].shift) + hpage_shift = mmu_psize_defs[MMU_PAGE_16M].shift; + else if (mmu_psize_defs[MMU_PAGE_1M].shift) + hpage_shift = mmu_psize_defs[MMU_PAGE_1M].shift; + else if (mmu_psize_defs[MMU_PAGE_2M].shift) + hpage_shift = mmu_psize_defs[MMU_PAGE_2M].shift; +} diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 828860a7492e..265bd6d04233 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -28,9 +28,6 @@ bool hugetlb_disabled = false; -unsigned int HPAGE_SHIFT; -EXPORT_SYMBOL(HPAGE_SHIFT); - #define hugepd_none(hpd) (hpd_val(hpd) == 0) #define PTE_T_ORDER(__builtin_ffs(sizeof(pte_t)) - __builtin_ffs(sizeof(void *))) @@ -647,23 +644,9 @@ static int __init hugetlbpage_init(void) #endif } -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx) - /* Default hpage size = 4M on FSL_BOOK3E and 512k on 8xx */ - if (mmu_psize_defs[MMU_PAGE_4M].shift) - HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_4M].shift; - else if (mmu_psize_defs[MMU_PAGE_512K].shift) - HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_512K].shift; -#else - /* Set default large page size. 
Currently, we pick 16M or 1M -* depending on what is available -*/ - if (mmu_psize_defs[MMU_PAGE_16M].shift) - HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_16M].shift; - else if (mmu_psize_defs[MMU_PAGE_1M].shift) - HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_1M].shift; - else if (mmu_psize_defs[MMU_PAGE_2M].shift) - HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_2M].shift; -#endif + if (IS_ENABLED(HUGETLB_PAGE_SIZE_VARIABLE)) + huget
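As a quick check of the fixed shifts in the page.h hunk of this patch, the size and mask formulas can be evaluated in user space. This is a sketch that mirrors the HPAGE_SIZE and HPAGE_MASK definitions quoted in the diff; the demo_* helper names are made up.

```c
/* Same formulas as HPAGE_SIZE and HPAGE_MASK, parameterised by the shift. */
static unsigned long demo_hpage_size(unsigned int shift)
{
	return 1UL << shift;
}

static unsigned long demo_hpage_mask(unsigned int shift)
{
	return ~(demo_hpage_size(shift) - 1);
}

/* Round an address down to the start of its huge page. */
static unsigned long demo_hpage_base(unsigned long addr, unsigned int shift)
{
	return addr & demo_hpage_mask(shift);
}
```

A shift of 19 gives 512K pages and a shift of 22 gives 4M pages, matching the comments next to the 8xx and FSL_BOOK3E definitions.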
[PATCH v2 14/17] powerpc/mm: cleanup remaining ifdef mess in hugetlbpage.c
Only 3 subarches support huge pages. So when it is either 2 of them, it is not the third one. And mmu_has_feature() is known by all subarches so IS_ENABLED() can be used instead of #ifdef Signed-off-by: Christophe Leroy --- arch/powerpc/mm/hugetlbpage.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 265bd6d04233..1d5c6ec04351 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -226,7 +226,7 @@ int __init alloc_bootmem_huge_page(struct hstate *h) return __alloc_bootmem_huge_page(h); } -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx) +#ifndef CONFIG_PPC_BOOK3S_64 #define HUGEPD_FREELIST_SIZE \ ((PAGE_SIZE - sizeof(struct hugepd_freelist)) / sizeof(pte_t)) @@ -597,10 +597,10 @@ static int __init hugetlbpage_init(void) return 0; } -#if !defined(CONFIG_PPC_FSL_BOOK3E) && !defined(CONFIG_PPC_8xx) - if (!radix_enabled() && !mmu_has_feature(MMU_FTR_16M_PAGE)) + if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled() && + !mmu_has_feature(MMU_FTR_16M_PAGE)) return -ENODEV; -#endif + for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) { unsigned shift; unsigned pdshift; @@ -638,10 +638,8 @@ static int __init hugetlbpage_init(void) pgtable_cache_add(PTE_INDEX_SIZE); else if (pdshift > shift) pgtable_cache_add(pdshift - shift); -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx) - else + else if (IS_ENABLED(CONFIG_PPC_FSL_BOOK3E) || IS_ENABLED(CONFIG_PPC_8xx)) pgtable_cache_add(PTE_T_ORDER); -#endif } if (IS_ENABLED(HUGETLB_PAGE_SIZE_VARIABLE)) -- 2.13.3
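The conversion in this patch relies on IS_ENABLED() expanding to a constant 1 or 0, so the compiler still parses the guarded code and then discards it when the option is off. Below is a simplified user-space sketch of that mechanism, modelled on the preprocessor trick in include/linux/kconfig.h. It handles only built-in (=y) options, and the DEMO_IS_ENABLED name and the CONFIG_DEMO_* symbols are made up. Note that the macro tests the symbol exactly as spelled, so it is normally passed the full CONFIG_-prefixed name.

```c
/* Pretend Kconfig output: a set option is defined to 1, an unset one is absent. */
#define CONFIG_DEMO_BOOK3S_64 1

/* Simplified variant of the kernel's __is_defined() helper chain. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined3(arg1_or_junk) demo_take_second_arg(arg1_or_junk 1, 0, 0)
#define demo_is_defined2(val) demo_is_defined3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) demo_is_defined2(option)

static int demo_needs_16m_check(void)
{
	/* Constant-folded: no #ifdef needed around the condition. */
	if (DEMO_IS_ENABLED(CONFIG_DEMO_BOOK3S_64))
		return 1;
	return 0;
}

static int demo_undefined_option(void)
{
	/* CONFIG_DEMO_8XX is not defined, so this evaluates to 0. */
	return DEMO_IS_ENABLED(CONFIG_DEMO_8XX);
}
```

Because both branches are always compiled, this style only works when every identifier in the guarded code is declared for all configurations, which is the point the commit message makes about mmu_has_feature().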
[PATCH 2/3] tty: ipoctal: drop unused iflag macro
Drop the RELEVANT_IFLAG() macro which has never been used. Cc: Samuel Iglesias Gonsalvez Cc: Jens Taprogge Signed-off-by: Johan Hovold --- drivers/ipack/devices/ipoctal.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/ipack/devices/ipoctal.h b/drivers/ipack/devices/ipoctal.h index 7fede0eb6a0c..78e4fc81fb03 100644 --- a/drivers/ipack/devices/ipoctal.h +++ b/drivers/ipack/devices/ipoctal.h @@ -18,7 +18,6 @@ #define NR_CHANNELS8 #define IPOCTAL_MAX_BOARDS 16 #define MAX_DEVICES(NR_CHANNELS * IPOCTAL_MAX_BOARDS) -#define RELEVANT_IFLAG(iflag) ((iflag) & (IGNBRK|BRKINT|IGNPAR|PARMRK|INPCK)) /** * struct ipoctal_stats -- Stats since last reset -- 2.21.0
[PATCH v2 09/17] powerpc/mm: split asm/hugetlb.h into dedicated subarch files
Three subarches support hugepages: - fsl book3e - book3s/64 - 8xx This patch splits asm/hugetlb.h to reduce the #ifdef mess. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/64/hugetlb.h | 40 +++ arch/powerpc/include/asm/hugetlb.h | 87 ++-- arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 31 + arch/powerpc/include/asm/nohash/hugetlb-book3e.h | 31 + 4 files changed, 106 insertions(+), 83 deletions(-) create mode 100644 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h create mode 100644 arch/powerpc/include/asm/nohash/hugetlb-book3e.h diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h index ec2a55a553c7..7c99f018f7b5 100644 --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h @@ -62,4 +62,44 @@ extern pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t new_pte); +/* + * This should work for other subarchs too. 
But right now we use the + * new format only for 64bit book3s + */ +static inline pte_t *hugepd_page(hugepd_t hpd) +{ + VM_BUG_ON(!hugepd_ok(hpd)); + /* +* We have only four bits to encode, MMU page size +*/ + BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf); + return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK); +} + +static inline unsigned int hugepd_mmu_psize(hugepd_t hpd) +{ + return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2; +} + +static inline unsigned int hugepd_shift(hugepd_t hpd) +{ + return mmu_psize_to_shift(hugepd_mmu_psize(hpd)); +} +static inline void flush_hugetlb_page(struct vm_area_struct *vma, + unsigned long vmaddr) +{ + if (radix_enabled()) + return radix__flush_hugetlb_page(vma, vmaddr); +} + +static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr, + unsigned int pdshift) +{ + unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd); + + return hugepd_page(hpd) + idx; +} + +void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr); + #endif diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h index 7f1867e428c0..fd5c0873a57d 100644 --- a/arch/powerpc/include/asm/hugetlb.h +++ b/arch/powerpc/include/asm/hugetlb.h @@ -6,83 +6,13 @@ #include #ifdef CONFIG_PPC_BOOK3S_64 - #include -/* - * This should work for other subarchs too. 
But right now we use the - * new format only for 64bit book3s - */ -static inline pte_t *hugepd_page(hugepd_t hpd) -{ - VM_BUG_ON(!hugepd_ok(hpd)); - /* -* We have only four bits to encode, MMU page size -*/ - BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf); - return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK); -} - -static inline unsigned int hugepd_mmu_psize(hugepd_t hpd) -{ - return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2; -} - -static inline unsigned int hugepd_shift(hugepd_t hpd) -{ - return mmu_psize_to_shift(hugepd_mmu_psize(hpd)); -} -static inline void flush_hugetlb_page(struct vm_area_struct *vma, - unsigned long vmaddr) -{ - if (radix_enabled()) - return radix__flush_hugetlb_page(vma, vmaddr); -} - -#else - -static inline pte_t *hugepd_page(hugepd_t hpd) -{ - VM_BUG_ON(!hugepd_ok(hpd)); -#ifdef CONFIG_PPC_8xx - return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK); -#else - return (pte_t *)((hpd_val(hpd) & - ~HUGEPD_SHIFT_MASK) | PD_HUGE); -#endif -} - -static inline unsigned int hugepd_shift(hugepd_t hpd) -{ -#ifdef CONFIG_PPC_8xx - return ((hpd_val(hpd) & _PMD_PAGE_MASK) >> 1) + 17; -#else - return hpd_val(hpd) & HUGEPD_SHIFT_MASK; -#endif -} - +#elif defined(CONFIG_PPC_FSL_BOOK3E) +#include +#elif defined(CONFIG_PPC_8xx) +#include #endif /* CONFIG_PPC_BOOK3S_64 */ - -static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr, - unsigned pdshift) -{ - /* -* On FSL BookE, we have multiple higher-level table entries that -* point to the same hugepte. Just use the first one since they're all -* identical. So for that case, idx=0. -*/ - unsigned long idx = 0; - - pte_t *dir = hugepd_page(hpd); -#ifdef CONFIG_PPC_8xx - idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT; -#elif !defined(CONFIG_PPC_FSL_BOOK3E) - idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd); -#endif - - return dir + idx; -} - void flush_dcache_icache_hugepage(struct page *page); int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr, @@ -99,15 +29,6 @@ static
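The index computation in the book3s/64 hugepte_offset() shown in this patch can be tried out in isolation. The sketch below reuses the same expression with plain integers; the function name and the shift values in the usage note are illustrative assumptions, not values from the patch.

```c
/*
 * Index of the huge PTE within a hugepd: keep the address bits below
 * pdshift (the offset inside the directory entry's range), then drop
 * the bits that fall inside one huge page.
 */
static unsigned long demo_hugepte_index(unsigned long addr,
					unsigned int pdshift,
					unsigned int hugepd_shift)
{
	return (addr & ((1UL << pdshift) - 1)) >> hugepd_shift;
}
```

For example, assuming pdshift = 30 and a huge page shift of 24 (16M), address 0x43000000 maps to index 3. When the two shifts are equal the index is always 0.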
[PATCH v2 15/17] powerpc/mm: flatten function __find_linux_pte() step 1
__find_linux_pte() is full of if/else which is hard to follow although the handling is pretty simple. This patch flattens the function by getting rid of as much if/else as possible. In order to ease the review, this is done in three steps. Signed-off-by: Christophe Leroy --- arch/powerpc/mm/pgtable.c | 32 ++-- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 9f4ccd15849f..d332abeedf0a 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -339,12 +339,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, */ if (pgd_none(pgd)) return NULL; - else if (pgd_huge(pgd)) { - ret_pte = (pte_t *) pgdp; + + if (pgd_huge(pgd)) { + ret_pte = (pte_t *)pgdp; goto out; - } else if (is_hugepd(__hugepd(pgd_val(pgd)))) + } + if (is_hugepd(__hugepd(pgd_val(pgd)))) { hpdp = (hugepd_t *)&pgd; - else { + goto out_huge; + } + { /* * Even if we end up with an unmap, the pgtable will not * be freed, because we do an rcu free and here we are @@ -356,12 +360,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, if (pud_none(pud)) return NULL; - else if (pud_huge(pud)) { + + if (pud_huge(pud)) { ret_pte = (pte_t *) pudp; goto out; - } else if (is_hugepd(__hugepd(pud_val(pud)))) + } + if (is_hugepd(__hugepd(pud_val(pud)))) { hpdp = (hugepd_t *)&pud; - else { + goto out_huge; + } + { pdshift = PMD_SHIFT; pmdp = pmd_offset(&pud, ea); pmd = READ_ONCE(*pmdp); @@ -386,12 +394,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, if (pmd_huge(pmd) || pmd_large(pmd)) { ret_pte = (pte_t *) pmdp; goto out; - } else if (is_hugepd(__hugepd(pmd_val(pmd)))) + } + if (is_hugepd(__hugepd(pmd_val(pmd)))) { hpdp = (hugepd_t *)&pmd; - else - return pte_offset_kernel(&pmd, ea); + goto out_huge; + } + + return pte_offset_kernel(&pmd, ea); } } +out_huge: if (!hpdp) return NULL; -- 2.13.3
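The kind of transformation applied in this step can be illustrated on a toy function. The sketch below is not the kernel code; demo_lookup_nested() and demo_lookup_flat() are made-up names showing an else-if chain rewritten with early exits while keeping the same results.

```c
enum demo_kind { DEMO_NONE, DEMO_HUGE, DEMO_HUGEPD, DEMO_TABLE };

/* Before: every case hangs off the previous one through else. */
static int demo_lookup_nested(enum demo_kind k)
{
	int ret;

	if (k == DEMO_NONE)
		return -1;
	else if (k == DEMO_HUGE) {
		ret = 1;
		goto out;
	} else if (k == DEMO_HUGEPD)
		ret = 2;
	else
		ret = 3;
out:
	return ret;
}

/* After: each case leaves on its own, so no else is needed. */
static int demo_lookup_flat(enum demo_kind k)
{
	if (k == DEMO_NONE)
		return -1;

	if (k == DEMO_HUGE)
		return 1;

	if (k == DEMO_HUGEPD)
		return 2;

	return 3;
}
```

Both versions return the same value for every input; the flat form just makes each exit visible at the point where the case is recognised.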
[PATCH 3/3] tty: cpm_uart: drop unused iflag macro
Drop the RELEVANT_IFLAG() macro which hasn't been used at least since the dawn of git. Signed-off-by: Johan Hovold --- drivers/tty/serial/cpm_uart/cpm_uart_core.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c b/drivers/tty/serial/cpm_uart/cpm_uart_core.c index b929c7ae3a27..505262b1c6c2 100644 --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c @@ -567,8 +567,6 @@ static void cpm_uart_set_termios(struct uart_port *port, /* * Set up parity check flag */ -#define RELEVANT_IFLAG(iflag) (iflag & (IGNBRK|BRKINT|IGNPAR|PARMRK|INPCK)) - port->read_status_mask = (BD_SC_EMPTY | BD_SC_OV); if (termios->c_iflag & INPCK) port->read_status_mask |= BD_SC_FR | BD_SC_PR; -- 2.21.0
[PATCH v2 17/17] powerpc/mm: flatten function __find_linux_pte() step 3
__find_linux_pte() is full of if/else which is hard to follow although the handling is pretty simple. Previous patches left a { } block. This patch removes it. Signed-off-by: Christophe Leroy --- arch/powerpc/mm/pgtable.c | 98 +++ 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index c1c6d0b79baa..db4a6253df92 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -348,59 +348,59 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, hpdp = (hugepd_t *)&pgd; goto out_huge; } - { - /* -* Even if we end up with an unmap, the pgtable will not -* be freed, because we do an rcu free and here we are -* irq disabled -*/ - pdshift = PUD_SHIFT; - pudp = pud_offset(&pgd, ea); - pud = READ_ONCE(*pudp); - if (pud_none(pud)) - return NULL; + /* +* Even if we end up with an unmap, the pgtable will not +* be freed, because we do an rcu free and here we are +* irq disabled +*/ + pdshift = PUD_SHIFT; + pudp = pud_offset(&pgd, ea); + pud = READ_ONCE(*pudp); - if (pud_huge(pud)) { - ret_pte = (pte_t *) pudp; - goto out; - } - if (is_hugepd(__hugepd(pud_val(pud)))) { - hpdp = (hugepd_t *)&pud; - goto out_huge; - } - pdshift = PMD_SHIFT; - pmdp = pmd_offset(&pud, ea); - pmd = READ_ONCE(*pmdp); - /* -* A hugepage collapse is captured by pmd_none, because -* it mark the pmd none and do a hpte invalidate. -*/ - if (pmd_none(pmd)) - return NULL; - - if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) { - if (is_thp) - *is_thp = true; - ret_pte = (pte_t *)pmdp; - goto out; - } - /* -* pmd_large check below will handle the swap pmd pte -* we need to do both the check because they are config -* dependent.
-*/ - if (pmd_huge(pmd) || pmd_large(pmd)) { - ret_pte = (pte_t *)pmdp; - goto out; - } - if (is_hugepd(__hugepd(pmd_val(pmd)))) { - hpdp = (hugepd_t *)&pmd; - goto out_huge; - } + if (pud_none(pud)) + return NULL; - return pte_offset_kernel(&pmd, ea); + if (pud_huge(pud)) { + ret_pte = (pte_t *)pudp; + goto out; } + if (is_hugepd(__hugepd(pud_val(pud)))) { + hpdp = (hugepd_t *)&pud; + goto out_huge; + } + pdshift = PMD_SHIFT; + pmdp = pmd_offset(&pud, ea); + pmd = READ_ONCE(*pmdp); + /* +* A hugepage collapse is captured by pmd_none, because +* it mark the pmd none and do a hpte invalidate. +*/ + if (pmd_none(pmd)) + return NULL; + + if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) { + if (is_thp) + *is_thp = true; + ret_pte = (pte_t *)pmdp; + goto out; + } + /* +* pmd_large check below will handle the swap pmd pte +* we need to do both the check because they are config +* dependent. +*/ + if (pmd_huge(pmd) || pmd_large(pmd)) { + ret_pte = (pte_t *)pmdp; + goto out; + } + if (is_hugepd(__hugepd(pmd_val(pmd)))) { + hpdp = (hugepd_t *)&pmd; + goto out_huge; + } + + return pte_offset_kernel(&pmd, ea); + out_huge: if (!hpdp) return NULL; -- 2.13.3
Re: [RFC PATCH v5 3/4] x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector
* Zhao, Yakui wrote: > > > > Does the hypervisor model the APIC EOI command, i.e. does it require the > > > > APIC to be acked? I.e. would not acking the APIC create an IRQ storm? > > > > > > The hypervisor requires that the APIC EOI should be acked. If the EOI APIC > > > is not acked, the APIC ISR bit for the HYPERVISOR_CALLBACK_VECTOR will not > > > be cleared and then it will block the interrupt whose vector is lower than > > > HYPERVISOR_CALLBACK_VECTOR. > > > > Ok! > > > > I will add some comments for calling entering_ack_irq in > acrn_hv_callback_handler. Is this ok to you? Yeah, thanks! Ingo
linux-next: build warning after merge of the char-misc tree
Hi all, After merging the char-misc tree, today's linux-next build (x86_64 allmodconfig) produced this warning: drivers/misc/aspeed-p2a-ctrl.c: In function 'aspeed_p2a_mmap': drivers/misc/aspeed-p2a-ctrl.c:110:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] pgprot_t prot = vma->vm_page_prot; ^~~~ Introduced by commit 01c60dcea9f7 ("drivers/misc: Add Aspeed P2A control driver") -- Cheers, Stephen Rothwell
Re: [PATCH 1/3] mfd: apple-ibridge: Add Apple iBridge MFD driver.
Hi Benjamin, On Thu, Apr 25, 2019 at 11:39:12AM +0200, Benjamin Tissoires wrote: > On Thu, Apr 25, 2019 at 10:19 AM Life is hard, and then you die > wrote: > > > > Hi Benjamin, > > > > Thank you for looking at this. > > > > On Wed, Apr 24, 2019 at 04:18:23PM +0200, Benjamin Tissoires wrote: > > > On Mon, Apr 22, 2019 at 5:13 AM Ronald Tschalär > > > wrote: > > > > > > > > The iBridge device provides access to several devices, including: > > > > - the Touch Bar > > > > - the iSight webcam > > > > - the light sensor > > > > - the fingerprint sensor > > > > > > > > This driver provides the core support for managing the iBridge device > > > > and the access to the underlying devices. In particular, since the > > > > functionality for the touch bar and light sensor is exposed via USB HID > > > > interfaces, and the same HID device is used for multiple functions, this > > > > driver provides a multiplexing layer that allows multiple HID drivers to > > > > be registered for a given HID device. This allows the touch bar and ALS > > > > driver to be separated out into their own modules. > > > > > > Sorry for coming late to the party, but IMO this series is far too > > > complex for what you need. > > > > > > As I read this and the first comment of drivers/mfd/apple-ibridge.c, > > > you need to have a HID driver that multiplex 2 other sub drivers > > > through one USB communication. > > > For that, you are using MFD, platform driver and you own sauce instead > > > of creating a bus. > > > > Basically correct. To be a bit more precise, there are currently two > > hid-devices and two drivers (touchbar and als) involved, with > > connections as follows (pardon the ugly ascii art): > > > > hdev1 --- tb-drv > >/ > > / > > / > > hdev2 --- als-drv > > > > i.e. the touchbar driver talks to both hdev's, and hdev2's events > > (reports) are processed by both drivers (though each handles different > > reports). 
> > > > > So, how about we reuse entirely the HID subsystem which already > > > provides the capability you need (assuming I am correct above). > > > hid-logitech-dj already does the same kind of stuff and you could: > > > - create drivers/hid/hid-ibridge.c that handles USB_ID_PRODUCT_IBRIDGE > > > - hid-ibridge will then register itself to the hid subsystem with a > > > call to hid_hw_start(hdev, HID_CONNECT_HIDRAW) and > > > hid_device_io_start(hdev) to enable the events (so you don't create > > > useless input nodes for it) > > > - then you add your 2 new devices by calling hid_allocate_device() and > > > then hid_add_device(). You can even create a new HID group > > > APPLE_IBRIDGE and allocate 2 new PIDs for them to distinguish them > > > from the actual USB device. > > > - then you have 2 brand new HID devices you can create their driver as > > > a regular ones. > > > > > > hid-ibridge.c would just need to behave like any other hid transport > > > driver (see logi_dj_ll_driver in drivers/hid/hid-logitech-dj.c) and > > > you can get rid of at least the MFD and the platform part of your > > > drivers. > > > > > > Does it makes sense or am I missing something obvious in the middle? > > > > Yes, I think I understand, and I think this can work. Basically, > > instead of demux'ing at the hid-driver level as I am doing now (i.e. > > the iBridge hid-driver forwarding calls to the sub-hid-drivers), we > > demux at the hid-device level (events forwarded from iBridge hdev to > > all "virtual" sub-hdev's, and requests from sub-hdev's forwarded to > > the original hdev via an iBridge ll_driver attached to the > > sub-hdev's). > > > > So I would need to create 3 new "virtual" hid-devices (instances) as > > follows: > > > > hdev1 --- vhdev1 --- tb-drv > > / > > -- vhdev2 -- > > / > > hdev2 --- vhdev3 --- als-drv > > > > (vhdev1 is probably not strictly necessary, but makes things more > > consistent). > > Oh, ok. 
> > How about the following: > > hdev1 and hdev2 are merged together in hid-apple-ibridge.c, and then > this driver creates 2 virtual hid drivers that are consistent > > like > > hdev1---ibridge-drv---vhdev1---tb-drv > hdev2--/ \--vhdev2---als-drv I don't think this will work. The problem is when the sub-drivers need to send a report or usb-command: how to they specify which hdev the report/command is destined for? While we could store the original hdev in each report (the hid_report's device field), that only works for hid_hw_request(), but not for things like hid_hw_raw_request() or hid_hw_output_report(). Now, currently I don't use the latter two; but I do need to send raw usb control messages in the touchbar driver (some commands are not proper hid reports), so it definitely breaks down there. Or am I missing something? Cheers, Ronald
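The routing question raised above can be sketched with a small userspace model. This is only an illustration of the three-virtual-device layout (hdev1 -> vhdev1, hdev2 -> vhdev2 and vhdev3), not HID subsystem code: struct model_hdev and the model_*() helpers are hypothetical names. It shows why giving every virtual device exactly one parent makes the target of a request unambiguous, which is the property the merged layout would lose.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a hid_device; not the kernel structure. */
struct model_hdev {
	const char *name;
	struct model_hdev *parent;		/* NULL for a real device */
	struct model_hdev *children[MAX_CHILDREN];
	int nr_children;
	int requests_received;	/* requests that reached this real device */
	int events_received;	/* events delivered to this virtual device */
};

/* Attach a virtual device to the single real device that backs it. */
static void model_attach(struct model_hdev *real, struct model_hdev *virt)
{
	assert(real->nr_children < MAX_CHILDREN);
	virt->parent = real;
	real->children[real->nr_children++] = virt;
}

/*
 * Downstream path: a request issued on a virtual device is forwarded to
 * its parent, so the sub-driver never has to name the real device.
 */
static struct model_hdev *model_raw_request(struct model_hdev *dev)
{
	while (dev->parent)
		dev = dev->parent;
	dev->requests_received++;
	return dev;
}

/* Upstream path: an event from a real device fans out to all its children. */
static void model_input_event(struct model_hdev *real)
{
	for (int i = 0; i < real->nr_children; i++)
		real->children[i]->events_received++;
}
```

With two real devices merged behind one driver, model_raw_request() would need an extra argument to pick the parent, which is the gap described above for raw requests and USB control messages.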
[PATCH v2] arm64: dts: ls1028a: Add USB dt nodes
This patch adds USB dt nodes for LS1028A. Signed-off-by: Ran Wang --- Changes in v2: - Rename node from usb3@... to usb@... to meet DTSpec arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 20 1 files changed, 20 insertions(+), 0 deletions(-) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi index 8dd3501..188cfb8 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi @@ -144,6 +144,26 @@ clocks = <&sysclk>; }; + usb0:usb@310 { + compatible= "snps,dwc3"; + reg= <0x0 0x310 0x0 0x1>; + interrupts= <0 80 0x4>; + dr_mode= "host"; + snps,dis_rxdet_inp3_quirk; + snps,quirk-frame-length-adjustment = <0x20>; + snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>; + }; + + usb1:usb@311 { + compatible= "snps,dwc3"; + reg= <0x0 0x311 0x0 0x1>; + interrupts= <0 81 0x4>; + dr_mode= "host"; + snps,dis_rxdet_inp3_quirk; + snps,quirk-frame-length-adjustment = <0x20>; + snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>; + }; + i2c0: i2c@200 { compatible = "fsl,vf610-i2c"; #address-cells = <1>; -- 1.7.1
Re: [PATCH 1/2] mmc: sdhci_am654: Fix minor phy configurations
On 25/04/19 6:57 PM, Faiz Abbas wrote: > Fix the following minor things: > > 1. Line wrapping with the regmap_*() functions is way more conservative > than required by the 80 character rule. Expand the function calls out to > use less number of lines. > > 2. Add an error message if the DLL fails to lock. Please make the white space changes a separate patch. Also I would prefer not to use "fix" in the subject unless the patch fixes driver behaviour. > > Signed-off-by: Faiz Abbas > --- > drivers/mmc/host/sdhci_am654.c | 37 -- > 1 file changed, 17 insertions(+), 20 deletions(-) > > diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c > index eea183e90f1b..866a9082705f 100644 > --- a/drivers/mmc/host/sdhci_am654.c > +++ b/drivers/mmc/host/sdhci_am654.c > @@ -88,8 +88,7 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, > unsigned int clock) > int ret; > > if (sdhci_am654->dll_on) { > - regmap_update_bits(sdhci_am654->base, PHY_CTRL1, > -ENDLL_MASK, 0); > + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, ENDLL_MASK, 0); > > sdhci_am654->dll_on = false; > } > @@ -101,8 +100,7 @@ static void sdhci_am654_set_clock(struct sdhci_host > *host, unsigned int clock) > mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK; > val = (1 << OTAPDLYENA_SHIFT) | > (sdhci_am654->otap_del_sel << OTAPDLYSEL_SHIFT); > - regmap_update_bits(sdhci_am654->base, PHY_CTRL4, > -mask, val); > + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, mask, val); > switch (clock) { > case 2: > sel50 = 0; > @@ -120,8 +118,7 @@ static void sdhci_am654_set_clock(struct sdhci_host > *host, unsigned int clock) > /* Configure PHY DLL frequency */ > mask = SEL50_MASK | SEL100_MASK; > val = (sel50 << SEL50_SHIFT) | (sel100 << SEL100_SHIFT); > - regmap_update_bits(sdhci_am654->base, PHY_CTRL5, > -mask, val); > + regmap_update_bits(sdhci_am654->base, PHY_CTRL5, mask, val); > /* Configure DLL TRIM */ > mask = DLL_TRIM_ICP_MASK; > val = sdhci_am654->trm_icp << DLL_TRIM_ICP_SHIFT; > @@ -129,19 
+126,21 @@ static void sdhci_am654_set_clock(struct sdhci_host > *host, unsigned int clock) > /* Configure DLL driver strength */ > mask |= DR_TY_MASK; > val |= sdhci_am654->drv_strength << DR_TY_SHIFT; > - regmap_update_bits(sdhci_am654->base, PHY_CTRL1, > -mask, val); > + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, mask, val); > /* Enable DLL */ > - regmap_update_bits(sdhci_am654->base, PHY_CTRL1, > -ENDLL_MASK, 0x1 << ENDLL_SHIFT); > + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, ENDLL_MASK, > +0x1 << ENDLL_SHIFT); > /* >* Poll for DLL ready. Use a one second timeout. >* Works in all experiments done so far >*/ > - ret = regmap_read_poll_timeout(sdhci_am654->base, > - PHY_STAT1, val, > - val & DLLRDY_MASK, > - 1000, 100); > + ret = regmap_read_poll_timeout(sdhci_am654->base, PHY_STAT1, > +val, val & DLLRDY_MASK, 1000, > +100); > + if (ret) { > + dev_err(mmc_dev(host->mmc), "DLL failed to relock\n"); > + return; > + } > > sdhci_am654->dll_on = true; > } > @@ -186,8 +185,7 @@ static int sdhci_am654_init(struct sdhci_host *host) > > /* Reset OTAP to default value */ > mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK; > - regmap_update_bits(sdhci_am654->base, PHY_CTRL4, > -mask, 0x0); > + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, mask, 0x0); > > regmap_read(sdhci_am654->base, PHY_STAT1, &val); > if (~val & CALDONE_MASK) { > @@ -201,15 +199,14 @@ static int sdhci_am654_init(struct sdhci_host *host) > } > > /* Enable pins by setting IO mux to 0 */ > - regmap_update_bits(sdhci_am654->base, PHY_CTRL1, > -IOMUX_ENABLE_MASK, 0); > + regmap_update_bits(sdhci_am654->base, PHY_CTRL1, IOMUX_ENABLE_MASK, 0); > > /* Set slot type based on SD or eMMC */ > if (host->mmc->caps & MMC_CAP_NONREMOVABLE) > ctl_cfg_2 = SLOTTYPE_EMBEDDED; > > - regmap_update_bits(sdhci_am654->base, CTL_CFG_2, > -
Re: [PATCH 1/2] RISC-V: Add DT documentation for SiFive L2 Cache Controller
On Thu, Apr 25, 2019 at 3:43 PM Sudeep Holla wrote:
>
> On Thu, Apr 25, 2019 at 11:24:55AM +0530, Yash Shah wrote:
> > Add device tree bindings for SiFive FU540 L2 cache controller driver
> >
> > Signed-off-by: Yash Shah
> > ---
> >  .../devicetree/bindings/riscv/sifive-l2-cache.txt | 53 ++
> >  1 file changed, 53 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> >
> > diff --git a/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > new file mode 100644
> > index 000..15132e2
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > @@ -0,0 +1,53 @@
> > +SiFive L2 Cache Controller
> > +--
> > +The SiFive Level 2 Cache Controller is used to provide access to fast copies
> > +of memory for masters in a Core Complex. The Level 2 Cache Controller also
> > +acts as directory-based coherency manager.
> > +
> > +Required Properties:
> > +
> > +- compatible: Should be "sifive,fu540-c000-ccache"
> > +
> > +- cache-block-size: Specifies the block size in bytes of the cache
> > +
> > +- cache-level: Should be set to 2 for a level 2 cache
> > +
> > +- cache-sets: Specifies the number of associativity sets of the cache
> > +
> > +- cache-size: Specifies the size in bytes of the cache
> > +
> > +- cache-unified: Specifies the cache is a unified cache
> > +
> > +- interrupt-parent: Must be core interrupt controller
> > +
> > +- interrupts: Must contain 3 entries (DirError, DataError and DataFail signals)
> > +
> > +- reg: Physical base address and size of L2 cache controller registers map
> > +
> > +- reg-names: Should be "control"
> > +
>
> It would be good if you mark the properties that are present in DT
> specification and those that are added for sifive,fu540-c000-ccache

I believe there isn't any property that is added explicitly for
sifive,fu540-c000-ccache.

> explicitly. Also I assume you can retain the standard "cache" compatible
> in addition to above. I am interested to see if the cacheinfo infrastructure
> can be used without any issues.

Yes, I will add the "cache" string to the compatible property.

>
> --
> Regards,
> Sudeep

Thanks for your comments.

- Yash
Re: [PATCHv2 4/4] printk: make sure we always print console disabled message
Forgot to mention that the series is still in RFC phase.

On (04/26/19 14:33), Sergey Senozhatsky wrote:
[..]
> +++ b/kernel/printk/printk.c
> @@ -2613,6 +2613,12 @@ static int __unregister_console(struct console *console)
>  	pr_info("%sconsole [%s%d] disabled\n",
>  		(console->flags & CON_BOOT) ? "boot" : "",
>  		console->name, console->index);
> +	/*
> +	 * Print 'console disabled' on all the consoles, including the
> +	 * one we are about to unregister.
> +	 */
> +	console_unlock();
> +	console_lock();
>
>  	res = _braille_unregister_console(console);
>  	if (res)

Need to think more if this is race free...

	-ss
Re: [PATCH 02/28] locking/lockdep: Add description and explanation in lockdep design doc
Thank you very much for the review.

You mean a class can go away? Before Bart's addition, it could go away.
Right? I think the original point of "never goes away" in that context
was not intended to describe a class's real disappearance. Anyway, the
description should be made comprehensive. Do you want me to resend the
patch, or will you modify it?

On Thu, 25 Apr 2019 at 22:01, Peter Zijlstra wrote:
>
> On Wed, Apr 24, 2019 at 06:19:08PM +0800, Yuyang Du wrote:
> > +Unlike a lock instance, a lock-class itself never goes away: when a
> > +lock-class's instance is used for the first time after bootup the class gets
> > +registered, and all (subsequent) instances of that lock-class will be mapped
> > +to the lock-class.
>
> That's not entirely accurate anymore. Bart van Assche recently added
> lockdep_{,un}register_key().
Re: [PATCH 2/2] RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs
On Thu, Apr 25, 2019 at 3:48 PM Sudeep Holla wrote:
>
> On Thu, Apr 25, 2019 at 11:24:56AM +0530, Yash Shah wrote:
> > The driver currently supports only SiFive FU540-C000 platform.
> >
> > The initial version of L2 cache controller driver includes:
> > - Initial configuration reporting at boot up.
> > - Support for ECC related functionality.
> >
> > Signed-off-by: Yash Shah
>
> []
>
> > +static const struct file_operations l2_fops = {
> > +	.owner = THIS_MODULE,
> > +	.open = simple_open,
> > +	.write = l2_write
> > +};
> > +
> > +static void setup_sifive_debug(void)
> > +{
> > +	sifive_test = debugfs_create_dir("sifive_l2_cache", NULL);
> > +	if (!sifive_test)
>
> Drop the conditional check above, Greg K H removed lots of them recently.
> In his words: When calling debugfs functions, there is no need to ever
> check the return value. The function can work or not, but the code
> logic should never do something different based on this.
>
> He may not like to see this :)

Sure, thanks for pointing it out. I will drop all the conditional checks
on debugfs function return values.

>
> > +		return;
> > +
> > +	if (!debugfs_create_file("sifive_debug_inject_error", 0200,
> > +				 sifive_test, NULL, &l2_fops))
>
> Ditto.
>
> > +		debugfs_remove_recursive(sifive_test);
> > +}
>
> --
> Regards,
> Sudeep

Thanks for your comments.

- Yash
Re: [PATCH 1/3] mfd: apple-ibridge: Add Apple iBridge MFD driver.
Hi Jonathan,

On Wed, Apr 24, 2019 at 08:13:17PM +0100, Jonathan Cameron wrote:
> On Wed, 24 Apr 2019 03:47:18 -0700
> "Life is hard, and then you die" wrote:
>
> > Hi Jonathan,
> >
> > On Mon, Apr 22, 2019 at 12:34:26PM +0100, Jonathan Cameron wrote:
> > > On Sun, 21 Apr 2019 20:12:49 -0700
> > > Ronald Tschalär wrote:
> > >
> > > > The iBridge device provides access to several devices, including:
> > > > - the Touch Bar
> > > > - the iSight webcam
> > > > - the light sensor
> > > > - the fingerprint sensor
> > > >
> > > > This driver provides the core support for managing the iBridge device
> > > > and the access to the underlying devices. In particular, since the
> > > > functionality for the touch bar and light sensor is exposed via USB HID
> > > > interfaces, and the same HID device is used for multiple functions, this
> > > > driver provides a multiplexing layer that allows multiple HID drivers to
> > > > be registered for a given HID device. This allows the touch bar and ALS
> > > > driver to be separated out into their own modules.
> > > >
> > > > Signed-off-by: Ronald Tschalär
>
> Hi Ronald,
> > >
> > > I've only taken a fairly superficial look at this. A few global
> > > things to note though.
> >
> > Thanks for this review.

[snip]

I've applied all your feedback in my tree, but it now looks like this
module is going to be redone differently. I'll try to keep all your
comments in mind during the rewrite, though, so they're not wasted.

Cheers,

  Ronald
[PATCHv2 2/4] printk: remove invalid register_console() comment
We don't iterate consoles twice, since commit 8259cf434202 ("printk: Ensure that "console enabled" messages are printed on the console"), so the comment is not valid anymore, and can be removed, as was suggested by Petr. The patch also invokes pr_info("%sconsole [%s%d] enabled\n") before we unlock_consoles(), just to make sure that we really print that message on every registered and enabled console. Suggested-by: Petr Mladek Signed-off-by: Sergey Senozhatsky --- kernel/printk/printk.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index b0e361ca1bea..3ac71701afa3 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2806,9 +2806,22 @@ void register_console(struct console *newcon) exclusive_console_stop_seq = console_seq; logbuf_unlock_irqrestore(flags); } + + /* +* We are still under console_sem, pr_info() will only add the message +* to the kernel's log buffer. console_unlock() will print it on all +* registered and enabled consoles. +*/ + pr_info("%sconsole [%s%d] enabled\n", + (newcon->flags & CON_BOOT) ? "boot" : "", + newcon->name, newcon->index); + console_unlock(); console_sysfs_notify(); + if (keep_bootcon) + return; + /* * By unregistering the bootconsoles after we enable the real console * we get the "console xxx enabled" message on all the consoles - @@ -2816,19 +2829,8 @@ void register_console(struct console *newcon) * users know there might be something in the kernel's log buffer that * went to the bootconsole (that they do not see on the real console) */ - pr_info("%sconsole [%s%d] enabled\n", - (newcon->flags & CON_BOOT) ? "boot" : "" , - newcon->name, newcon->index); - - if (keep_bootcon) - return; - if (bcon && (newcon->flags & (CON_CONSDEV|CON_BOOT)) == CON_CONSDEV) { console_lock(); - /* -* We need to iterate through all boot consoles, to make -* sure we print everything out, before we unregister them. 
-*/ for_each_console(bcon) if (bcon->flags & CON_BOOT) __unregister_console(bcon); -- 2.21.0
[PATCHv2 4/4] printk: make sure we always print console disabled message
Make sure that we print 'console disabled' messages on all the consoles,
including the one we are about to unregister. Otherwise, unregistered
console will not have that message, because pr_info() under console_sem
doesn't print anything. We do the same thing in __register_console()
with the 'console enabled' message.

Signed-off-by: Sergey Senozhatsky
---
 kernel/printk/printk.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3b36e26d4a51..20c702b963a9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2613,6 +2613,12 @@ static int __unregister_console(struct console *console)
 	pr_info("%sconsole [%s%d] disabled\n",
 		(console->flags & CON_BOOT) ? "boot" : "",
 		console->name, console->index);
+	/*
+	 * Print 'console disabled' on all the consoles, including the
+	 * one we are about to unregister.
+	 */
+	console_unlock();
+	console_lock();
 
 	res = _braille_unregister_console(console);
 	if (res)
-- 
2.21.0
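The reason for the console_unlock(); console_lock(); pair can be sketched with a small userspace model. This is not printk code: struct model_console and the model_*() helpers are hypothetical, and the "lock" is a plain flag. The model only assumes what the commit message states, namely that a message logged while the lock is held is queued, and that unlocking flushes queued messages to whatever consoles are registered at that moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSOLES 4
#define MAX_PENDING  8

/* Hypothetical stand-in for struct console. */
struct model_console {
	const char *name;
	int lines_seen;		/* number of messages printed here */
	const char *last_line;	/* most recent message printed here */
};

static struct model_console *consoles[MAX_CONSOLES];
static int nr_consoles;
static const char *pending[MAX_PENDING];
static int nr_pending;
static bool locked;

static void model_lock(void)
{
	assert(!locked);
	locked = true;
}

/* Models console_unlock(): flush pending messages to all registered consoles. */
static void model_unlock(void)
{
	assert(locked);
	for (int i = 0; i < nr_pending; i++) {
		for (int c = 0; c < nr_consoles; c++) {
			consoles[c]->lines_seen++;
			consoles[c]->last_line = pending[i];
		}
	}
	nr_pending = 0;
	locked = false;
}

/* Models pr_info() under the lock: the message is only queued. */
static void model_log(const char *msg)
{
	assert(nr_pending < MAX_PENDING);
	pending[nr_pending++] = msg;
}

static void model_add(struct model_console *con)
{
	assert(locked && nr_consoles < MAX_CONSOLES);
	consoles[nr_consoles++] = con;
}

/* Remove a console; with flush_first the victim still sees the message. */
static void model_remove(struct model_console *con, bool flush_first)
{
	assert(locked);
	model_log("console disabled");
	if (flush_first) {
		model_unlock();
		model_lock();
	}
	for (int i = 0; i < nr_consoles; i++) {
		if (consoles[i] == con) {
			consoles[i] = consoles[--nr_consoles];
			break;
		}
	}
}
```

Without the flush, the removed console is already off the list when the queued message is finally printed. The model has a single thread, so it says nothing about whether dropping the lock in the middle is race free in the kernel.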
[PATCHv2 3/4] printk: factor out register_console() code
We need to take console_sem lock when we iterate console drivers list. Otherwise, another CPU can concurrently modify console drivers list or console drivers. Current register_console() has several race conditions - for_each_console() must be done under console_sem. Factor out console registration code and hold console_sem throughout entire registration process. Note that we need to unlock console_sem and lock it again after we added new console to the list and before we unregister boot consoles. This might look a bit weird, but this is how we print pending logbuf messages to all registered and available consoles. Signed-off-by: Sergey Senozhatsky --- kernel/printk/printk.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 3ac71701afa3..3b36e26d4a51 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2666,7 +2666,7 @@ static int __unregister_console(struct console *console) * - Once a "real" console is registered, any attempt to register a *bootconsoles will be rejected */ -void register_console(struct console *newcon) +static void __register_console(struct console *newcon) { int i; unsigned long flags; @@ -2771,7 +2771,6 @@ void register_console(struct console *newcon) * Put this console in the list - keep the * preferred driver at the head of the list. 
*/ - console_lock(); if ((newcon->flags & CON_CONSDEV) || console_drivers == NULL) { newcon->next = console_drivers; console_drivers = newcon; @@ -2818,6 +2817,7 @@ void register_console(struct console *newcon) console_unlock(); console_sysfs_notify(); + console_lock(); if (keep_bootcon) return; @@ -2830,14 +2830,19 @@ void register_console(struct console *newcon) * went to the bootconsole (that they do not see on the real console) */ if (bcon && (newcon->flags & (CON_CONSDEV|CON_BOOT)) == CON_CONSDEV) { - console_lock(); for_each_console(bcon) if (bcon->flags & CON_BOOT) __unregister_console(bcon); - console_unlock(); - console_sysfs_notify(); } } + +void register_console(struct console *newcon) +{ + console_lock(); + __register_console(newcon); + console_unlock(); + console_sysfs_notify(); +} EXPORT_SYMBOL(register_console); int unregister_console(struct console *console) -- 2.21.0
[PATCH] Revert "drm/qxl: drop prime import/export callbacks"
This reverts commit f4c34b1e2a37d5676180901fa6ff188bcb6371f8.

Similar to commit a0cecc23cfcb Revert "drm/virtio: drop prime
import/export callbacks". We have to do the same with qxl,
for the same reasons (it breaks DRI3).

Drop the WARN_ON_ONCE().

Fixes: f4c34b1e2a37d5676180901fa6ff188bcb6371f8
Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/qxl/qxl_drv.c   |  4
 drivers/gpu/drm/qxl/qxl_prime.c | 12
 2 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index 578d867a81d5..f33e349c4ec5 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -255,10 +255,14 @@ static struct drm_driver qxl_driver = {
 #if defined(CONFIG_DEBUG_FS)
 	.debugfs_init = qxl_debugfs_init,
 #endif
+	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
+	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
 	.gem_prime_export = drm_gem_prime_export,
 	.gem_prime_import = drm_gem_prime_import,
 	.gem_prime_pin = qxl_gem_prime_pin,
 	.gem_prime_unpin = qxl_gem_prime_unpin,
+	.gem_prime_get_sg_table = qxl_gem_prime_get_sg_table,
+	.gem_prime_import_sg_table = qxl_gem_prime_import_sg_table,
 	.gem_prime_vmap = qxl_gem_prime_vmap,
 	.gem_prime_vunmap = qxl_gem_prime_vunmap,
 	.gem_prime_mmap = qxl_gem_prime_mmap,
diff --git a/drivers/gpu/drm/qxl/qxl_prime.c b/drivers/gpu/drm/qxl/qxl_prime.c
index 8b448eca1cd9..114653b471c6 100644
--- a/drivers/gpu/drm/qxl/qxl_prime.c
+++ b/drivers/gpu/drm/qxl/qxl_prime.c
@@ -42,6 +42,18 @@ void qxl_gem_prime_unpin(struct drm_gem_object *obj)
 	qxl_bo_unpin(bo);
 }
 
+struct sg_table *qxl_gem_prime_get_sg_table(struct drm_gem_object *obj)
+{
+	return ERR_PTR(-ENOSYS);
+}
+
+struct drm_gem_object *qxl_gem_prime_import_sg_table(
+	struct drm_device *dev, struct dma_buf_attachment *attach,
+	struct sg_table *table)
+{
+	return ERR_PTR(-ENOSYS);
+}
+
 void *qxl_gem_prime_vmap(struct drm_gem_object *obj)
 {
 	struct qxl_bo *bo = gem_to_qxl_bo(obj);
-- 
2.18.1
[PATCHv2 0/4] Access console drivers list under console_sem
Hello,

Normally, we grab console_sem lock before we iterate consoles list,
which is necessary if we want to be race free. The only exception to
this rule is console_flush_on_panic().

However, it seems that we are not fully race free - register_console()
iterates console drivers list in unsafe manner in several places.
E.g. the following scenario:

	CPU0					CPU1
	register_console()			unregister_console()
						 console_lock()
	 for_each_console()			 // modify console_drivers
	  con->foo				 kfree(con)

I factored out register_console() and unregister_console() and now
the bulk of the work is done under console_sem.

Both in register and unregister paths we now have that oddly looking
thing

	pr_info("console enabled/disabled");
	console_unlock();
	console_lock();

Which is not really odd, in fact. This is to make sure that we always
print messages on all the consoles.

v2:
- removed outdated comment (Petr)
- factor out register_console() and always run it under console_sem (Petr)
- added a patch which ensures that we always print "console disabled"
  on every console, before we unregister one of them

Sergey Senozhatsky (4):
  printk: factor out __unregister_console() code
  printk: remove invalid register_console() comment
  printk: factor out register_console() code
  printk: make sure we always print console disabled message

 kernel/printk/printk.c | 125 +
 1 file changed, 76 insertions(+), 49 deletions(-)

-- 
2.21.0
[PATCHv2 1/4] printk: factor out __unregister_console() code
The following pattern in register_console() is not completely safe: for_each_console(bcon) if (bcon->flags & CON_BOOT) unregister_console(bcon); Because, in theory, console drivers list and console drivers can be modified concurrently from another CPU. We need to grab console_sem lock, which protects console drivers list and console drivers, before we start iterating console drivers list. Factor out __unregister_console(), which will be called from unregister_console() and register_console(), in both cases under console_sem lock. Signed-off-by: Sergey Senozhatsky --- kernel/printk/printk.c | 98 -- 1 file changed, 56 insertions(+), 42 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 17102fd4c136..b0e361ca1bea 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2605,6 +2605,48 @@ static int __init keep_bootcon_setup(char *str) early_param("keep_bootcon", keep_bootcon_setup); +static int __unregister_console(struct console *console) +{ + struct console *a, *b; + int res; + + pr_info("%sconsole [%s%d] disabled\n", + (console->flags & CON_BOOT) ? "boot" : "", + console->name, console->index); + + res = _braille_unregister_console(console); + if (res) + return res; + + res = 1; + if (console_drivers == console) { + console_drivers = console->next; + res = 0; + } else if (console_drivers) { + for (a = console_drivers->next, b = console_drivers; +a; b = a, a = b->next) { + if (a == console) { + b->next = a->next; + res = 0; + break; + } + } + } + + if (!res && (console->flags & CON_EXTENDED)) + nr_ext_console_drivers--; + + /* +* If this isn't the last console and it has CON_CONSDEV set, we +* need to set it on the next preferred console. 
+*/ + if (console_drivers != NULL && console->flags & CON_CONSDEV) + console_drivers->flags |= CON_CONSDEV; + + console->flags &= ~CON_ENABLED; + return res; +} + /* * The console driver calls this routine during kernel initialization * to register the console printing procedure with printk() and to @@ -2777,62 +2819,34 @@ void register_console(struct console *newcon) pr_info("%sconsole [%s%d] enabled\n", (newcon->flags & CON_BOOT) ? "boot" : "" , newcon->name, newcon->index); - if (bcon && - ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV) && - !keep_bootcon) { - /* We need to iterate through all boot consoles, to make + + if (keep_bootcon) + return; + + if (bcon && (newcon->flags & (CON_CONSDEV|CON_BOOT)) == CON_CONSDEV) { + console_lock(); + /* +* We need to iterate through all boot consoles, to make * sure we print everything out, before we unregister them. */ for_each_console(bcon) if (bcon->flags & CON_BOOT) - unregister_console(bcon); + __unregister_console(bcon); + console_unlock(); + console_sysfs_notify(); } } EXPORT_SYMBOL(register_console); int unregister_console(struct console *console) { -struct console *a, *b; - int res; - - pr_info("%sconsole [%s%d] disabled\n", - (console->flags & CON_BOOT) ? "boot" : "" , - console->name, console->index); - - res = _braille_unregister_console(console); - if (res) - return res; + int ret; - res = 1; console_lock(); - if (console_drivers == console) { - console_drivers=console->next; - res = 0; - } else if (console_drivers) { - for (a=console_drivers->next, b=console_drivers ; -a; b=a, a=b->next) { - if (a == console) { - b->next = a->next; - res = 0; - break; - } - } - } - - if (!res && (console->flags & CON_EXTENDED)) - nr_ext_console_drivers--; - - /* -* If this isn't the last console and it has CON_CONSDEV set, we -* need to set it on the next preferred console. 
-*/ - if (console_drivers != NULL && console->flags & CON_CONSDEV) - console_drivers->flags |= CON_CONSDEV; - - console->flags &= ~CON_ENABLED; + ret = __unregister_console(console); console_unlock(); console_sysf
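The split between a lock-free helper and a locking wrapper can be sketched in userspace as follows. This is a simplified model, not the printk code: struct node, the list_*() helpers, and the boolean "lock" are hypothetical stand-ins for struct console, the console_drivers list, and console_sem. The helper keeps the result convention used in the patch (0 when the entry was unlinked, 1 when it was not found).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list entry standing in for struct console. */
struct node {
	struct node *next;
	bool boot;		/* models the CON_BOOT flag */
};

static struct node *head;	/* models console_drivers */
static bool list_locked;	/* models console_sem, single-threaded only */

static void list_lock(void)
{
	assert(!list_locked);
	list_locked = true;
}

static void list_unlock(void)
{
	assert(list_locked);
	list_locked = false;
}

/* Caller must hold the lock. Returns 0 if unlinked, 1 if not on the list. */
static int list_unlink_locked(struct node *victim)
{
	struct node *prev, *cur;

	assert(list_locked);
	if (head == victim) {
		head = victim->next;
		return 0;
	}
	if (!head)
		return 1;
	for (prev = head, cur = head->next; cur; prev = cur, cur = cur->next) {
		if (cur == victim) {
			prev->next = cur->next;
			return 0;
		}
	}
	return 1;
}

/* Public entry point: take the lock around the helper. */
static int list_unlink(struct node *victim)
{
	int ret;

	list_lock();
	ret = list_unlink_locked(victim);
	list_unlock();
	return ret;
}

/* Remove every boot entry in one locked pass, reusing the helper. */
static int list_unlink_boot_entries(void)
{
	struct node *cur, *next;
	int removed = 0;

	list_lock();
	for (cur = head; cur; cur = next) {
		next = cur->next;	/* saved before the entry is unlinked */
		if (cur->boot && list_unlink_locked(cur) == 0)
			removed++;
	}
	list_unlock();
	return removed;
}

static void list_push(struct node *n)
{
	list_lock();
	n->next = head;
	head = n;
	list_unlock();
}
```

The point of the split is that list_unlink_boot_entries() can walk and modify the list under a single lock hold, which it could not do by calling the locking wrapper from inside the loop.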
Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
On Thu 25-04-19 14:42:52, Jann Horn wrote: > On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko wrote: > [...] > > On Wed 24-04-19 14:10:39, Matthew Garrett wrote: > > > From: Matthew Garrett > > > > > > Applications that hold secrets and wish to avoid them leaking can use > > > mlock() to prevent the page from being pushed out to swap and > > > MADV_DONTDUMP to prevent it from being included in core dumps. > > > Applications > > > can also use atexit() handlers to overwrite secrets on application exit. > > > However, if an attacker can reboot the system into another OS, they can > > > dump the contents of RAM and extract secrets. We can avoid this by setting > > > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that > > > the > > > firmware wipe the contents of RAM before booting another OS, but this > > > means > > > rebooting takes a *long* time - the expected behaviour is for a clean > > > shutdown to remove the request after scrubbing secrets from RAM in order > > > to > > > avoid this. > > > > > > Unfortunately, if an application exits uncleanly, its secrets may still be > > > present in RAM. This can't be easily fixed in userland (eg, if the OOM > > > killer decides to kill a process holding secrets, we're not going to be > > > able > > > to avoid that), so this patch adds a new flag to madvise() to allow > > > userland > > > to request that the kernel clear the covered pages whenever the page > > > reference count hits zero. Since vm_flags is already full on 32-bit, it > > > will only work on 64-bit systems. > [...] > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > index 21a7881a2db4..989c2fde15cf 100644 > > > --- a/mm/madvise.c > > > +++ b/mm/madvise.c > > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct > > > *vma, > > > case MADV_KEEPONFORK: > > > new_flags &= ~VM_WIPEONFORK; > > > break; > > > + case MADV_WIPEONRELEASE: > > > + /* MADV_WIPEONRELEASE is only supported on anonymous > > > memory. 
*/ > > > + if (VM_WIPEONRELEASE == 0 || vma->vm_file || > > > + vma->vm_flags & VM_SHARED) { > > > + error = -EINVAL; > > > + goto out; > > > + } > > > + new_flags |= VM_WIPEONRELEASE; > > > + break; > > An interesting effect of this is that it will be possible to set this > on a CoW anon VMA in a fork() child, and then the semantics in the > parent will be subtly different - e.g. if the parent vmsplice()d a > CoWed page into a pipe, then forked an unprivileged child, the child Maybe a stupid question. How do you fork an unprivileged child (without exec)? Child would have to drop priviledges on its own, no? > set MADV_WIPEONRELEASE on its VMA, the parent died somehow, and then > the child died, the page in the pipe would be zeroed out. A child > should not be able to affect its parent like this, I think. If this > was an mmap() flag instead of a madvise() command, that issue could be > avoided. With a VMA flag underneath, I think you can do an early CoW during fork to prevent from that. > Alternatively, if adding more mmap() flags doesn't work, > perhaps you could scan the VMA and ensure that it contains no pages > yet, or something like that? > > > > diff --git a/mm/memory.c b/mm/memory.c > > > index ab650c21bccd..ff78b527660e 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct > > > mmu_gather *tlb, > > > page_remove_rmap(page, false); > > > if (unlikely(page_mapcount(page) < 0)) > > > print_bad_pte(vma, addr, ptent, page); > > > + if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) && > > > + page_mapcount(page) == 0) > > > + clear_highpage(page); > > > if (unlikely(__tlb_remove_page(tlb, page))) { > > > force_flush = 1; > > > addr += PAGE_SIZE; > > Should something like this perhaps be added in page_remove_rmap() > instead? That's where the mapcount is decremented; and looking at > other callers of page_remove_rmap(), in particular the following ones > look interesting: Well spotted! 
-- Michal Hocko SUSE Labs
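For reference, the existing userland mitigations listed in the quoted patch description (mlock(), MADV_DONTDUMP, and wiping the secret before release) can be sketched as below. This is a minimal sketch, not code from the patch: struct secret_buf and the secret_*() helpers are hypothetical names, and the proposed MADV_WIPEONRELEASE flag is deliberately not used because it exists only in this patch series. mlock() and MADV_DONTDUMP are treated as best effort here and their outcome is recorded, since either may fail depending on resource limits and kernel version.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping for one anonymous mapping that holds a secret. */
struct secret_buf {
	void *addr;
	size_t len;
	int locked;	/* 1 if mlock() succeeded */
	int nodump;	/* 1 if MADV_DONTDUMP was accepted */
};

/* Map anonymous memory and apply the mitigations; returns 0 or -1. */
static int secret_alloc(struct secret_buf *sb, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	sb->addr = p;
	sb->len = len;
	/* Keep the pages out of swap. */
	sb->locked = (mlock(p, len) == 0);
#ifdef MADV_DONTDUMP
	/* Keep the pages out of core dumps. */
	sb->nodump = (madvise(p, len, MADV_DONTDUMP) == 0);
#else
	sb->nodump = 0;
#endif
	return 0;
}

/* Overwrite the secret; volatile keeps the stores from being optimized out. */
static void secret_wipe(struct secret_buf *sb)
{
	volatile unsigned char *p = sb->addr;

	for (size_t i = 0; i < sb->len; i++)
		p[i] = 0;
}

/* Wipe, then unmap. Only runs on a clean exit path. */
static int secret_free(struct secret_buf *sb)
{
	int ret;

	secret_wipe(sb);
	ret = munmap(sb->addr, sb->len);
	sb->addr = NULL;
	sb->len = 0;
	return ret;
}
```

None of this helps when the process is killed before secret_free() runs, which is the case the patch description says cannot easily be handled in userland.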
Re: [PATCH] nvme: determine the number of IO queues
On 4/25/19 10:39 PM, Christoph Hellwig wrote:
> Honestly, unless this is a device shipping in a mass market consumer
> product already I don't think we should work around this crap at all,
> given that this device has obviously never been tested at all. It
> really needs a firmware fix instead of a host workaround.

I have already pushed this issue to the firmware engineering team. They
will try to fix it. As far as I know, we don't need this host workaround.

Thanks,
Aaron
Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
On Thu 25-04-19 13:39:01, Matthew Garrett wrote:
> On Thu, Apr 25, 2019 at 5:37 AM Michal Hocko wrote:
> > Besides that you inherently assume that the user would do mlock because
> > you do not try to wipe the swap content. Is this intentional?
>
> Yes, given MADV_DONTDUMP doesn't imply mlock I thought it'd be more
> consistent to keep those independent.

Do we want to fail madvise call on VMAs that are not mlocked then? What
if the munlock happens later after the madvise is called?

-- 
Michal Hocko
SUSE Labs
Re: [PATCH 1/2] serial: 8250-mtk: add follow control
On Thu, 2019-04-25 at 12:40 +0200, Matthias Brugger wrote: > > On 25/04/2019 10:41, Long Cheng wrote: > > Add SW and HW follow control function. > > Can you please explain a bit more what you are doing in this patch. > You change the setting of the registers for different baud rates. Please > elaborate what is happening there. > Clock source is different. Sometimes, baudrate is greater than or equal to 115200, we use highspeed of 3 algorithm and fractional divider to ensure more accurate baudrate. Next release version, I will update this to commit message > > > > Signed-off-by: Long Cheng > > --- > > drivers/tty/serial/8250/8250_mtk.c | 60 > > ++-- > > 1 file changed, 37 insertions(+), 23 deletions(-) > > > > diff --git a/drivers/tty/serial/8250/8250_mtk.c > > b/drivers/tty/serial/8250/8250_mtk.c > > index c1fdbc0..959fd85 100644 > > --- a/drivers/tty/serial/8250/8250_mtk.c > > +++ b/drivers/tty/serial/8250/8250_mtk.c > > @@ -21,12 +21,14 @@ > > > > #include "8250.h" > > > > -#define UART_MTK_HIGHS 0x09/* Highspeed register */ > > -#define UART_MTK_SAMPLE_COUNT 0x0a/* Sample count register */ > > -#define UART_MTK_SAMPLE_POINT 0x0b/* Sample point register */ > > +#define MTK_UART_HIGHS 0x09/* Highspeed register */ > > +#define MTK_UART_SAMPLE_COUNT 0x0a/* Sample count register */ > > +#define MTK_UART_SAMPLE_POINT 0x0b/* Sample point register */ > > Rename looks good to me. But I'd prefer to have it in a separate patch. > OK. 
> > #define MTK_UART_RATE_FIX 0x0d/* UART Rate Fix Register */ > > - > > #define MTK_UART_DMA_EN0x13/* DMA Enable register */ > > +#define MTK_UART_RXTRI_AD 0x14/* RX Trigger address */ > > +#define MTK_UART_FRACDIV_L 0x15/* Fractional divider LSB address */ > > +#define MTK_UART_FRACDIV_M 0x16/* Fractional divider MSB address */ > > #define MTK_UART_DMA_EN_TX 0x2 > > #define MTK_UART_DMA_EN_RX 0x5 > > > > @@ -46,6 +48,7 @@ enum dma_rx_status { > > struct mtk8250_data { > > int line; > > unsigned intrx_pos; > > + unsigned intclk_count; > > What is that for, not used in this patch. > It's for other patch. Sorry, I will remove it. > > struct clk *uart_clk; > > struct clk *bus_clk; > > struct uart_8250_dma*dma; > > @@ -196,9 +199,15 @@ static void mtk8250_shutdown(struct uart_port *port) > > mtk8250_set_termios(struct uart_port *port, struct ktermios *termios, > > struct ktermios *old) > > { > > + unsigned short fraction_L_mapping[] = { > > + 0, 1, 0x5, 0x15, 0x55, 0x57, 0x57, 0x77, 0x7F, 0xFF, 0xFF > > + }; > > + unsigned short fraction_M_mapping[] = { > > + 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 3 > > + }; > > struct uart_8250_port *up = up_to_u8250p(port); > > + unsigned int baud, quot, fraction; > > unsigned long flags; > > - unsigned int baud, quot; > > > > #ifdef CONFIG_SERIAL_8250_DMA > > if (up->dma) { > > @@ -214,7 +223,7 @@ static void mtk8250_shutdown(struct uart_port *port) > > serial8250_do_set_termios(port, termios, old); > > > > /* > > -* Mediatek UARTs use an extra highspeed register (UART_MTK_HIGHS) > > +* Mediatek UARTs use an extra highspeed register (MTK_UART_HIGHS) > > * > > * We need to recalcualte the quot register, as the claculation depends > > * on the vaule in the highspeed register. 
> > @@ -230,18 +239,11 @@ static void mtk8250_shutdown(struct uart_port *port) > > port->uartclk / 16 / UART_DIV_MAX, > > port->uartclk); > > > > - if (baud <= 115200) { > > - serial_port_out(port, UART_MTK_HIGHS, 0x0); > > + if (baud < 115200) { > > + serial_port_out(port, MTK_UART_HIGHS, 0x0); > > quot = uart_get_divisor(port, baud); > > - } else if (baud <= 576000) { > > - serial_port_out(port, UART_MTK_HIGHS, 0x2); > > - > > - /* Set to next lower baudrate supported */ > > - if ((baud == 50) || (baud == 576000)) > > - baud = 460800; > > - quot = DIV_ROUND_UP(port->uartclk, 4 * baud); > > So we allow now also these baud rates? Then you have to update the comment as > well. > Yes. When clock source is different, data sometimes is error by the previous algorithm. It's not good. So we update new method to fix the issue. > Regards, > Matthias > > > } else { > > - serial_port_out(port, UART_MTK_HIGHS, 0x3); > > + serial_port_out(port, MTK_UART_HIGHS, 0x3); > > quot = DIV_ROUND_UP(port->uartclk, 256 * baud); > > } > > > > @@ -258,17 +260,29 @@ static void mtk8250_shutdown(struct uart_port *port) > > /* reset DLAB */ > > serial_port_out(port, UART_LCR, up->lcr); > > > > - if (baud > 460800) { > > + if (baud >= 115200
Re: [PATCH] sparc: vdso: add FORCE to the build rule of %.so
From: Masahiro Yamada Date: Fri, 26 Apr 2019 09:40:46 +0900 > Hi David, > > > On Wed, Apr 3, 2019 at 5:34 PM Masahiro Yamada > wrote: >> >> $(call if_changed,...) must have FORCE as a prerequisite. >> >> Signed-off-by: Masahiro Yamada >> --- > > Ping? Sorry, I'm really busy and taking a short vacation before the LSF/MM summit. I will get to this when I have a chance. Thank you.
[PATCH v4 26/27] userfaultfd: selftests: refactor statistics
Introduce uffd_stats structure for statistics of the self test, at the same time refactor the code to always pass in the uffd_stats for either read() or poll() typed fault handling threads instead of using two different ways to return the statistic results. No functional change. With the new structure, it's very easy to introduce new statistics. Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- tools/testing/selftests/vm/userfaultfd.c | 76 +++- 1 file changed, 49 insertions(+), 27 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 5d1db824f73a..e5d12c209e09 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -88,6 +88,12 @@ static char *area_src, *area_src_alias, *area_dst, *area_dst_alias; static char *zeropage; pthread_attr_t attr; +/* Userfaultfd test statistics */ +struct uffd_stats { + int cpu; + unsigned long missing_faults; +}; + /* pthread_mutex_t starts at page offset 0 */ #define area_mutex(___area, ___nr) \ ((pthread_mutex_t *) ((___area) + (___nr)*page_size)) @@ -127,6 +133,17 @@ static void usage(void) exit(1); } +static void uffd_stats_reset(struct uffd_stats *uffd_stats, +unsigned long n_cpus) +{ + int i; + + for (i = 0; i < n_cpus; i++) { + uffd_stats[i].cpu = i; + uffd_stats[i].missing_faults = 0; + } +} + static int anon_release_pages(char *rel_area) { int ret = 0; @@ -469,8 +486,8 @@ static int uffd_read_msg(int ufd, struct uffd_msg *msg) return 0; } -/* Return 1 if page fault handled by us; otherwise 0 */ -static int uffd_handle_page_fault(struct uffd_msg *msg) +static void uffd_handle_page_fault(struct uffd_msg *msg, + struct uffd_stats *stats) { unsigned long offset; @@ -485,18 +502,19 @@ static int uffd_handle_page_fault(struct uffd_msg *msg) offset = (char *)(unsigned long)msg->arg.pagefault.address - area_dst; offset &= ~(page_size-1); - return copy_page(uffd, offset); + if (copy_page(uffd, offset)) + 
stats->missing_faults++; } static void *uffd_poll_thread(void *arg) { - unsigned long cpu = (unsigned long) arg; + struct uffd_stats *stats = (struct uffd_stats *)arg; + unsigned long cpu = stats->cpu; struct pollfd pollfd[2]; struct uffd_msg msg; struct uffdio_register uffd_reg; int ret; char tmp_chr; - unsigned long userfaults = 0; pollfd[0].fd = uffd; pollfd[0].events = POLLIN; @@ -526,7 +544,7 @@ static void *uffd_poll_thread(void *arg) msg.event), exit(1); break; case UFFD_EVENT_PAGEFAULT: - userfaults += uffd_handle_page_fault(&msg); + uffd_handle_page_fault(&msg, stats); break; case UFFD_EVENT_FORK: close(uffd); @@ -545,28 +563,27 @@ static void *uffd_poll_thread(void *arg) break; } } - return (void *)userfaults; + + return NULL; } pthread_mutex_t uffd_read_mutex = PTHREAD_MUTEX_INITIALIZER; static void *uffd_read_thread(void *arg) { - unsigned long *this_cpu_userfaults; + struct uffd_stats *stats = (struct uffd_stats *)arg; struct uffd_msg msg; - this_cpu_userfaults = (unsigned long *) arg; - *this_cpu_userfaults = 0; - pthread_mutex_unlock(&uffd_read_mutex); /* from here cancellation is ok */ for (;;) { if (uffd_read_msg(uffd, &msg)) continue; - (*this_cpu_userfaults) += uffd_handle_page_fault(&msg); + uffd_handle_page_fault(&msg, stats); } - return (void *)NULL; + + return NULL; } static void *background_thread(void *arg) @@ -582,13 +599,12 @@ static void *background_thread(void *arg) return NULL; } -static int stress(unsigned long *userfaults) +static int stress(struct uffd_stats *uffd_stats) { unsigned long cpu; pthread_t locking_threads[nr_cpus]; pthread_t uffd_threads[nr_cpus]; pthread_t background_threads[nr_cpus]; - void **_userfaults = (void **) userfaults; finished = 0; for (cpu = 0; cpu < nr_cpus; cpu++) { @@ -597,12 +613,13 @@ static int stress(unsigned long *userfaults) return 1; if (bounces & BOUNCE_POLL) { if (pthread_create(&uffd_threads[cpu], &attr, - uffd_poll_thread, (void *)cpu)) + uffd_poll_thread, + (void *)&uffd_stats[cpu]))
[PATCH v4 27/27] userfaultfd: selftests: add write-protect test
This patch adds uffd tests for write protection.

Instead of introducing new tests for it, let's simply squash the uffd-wp
tests into the existing uffd-missing test cases. Changes are:

(1) Bouncing tests

  We do the write-protection in two ways during the bouncing test:

  - By using UFFDIO_COPY_MODE_WP when resolving MISSING pages: then
    we'll make sure that for each bounce process every single page
    faults at least twice: once for MISSING, once for WP.

  - By directly calling UFFDIO_WRITEPROTECT on already faulted memory:
    to further torture the explicit page protection procedures of
    uffd-wp, we split each bounce procedure into two halves (in the
    background thread). The first half will be MISSING+WP for each
    page as explained above. After the first half, we write protect
    the faulted region (that is, the first half) in the background
    thread, so that at least half of the pages are write protected
    again; this exercises the new UFFDIO_WRITEPROTECT call. Then we
    continue with the 2nd half, which will contain both MISSING and
    WP faulting tests for the 2nd half and WP-only faults from the
    1st half.

(2) Event/Signal test

  Mostly the previous tests, but doing MISSING+WP for each page. For
  the sigbus-mode test we'll need to provide a standalone path to
  handle the write protection faults.

For all tests, also collect statistics for uffd-wp pages.

Signed-off-by: Peter Xu --- tools/testing/selftests/vm/userfaultfd.c | 157 +++ 1 file changed, 133 insertions(+), 24 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index e5d12c209e09..bf1e10db72f5 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -56,6 +56,7 @@ #include #include #include +#include #include "../kselftest.h" @@ -78,6 +79,8 @@ static int test_type; #define ALARM_INTERVAL_SECS 10 static volatile bool test_uffdio_copy_eexist = true; static volatile bool test_uffdio_zeropage_eexist = true; +/* Whether to test uffd write-protection */ +static bool test_uffdio_wp = false; static bool map_shared; static int huge_fd; @@ -92,6 +95,7 @@ pthread_attr_t attr; struct uffd_stats { int cpu; unsigned long missing_faults; + unsigned long wp_faults; }; /* pthread_mutex_t starts at page offset 0 */ @@ -141,9 +145,29 @@ static void uffd_stats_reset(struct uffd_stats *uffd_stats, for (i = 0; i < n_cpus; i++) { uffd_stats[i].cpu = i; uffd_stats[i].missing_faults = 0; + uffd_stats[i].wp_faults = 0; } } +static void uffd_stats_report(struct uffd_stats *stats, int n_cpus) +{ + int i; + unsigned long long miss_total = 0, wp_total = 0; + + for (i = 0; i < n_cpus; i++) { + miss_total += stats[i].missing_faults; + wp_total += stats[i].wp_faults; + } + + printf("userfaults: %llu missing (", miss_total); + for (i = 0; i < n_cpus; i++) + printf("%lu+", stats[i].missing_faults); + printf("\b), %llu wp (", wp_total); + for (i = 0; i < n_cpus; i++) + printf("%lu+", stats[i].wp_faults); + printf("\b)\n"); +} + static int anon_release_pages(char *rel_area) { int ret = 0; @@ -264,10 +288,15 @@ struct uffd_test_ops { void (*alias_mapping)(__u64 *start, size_t len, unsigned long offset); }; -#define ANON_EXPECTED_IOCTLS ((1 << _UFFDIO_WAKE) | \ +#define SHMEM_EXPECTED_IOCTLS ((1 << _UFFDIO_WAKE) | \ (1 << _UFFDIO_COPY) | \ (1 << _UFFDIO_ZEROPAGE)) +#define 
ANON_EXPECTED_IOCTLS ((1 << _UFFDIO_WAKE) | \ +(1 << _UFFDIO_COPY) | \ +(1 << _UFFDIO_ZEROPAGE) | \ +(1 << _UFFDIO_WRITEPROTECT)) + static struct uffd_test_ops anon_uffd_test_ops = { .expected_ioctls = ANON_EXPECTED_IOCTLS, .allocate_area = anon_allocate_area, @@ -276,7 +305,7 @@ static struct uffd_test_ops anon_uffd_test_ops = { }; static struct uffd_test_ops shmem_uffd_test_ops = { - .expected_ioctls = ANON_EXPECTED_IOCTLS, + .expected_ioctls = SHMEM_EXPECTED_IOCTLS, .allocate_area = shmem_allocate_area, .release_pages = shmem_release_pages, .alias_mapping = noop_alias_mapping, @@ -300,6 +329,21 @@ static int my_bcmp(char *str1, char *str2, size_t n) return 0; } +static void wp_range(int ufd, __u64 start, __u64 len, bool wp) +{ + struct uffdio_writeprotect prms = { 0 }; + + /* Write protection page faults */ + prms.range.start = start; + prms.range.len = len; + /* Undo write-protect, do wakeup after that */ + prms.mode = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0; + + if (ioctl(ufd, UFFDIO_WRITEP
[PATCH v4 25/27] userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally
Only declare _UFFDIO_WRITEPROTECT if the user specified
UFFDIO_REGISTER_MODE_WP and if all the checks passed. Then when the user
registers regions with shmem/hugetlbfs we won't expose the new ioctl to
them. Even with complete anonymous memory range, we'll only expose the
new WP ioctl bit if the register mode has MODE_WP.

Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f1f61a0278c2..7f87e9e4fb9b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1456,14 +1456,24 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	up_write(&mm->mmap_sem);
 	mmput(mm);
 	if (!ret) {
+		__u64 ioctls_out;
+
+		ioctls_out = basic_ioctls ? UFFD_API_RANGE_IOCTLS_BASIC :
+		    UFFD_API_RANGE_IOCTLS;
+
+		/*
+		 * Declare the WP ioctl only if the WP mode is
+		 * specified and all checks passed with the range
+		 */
+		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP))
+			ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT);
+
 		/*
 		 * Now that we scanned all vmas we can already tell
 		 * userland which ioctls methods are guaranteed to
 		 * succeed on this range.
 		 */
-		if (put_user(basic_ioctls ? UFFD_API_RANGE_IOCTLS_BASIC :
-			     UFFD_API_RANGE_IOCTLS,
-			     &user_uffdio_register->ioctls))
+		if (put_user(ioctls_out, &user_uffdio_register->ioctls))
 			ret = -EFAULT;
 	}
 out:
-- 
2.17.1
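A minimal userspace sketch of the masking step in the patch above may help when reading it. The SK_* names and bit numbers are hypothetical stand-ins for the _UFFDIO_* constants, not the real uapi values; only the shape of the logic (pick the basic or full set, then drop the WP bit unless MODE_WP was requested) is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit numbers standing in for the _UFFDIO_* constants. */
enum { SK_WAKE, SK_COPY, SK_ZEROPAGE, SK_WRITEPROTECT };

#define SK_BIT(nr)		((uint64_t)1 << (nr))
#define SK_RANGE_IOCTLS_BASIC	(SK_BIT(SK_WAKE) | SK_BIT(SK_COPY))
#define SK_RANGE_IOCTLS		(SK_RANGE_IOCTLS_BASIC | \
				 SK_BIT(SK_ZEROPAGE) | \
				 SK_BIT(SK_WRITEPROTECT))

/*
 * Start from the basic or the full ioctl set, then clear the WP bit
 * unless the caller registered the range in WP mode.
 */
uint64_t sk_ioctls_out(bool basic_ioctls, bool mode_wp)
{
	uint64_t out = basic_ioctls ? SK_RANGE_IOCTLS_BASIC : SK_RANGE_IOCTLS;

	if (!mode_wp)
		out &= ~SK_BIT(SK_WRITEPROTECT);
	return out;
}

/* Usage example: the WP bit is only reported together with MODE_WP. */
void sk_ioctls_demo(void)
{
	assert(sk_ioctls_out(false, true) & SK_BIT(SK_WRITEPROTECT));
	assert(!(sk_ioctls_out(false, false) & SK_BIT(SK_WRITEPROTECT)));
	assert(sk_ioctls_out(true, true) == SK_RANGE_IOCTLS_BASIC);
}
```

The basic set never contains the WP bit in this sketch, so the mask is a no-op there; that mirrors the macro definitions shown elsewhere in this series.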
[PATCH v4 22/27] userfaultfd: wp: enabled write protection in userfaultfd API
From: Shaohua Li

Now it's safe to enable write protection in userfaultfd API

Cc: Andrea Arcangeli
Cc: Pavel Emelyanov
Cc: Rik van Riel
Cc: Kirill A. Shutemov
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Johannes Weiner
Signed-off-by: Shaohua Li
Signed-off-by: Andrea Arcangeli
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 include/uapi/linux/userfaultfd.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 95c4a160e5f8..e7e98bde221f 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -19,7 +19,8 @@
  * means the userland is reading).
  */
 #define UFFD_API ((__u64)0xAA)
-#define UFFD_API_FEATURES (UFFD_FEATURE_EVENT_FORK |		\
+#define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP |	\
+			   UFFD_FEATURE_EVENT_FORK |		\
 			   UFFD_FEATURE_EVENT_REMAP |		\
 			   UFFD_FEATURE_EVENT_REMOVE |	\
 			   UFFD_FEATURE_EVENT_UNMAP |		\
@@ -34,7 +35,8 @@
 #define UFFD_API_RANGE_IOCTLS			\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_ZEROPAGE)
+	 (__u64)1 << _UFFDIO_ZEROPAGE |		\
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY)
-- 
2.17.1
[PATCH v4 19/27] userfaultfd: introduce helper vma_find_uffd
We have multiple (and more coming) places that would like to find a
userfault enabled VMA from a mm struct that covers a specific memory
range. This patch introduces the helper for it, and meanwhile applies it
to the code.

Suggested-by: Mike Rapoport
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 mm/userfaultfd.c | 54 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 240de2a8492d..2606409572b2 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -20,6 +20,34 @@
 #include
 #include "internal.h"
 
+/*
+ * Find a valid userfault enabled VMA region that covers the whole
+ * address range, or NULL on failure.  Must be called with mmap_sem
+ * held.
+ */
+static struct vm_area_struct *vma_find_uffd(struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long len)
+{
+	struct vm_area_struct *vma = find_vma(mm, start);
+
+	if (!vma)
+		return NULL;
+
+	/*
+	 * Check the vma is registered in uffd, this is required to
+	 * enforce the VM_MAYWRITE check done at uffd registration
+	 * time.
+	 */
+	if (!vma->vm_userfaultfd_ctx.ctx)
+		return NULL;
+
+	if (start < vma->vm_start || start + len > vma->vm_end)
+		return NULL;
+
+	return vma;
+}
+
 static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    pmd_t *dst_pmd,
 			    struct vm_area_struct *dst_vma,
@@ -228,20 +256,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	 */
 	if (!dst_vma) {
 		err = -ENOENT;
-		dst_vma = find_vma(dst_mm, dst_start);
+		dst_vma = vma_find_uffd(dst_mm, dst_start, len);
 		if (!dst_vma || !is_vm_hugetlb_page(dst_vma))
 			goto out_unlock;
-		/*
-		 * Check the vma is registered in uffd, this is
-		 * required to enforce the VM_MAYWRITE check done at
-		 * uffd registration time.
-		 */
-		if (!dst_vma->vm_userfaultfd_ctx.ctx)
-			goto out_unlock;
-
-		if (dst_start < dst_vma->vm_start ||
-		    dst_start + len > dst_vma->vm_end)
-			goto out_unlock;
 
 		err = -EINVAL;
 		if (vma_hpagesize != vma_kernel_pagesize(dst_vma))
@@ -488,20 +505,9 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 * both valid and fully within a single existing vma.
 	 */
 	err = -ENOENT;
-	dst_vma = find_vma(dst_mm, dst_start);
+	dst_vma = vma_find_uffd(dst_mm, dst_start, len);
 	if (!dst_vma)
 		goto out_unlock;
-	/*
-	 * Check the vma is registered in uffd, this is required to
-	 * enforce the VM_MAYWRITE check done at uffd registration
-	 * time.
-	 */
-	if (!dst_vma->vm_userfaultfd_ctx.ctx)
-		goto out_unlock;
-
-	if (dst_start < dst_vma->vm_start ||
-	    dst_start + len > dst_vma->vm_end)
-		goto out_unlock;
 
 	err = -EINVAL;
 	/*
-- 
2.17.1
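The checks that the helper centralises can be modelled outside the kernel. The sketch below is only an illustration: struct sk_vma is a hypothetical toy type, its uffd_registered field stands in for vm_userfaultfd_ctx.ctx being non-NULL, and the candidate VMA is passed in directly rather than looked up with find_vma().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct vm_area_struct; field names are illustrative. */
struct sk_vma {
	unsigned long vm_start;	/* first byte covered */
	unsigned long vm_end;	/* one past the last byte covered */
	bool uffd_registered;	/* models vm_userfaultfd_ctx.ctx != NULL */
};

/*
 * Return vma if it is uffd-registered and covers [start, start + len),
 * otherwise NULL. Same three tests, in the same order, as the helper
 * in the patch.
 */
const struct sk_vma *sk_vma_covers(const struct sk_vma *vma,
				   unsigned long start, unsigned long len)
{
	if (!vma)
		return NULL;
	if (!vma->uffd_registered)
		return NULL;
	if (start < vma->vm_start || start + len > vma->vm_end)
		return NULL;
	return vma;
}

/* Usage example covering the accept case and two reject cases. */
void sk_vma_demo(void)
{
	const struct sk_vma vma = { 0x1000, 0x3000, true };
	const struct sk_vma plain = { 0x1000, 0x3000, false };

	assert(sk_vma_covers(&vma, 0x1000, 0x2000) == &vma);	/* exact fit */
	assert(sk_vma_covers(&vma, 0x2000, 0x2000) == NULL);	/* past vm_end */
	assert(sk_vma_covers(&plain, 0x1000, 0x1000) == NULL);	/* unregistered */
}
```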
[PATCH v4 13/27] mm: introduce do_wp_page_cont()
The userfaultfd handling in do_wp_page() is very special compared to the
rest of the function because it only postpones the real handling of the
page fault to the userspace program. Isolate the handling part of
do_wp_page() into a new function called do_wp_page_cont() so that we can
use it somewhere else when resolving the userfault page fault.

Signed-off-by: Peter Xu
---
 include/linux/mm.h | 2 ++
 mm/memory.c        | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a5ac81188523..a2911de04cdd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -445,6 +445,8 @@ struct vm_fault {
 					 */
 };
 
+vm_fault_t do_wp_page_cont(struct vm_fault *vmf);
+
 /* page entry size for vm->huge_fault() */
 enum page_entry_size {
 	PE_SIZE_PTE = 0,
diff --git a/mm/memory.c b/mm/memory.c
index 64bd8075f054..ab98a1eb4702 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2497,6 +2497,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		return handle_userfault(vmf, VM_UFFD_WP);
 	}
 
+	return do_wp_page_cont(vmf);
+}
+
+vm_fault_t do_wp_page_cont(struct vm_fault *vmf)
+	__releases(vmf->ptl)
+{
+	struct vm_area_struct *vma = vmf->vma;
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*
-- 
2.17.1
[PATCH v4 24/27] userfaultfd: wp: UFFDIO_REGISTER_MODE_WP documentation update
From: Martin Cracauer Adds documentation about the write protection support. Signed-off-by: Martin Cracauer Signed-off-by: Andrea Arcangeli [peterx: rewrite in rst format; fixups here and there] Reviewed-by: Jerome Glisse Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- Documentation/admin-guide/mm/userfaultfd.rst | 51 1 file changed, 51 insertions(+) diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 5048cf661a8a..c30176e67900 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -108,6 +108,57 @@ UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an half copied page since it'll keep userfaulting until the copy has finished. +Notes: + +- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then + you must provide some kind of page in your thread after reading from + the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE. + The normal behavior of the OS automatically providing a zero page on + an annonymous mmaping is not in place. + +- None of the page-delivering ioctls default to the range that you + registered with. You must fill in all fields for the appropriate + ioctl struct including the range. + +- You get the address of the access that triggered the missing page + event out of a struct uffd_msg that you read in the thread from the + uffd. You can supply as many pages as you want with UFFDIO_COPY or + UFFDIO_ZEROPAGE. Keep in mind that unless you used DONTWAKE then + the first of any of those IOCTLs wakes up the faulting thread. + +- Be sure to test for all errors including (pollfd[0].revents & + POLLERR). This can happen, e.g. when ranges supplied were + incorrect. + +Write Protect Notifications +--- + +This is equivalent to (but faster than) using mprotect and a SIGSEGV +signal handler. + +Firstly you need to register a range with UFFDIO_REGISTER_MODE_WP. 
+Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT, +struct *uffdio_writeprotect) while mode = UFFDIO_WRITEPROTECT_MODE_WP +in the struct passed in. The range does not default to and does not +have to be identical to the range you registered with. You can write +protect as many ranges as you like (inside the registered range). +Then, in the thread reading from uffd the struct will have +msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send +ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect) again +while pagefault.mode does not have UFFDIO_WRITEPROTECT_MODE_WP set. +This wakes up the thread which will continue to run with writes. This +allows you to do the bookkeeping about the write in the uffd reading +thread before the ioctl. + +If you registered with both UFFDIO_REGISTER_MODE_MISSING and +UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in +which you supply a page and undo write protect. Note that there is a +difference between writes into a WP area and into a !WP area. The +former will have UFFD_PAGEFAULT_FLAG_WP set, the latter +UFFD_PAGEFAULT_FLAG_WRITE. The latter did not fail on protection but +you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was +used. + QEMU/KVM -- 2.17.1
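The last paragraph of the documentation above distinguishes faults that carry the WP flag from plain missing-page faults. As a rough illustration of how a reader thread might branch on that, here is a small sketch. The SK_FLAG_* values and the sk_action names are hypothetical placeholders rather than the uapi definitions, and the sketch only decides what to do; it issues no ioctl.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for UFFD_PAGEFAULT_FLAG_WRITE / _WP. */
#define SK_FLAG_WRITE	((uint64_t)1 << 0)
#define SK_FLAG_WP	((uint64_t)1 << 1)

enum sk_action {
	SK_SUPPLY_PAGE,	/* resolve with UFFDIO_COPY or UFFDIO_ZEROPAGE */
	SK_UNDO_WP,	/* resolve with UFFDIO_WRITEPROTECT, mode without WP */
};

/*
 * A fault with the WP flag hit a write-protected page: do the
 * bookkeeping, then undo the protection. Any other fault (read or
 * write) on a MODE_MISSING range still needs a page to be supplied.
 */
enum sk_action sk_classify_fault(uint64_t pagefault_flags)
{
	if (pagefault_flags & SK_FLAG_WP)
		return SK_UNDO_WP;
	return SK_SUPPLY_PAGE;
}

/* Usage example for the three cases described in the text. */
void sk_classify_demo(void)
{
	assert(sk_classify_fault(SK_FLAG_WP) == SK_UNDO_WP);
	assert(sk_classify_fault(SK_FLAG_WRITE) == SK_SUPPLY_PAGE);
	assert(sk_classify_fault(0) == SK_SUPPLY_PAGE);
}
```

Whether a real handler should also inspect the WRITE flag alongside WP depends on the final kernel behaviour, which this sketch does not try to settle.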
[PATCH v4 21/27] userfaultfd: wp: add the writeprotect API to userfaultfd ioctl
From: Andrea Arcangeli v1: From: Shaohua Li v2: cleanups, remove a branch. [peterx writes up the commit message, as below...] This patch introduces the new uffd-wp APIs for userspace. Firstly, we'll allow to do UFFDIO_REGISTER with write protection tracking using the new UFFDIO_REGISTER_MODE_WP flag. Note that this flag can co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the userspace program can not only resolve missing page faults, and at the same time tracking page data changes along the way. Secondly, we introduced the new UFFDIO_WRITEPROTECT API to do page level write protection tracking. Note that we will need to register the memory region with UFFDIO_REGISTER_MODE_WP before that. Signed-off-by: Andrea Arcangeli [peterx: remove useless block, write commit message, check against VM_MAYWRITE rather than VM_WRITE when register] Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- fs/userfaultfd.c | 82 +--- include/uapi/linux/userfaultfd.h | 23 + 2 files changed, 89 insertions(+), 16 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3092885c9d2c..81962d62520c 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -304,8 +304,11 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, if (!pmd_present(_pmd)) goto out; - if (pmd_trans_huge(_pmd)) + if (pmd_trans_huge(_pmd)) { + if (!pmd_write(_pmd) && (reason & VM_UFFD_WP)) + ret = true; goto out; + } /* * the pmd is stable (as in !pmd_trans_unstable) so we can re-read it @@ -318,6 +321,8 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, */ if (pte_none(*pte)) ret = true; + if (!pte_write(*pte) && (reason & VM_UFFD_WP)) + ret = true; pte_unmap(pte); out: @@ -1251,10 +1256,13 @@ static __always_inline int validate_range(struct mm_struct *mm, return 0; } -static inline bool vma_can_userfault(struct vm_area_struct *vma) +static inline bool vma_can_userfault(struct vm_area_struct *vma, +unsigned long vm_flags) { - return 
vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) || - vma_is_shmem(vma); + /* FIXME: add WP support to hugetlbfs and shmem */ + return vma_is_anonymous(vma) || + ((is_vm_hugetlb_page(vma) || vma_is_shmem(vma)) && +!(vm_flags & VM_UFFD_WP)); } static int userfaultfd_register(struct userfaultfd_ctx *ctx, @@ -1286,15 +1294,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, vm_flags = 0; if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING) vm_flags |= VM_UFFD_MISSING; - if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) { + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) vm_flags |= VM_UFFD_WP; - /* -* FIXME: remove the below error constraint by -* implementing the wprotect tracking mode. -*/ - ret = -EINVAL; - goto out; - } ret = validate_range(mm, uffdio_register.range.start, uffdio_register.range.len); @@ -1342,7 +1343,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, /* check not compatible vmas */ ret = -EINVAL; - if (!vma_can_userfault(cur)) + if (!vma_can_userfault(cur, vm_flags)) goto out_unlock; /* @@ -1370,6 +1371,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, if (end & (vma_hpagesize - 1)) goto out_unlock; } + if ((vm_flags & VM_UFFD_WP) && !(cur->vm_flags & VM_MAYWRITE)) + goto out_unlock; /* * Check that this vma isn't already owned by a @@ -1399,7 +1402,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, do { cond_resched(); - BUG_ON(!vma_can_userfault(vma)); + BUG_ON(!vma_can_userfault(vma, vm_flags)); BUG_ON(vma->vm_userfaultfd_ctx.ctx && vma->vm_userfaultfd_ctx.ctx != ctx); WARN_ON(!(vma->vm_flags & VM_MAYWRITE)); @@ -1534,7 +1537,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, * provides for more strict behavior to notice * unregistration errors. */ - if (!vma_can_userfault(cur)) + if (!vma_can_userfault(cur, cur->vm_flags)) goto out_unlock; found = true; @@ -1548,7 +1551,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx
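The reworked vma_can_userfault() predicate in the hunk above is easier to check as a truth table. The following is a sketch under stated assumptions: enum sk_vma_kind and the SK_VM_* flag values are hypothetical, and the three kernel predicates (vma_is_anonymous, is_vm_hugetlb_page, vma_is_shmem) are collapsed into one enum for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification replacing the three kernel predicates. */
enum sk_vma_kind { SK_ANON, SK_HUGETLB, SK_SHMEM, SK_OTHER };

/* Hypothetical flag values; only their distinctness matters here. */
#define SK_VM_UFFD_MISSING	0x1UL
#define SK_VM_UFFD_WP		0x2UL

/*
 * Anonymous memory is always eligible; hugetlbfs and shmem are only
 * eligible while write-protect tracking is not requested.
 */
bool sk_vma_can_userfault(enum sk_vma_kind kind, unsigned long vm_flags)
{
	return kind == SK_ANON ||
	       ((kind == SK_HUGETLB || kind == SK_SHMEM) &&
		!(vm_flags & SK_VM_UFFD_WP));
}

/* Usage example: one row per interesting combination. */
void sk_can_userfault_demo(void)
{
	assert(sk_vma_can_userfault(SK_ANON, SK_VM_UFFD_WP));
	assert(sk_vma_can_userfault(SK_SHMEM, SK_VM_UFFD_MISSING));
	assert(!sk_vma_can_userfault(SK_SHMEM, SK_VM_UFFD_WP));
	assert(!sk_vma_can_userfault(SK_HUGETLB,
				     SK_VM_UFFD_MISSING | SK_VM_UFFD_WP));
	assert(!sk_vma_can_userfault(SK_OTHER, SK_VM_UFFD_MISSING));
}
```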
[PATCH v4 23/27] userfaultfd: wp: don't wake up when doing write protect
It does not make sense to try to wake up any waiting thread when we're write-protecting a memory region. Only wake up when resolving a write protected page fault. Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- fs/userfaultfd.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 81962d62520c..f1f61a0278c2 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1771,6 +1771,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, struct uffdio_writeprotect uffdio_wp; struct uffdio_writeprotect __user *user_uffdio_wp; struct userfaultfd_wake_range range; + bool mode_wp, mode_dontwake; if (READ_ONCE(ctx->mmap_changing)) return -EAGAIN; @@ -1789,18 +1790,20 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE | UFFDIO_WRITEPROTECT_MODE_WP)) return -EINVAL; - if ((uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP) && -(uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE)) + + mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP; + mode_dontwake = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE; + + if (mode_wp && mode_dontwake) return -EINVAL; ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start, - uffdio_wp.range.len, uffdio_wp.mode & - UFFDIO_WRITEPROTECT_MODE_WP, + uffdio_wp.range.len, mode_wp, &ctx->mmap_changing); if (ret) return ret; - if (!(uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE)) { + if (!mode_wp && !mode_dontwake) { range.start = uffdio_wp.range.start; range.len = uffdio_wp.range.len; wake_userfault(ctx, &range); -- 2.17.1
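The mode handling after this patch reduces to a small decision function, sketched below in plain C. The SK_MODE_* bit values are hypothetical stand-ins for UFFDIO_WRITEPROTECT_MODE_WP and UFFDIO_WRITEPROTECT_MODE_DONTWAKE, and the call to mwriteprotect_range() that sits between validation and wakeup in the real code is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UFFDIO_WRITEPROTECT_MODE_* bits. */
#define SK_MODE_WP		((uint64_t)1 << 0)
#define SK_MODE_DONTWAKE	((uint64_t)1 << 1)

/*
 * Returns -EINVAL for a rejected mode, 1 if waiters should be woken
 * after the range was changed, 0 if no wakeup should happen.
 */
int sk_wp_mode_decide(uint64_t mode)
{
	bool mode_wp, mode_dontwake;

	/* Unknown bits are rejected. */
	if (mode & ~(SK_MODE_DONTWAKE | SK_MODE_WP))
		return -EINVAL;

	mode_wp = mode & SK_MODE_WP;
	mode_dontwake = mode & SK_MODE_DONTWAKE;

	/* Protecting and suppressing a wakeup together is rejected. */
	if (mode_wp && mode_dontwake)
		return -EINVAL;

	/* Only resolving a WP fault (no WP bit) may wake waiters. */
	return !mode_wp && !mode_dontwake;
}

/* Usage example covering every two-bit combination. */
void sk_wp_mode_demo(void)
{
	assert(sk_wp_mode_decide(0) == 1);
	assert(sk_wp_mode_decide(SK_MODE_WP) == 0);
	assert(sk_wp_mode_decide(SK_MODE_DONTWAKE) == 0);
	assert(sk_wp_mode_decide(SK_MODE_WP | SK_MODE_DONTWAKE) == -EINVAL);
}
```

The behavioural change in the patch is the second row: before it, a request with only the WP bit set would still have woken waiters.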
[PATCH v4 15/27] userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork
UFFD_EVENT_FORK support for uffd-wp should be already there, except that we should clean the uffd-wp bit if uffd fork event is not enabled. Detect that to avoid _PAGE_UFFD_WP being set even if the VMA is not being tracked by VM_UFFD_WP. Do this for both small PTEs and huge PMDs. Reviewed-by: Jerome Glisse Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- mm/huge_memory.c | 8 mm/memory.c | 8 2 files changed, 16 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3885747d4901..cf8f11d6e6cd 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -976,6 +976,14 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; + /* +* Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA +* does not have the VM_UFFD_WP, which means that the uffd +* fork event is not enabled. +*/ + if (!(vma->vm_flags & VM_UFFD_WP)) + pmd = pmd_clear_uffd_wp(pmd); + #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); diff --git a/mm/memory.c b/mm/memory.c index 965d974bb9bd..2abf0934ad7f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -789,6 +789,14 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte = pte_mkclean(pte); pte = pte_mkold(pte); + /* +* Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA +* does not have the VM_UFFD_WP, which means that the uffd +* fork event is not enabled. +*/ + if (!(vm_flags & VM_UFFD_WP)) + pte = pte_clear_uffd_wp(pte); + page = vm_normal_page(vma, addr, pte); if (page) { get_page(page); -- 2.17.1
[PATCH v4 20/27] userfaultfd: wp: support write protection for userfault vma range
From: Shaohua Li Add API to enable/disable writeprotect a vma range. Unlike mprotect, this doesn't split/merge vmas. Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Kirill A. Shutemov Cc: Mel Gorman Cc: Hugh Dickins Cc: Johannes Weiner Signed-off-by: Shaohua Li Signed-off-by: Andrea Arcangeli [peterx: - use the helper to find VMA; - return -ENOENT if not found to match mcopy case; - use the new MM_CP_UFFD_WP* flags for change_protection - check against mmap_changing for failures] Reviewed-by: Jerome Glisse Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- include/linux/userfaultfd_k.h | 3 ++ mm/userfaultfd.c | 54 +++ 2 files changed, 57 insertions(+) diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 765ce884cec0..8f6e6ed544fb 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long len, bool *mmap_changing); +extern int mwriteprotect_range(struct mm_struct *dst_mm, + unsigned long start, unsigned long len, + bool enable_wp, bool *mmap_changing); /* mm helpers */ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 2606409572b2..70cea2ff3960 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, { return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); } + +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, + unsigned long len, bool enable_wp, bool *mmap_changing) +{ + struct vm_area_struct *dst_vma; + pgprot_t newprot; + int err; + + /* +* Sanitize the command parameters: +*/ + BUG_ON(start & ~PAGE_MASK); + BUG_ON(len & ~PAGE_MASK); + + /* Does the address range wrap, or is the span zero-sized? 
*/ + BUG_ON(start + len <= start); + + down_read(&dst_mm->mmap_sem); + + /* +* If memory mappings are changing because of non-cooperative +* operation (e.g. mremap) running in parallel, bail out and +* request the user to retry later +*/ + err = -EAGAIN; + if (mmap_changing && READ_ONCE(*mmap_changing)) + goto out_unlock; + + err = -ENOENT; + dst_vma = vma_find_uffd(dst_mm, start, len); + /* +* Make sure the vma is not shared, that the dst range is +* both valid and fully within a single existing vma. +*/ + if (!dst_vma || (dst_vma->vm_flags & VM_SHARED)) + goto out_unlock; + if (!userfaultfd_wp(dst_vma)) + goto out_unlock; + if (!vma_is_anonymous(dst_vma)) + goto out_unlock; + + if (enable_wp) + newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE)); + else + newprot = vm_get_page_prot(dst_vma->vm_flags); + + change_protection(dst_vma, start, start + len, newprot, + enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE); + + err = 0; +out_unlock: + up_read(&dst_mm->mmap_sem); + return err; +} -- 2.17.1
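The three BUG_ON() sanity checks at the top of mwriteprotect_range() can be tried out in userspace. The sketch below assumes a 4 KiB page (SK_PAGE_SIZE is a made-up constant) and returns false where the kernel would BUG; it says nothing about the VMA checks that follow in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page size for the illustration only. */
#define SK_PAGE_SIZE	4096UL
#define SK_PAGE_MASK	(~(SK_PAGE_SIZE - 1))

/*
 * True if start and len are page aligned and [start, start + len) is
 * non-empty and does not wrap around the address space.
 */
bool sk_wp_range_ok(unsigned long start, unsigned long len)
{
	if (start & ~SK_PAGE_MASK)
		return false;
	if (len & ~SK_PAGE_MASK)
		return false;
	/* Unsigned wrap-around is well defined, so this catches both
	 * a zero length and an end address that overflows. */
	if (start + len <= start)
		return false;
	return true;
}

/* Usage example: one accepted range and each rejection reason. */
void sk_wp_range_demo(void)
{
	assert(sk_wp_range_ok(0x1000, 0x2000));
	assert(!sk_wp_range_ok(0x1001, 0x1000));	/* unaligned start */
	assert(!sk_wp_range_ok(0x1000, 0x0800));	/* unaligned length */
	assert(!sk_wp_range_ok(0x1000, 0));		/* empty span */
	assert(!sk_wp_range_ok(~0UL - 4095, 0x2000));	/* wraps */
}
```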
[PATCH v4 18/27] khugepaged: skip collapse if uffd-wp detected
Don't collapse the huge PMD if any of its small PTEs is userfault
write-protected.  The problem is that the write protection is tracked
at small-page granularity, and there is no way to keep that
information once the small pages are merged into a huge PMD.

The same thing needs to be considered for swap entries and migration
entries, so do the check there as well, regardless of
khugepaged_max_ptes_swap.

Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index dd4db334bd63..2d7bad9cb976 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -13,6 +13,7 @@
 	EM( SCAN_PMD_NULL,		"pmd_null")		\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")	\
 	EM( SCAN_PTE_NON_PRESENT,	"pte_non_present")	\
+	EM( SCAN_PTE_UFFD_WP,		"pte_uffd_wp")		\
 	EM( SCAN_PAGE_RO,		"no_writable_page")	\
 	EM( SCAN_LACK_REFERENCED_PAGE,	"lack_referenced_page")	\
 	EM( SCAN_PAGE_NULL,		"page_null")		\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 449044378782..6aa9935317d4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -29,6 +29,7 @@ enum scan_result {
 	SCAN_PMD_NULL,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_PTE_NON_PRESENT,
+	SCAN_PTE_UFFD_WP,
 	SCAN_PAGE_RO,
 	SCAN_LACK_REFERENCED_PAGE,
 	SCAN_PAGE_NULL,
@@ -1124,6 +1125,15 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
 			if (++unmapped <= khugepaged_max_ptes_swap) {
+				/*
+				 * Always be strict with uffd-wp
+				 * enabled swap entries.  Please see
+				 * comment below for pte_uffd_wp().
+				 */
+				if (pte_swp_uffd_wp(pteval)) {
+					result = SCAN_PTE_UFFD_WP;
+					goto out_unmap;
+				}
 				continue;
 			} else {
 				result = SCAN_EXCEED_SWAP_PTE;
@@ -1143,6 +1153,19 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			result = SCAN_PTE_NON_PRESENT;
 			goto out_unmap;
 		}
+		if (pte_uffd_wp(pteval)) {
+			/*
+			 * Don't collapse the page if any of the small
+			 * PTEs are armed with uffd write protection.
+			 * Here we can also mark the new huge pmd as
+			 * write protected if any of the small ones is
+			 * marked but that could bring unknown
+			 * userfault messages that fall outside of
+			 * the registered range.  So, just be simple.
+			 */
+			result = SCAN_PTE_UFFD_WP;
+			goto out_unmap;
+		}
 		if (pte_write(pteval))
 			writable = true;

--
2.17.1
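The scan rule this patch adds can be illustrated with a small userspace model. It is a sketch under stated assumptions: struct sketch_pte, the sketch_* enum values and sketch_scan_pmd() are hypothetical stand-ins for the kernel's PTE accessors and scan_result codes, and only the swap budget and uffd-wp checks are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_scan_result {
	SKETCH_SCAN_SUCCEED,
	SKETCH_SCAN_EXCEED_SWAP_PTE,
	SKETCH_SCAN_PTE_UFFD_WP,
};

/* Hypothetical stand-in for a PTE: only the two properties the check needs. */
struct sketch_pte {
	bool is_swap;	/* non-present swap-style entry */
	bool uffd_wp;	/* uffd-wp bit set (present or swap form) */
};

static enum sketch_scan_result
sketch_scan_pmd(const struct sketch_pte *ptes, size_t n,
		unsigned int max_ptes_swap)
{
	unsigned int unmapped = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ptes[i].is_swap) {
			if (++unmapped > max_ptes_swap)
				return SKETCH_SCAN_EXCEED_SWAP_PTE;
			/* Within the swap budget, but still strict on uffd-wp. */
			if (ptes[i].uffd_wp)
				return SKETCH_SCAN_PTE_UFFD_WP;
			continue;
		}
		if (ptes[i].uffd_wp)
			return SKETCH_SCAN_PTE_UFFD_WP;
	}
	return SKETCH_SCAN_SUCCEED;
}

static void sketch_scan_demo(void)
{
	const struct sketch_pte clean[2] = { { false, false }, { true, false } };
	const struct sketch_pte armed[2] = { { false, false }, { true, true } };

	assert(sketch_scan_pmd(clean, 2, 8) == SKETCH_SCAN_SUCCEED);
	assert(sketch_scan_pmd(armed, 2, 8) == SKETCH_SCAN_PTE_UFFD_WP);
}
```

The point of the model is that a single uffd-wp entry aborts the collapse even when the swap budget would otherwise allow it.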
[PATCH v4 17/27] userfaultfd: wp: support swap and page migration
For either swap and page migration, we all use the bit 2 of the entry to identify whether this entry is uffd write-protected. It plays a similar role as the existing soft dirty bit in swap entries but only for keeping the uffd-wp tracking for a specific PTE/PMD. Something special here is that when we want to recover the uffd-wp bit from a swap/migration entry to the PTE bit we'll also need to take care of the _PAGE_RW bit and make sure it's cleared, otherwise even with the _PAGE_UFFD_WP bit we can't trap it at all. In change_pte_range() we do nothing for uffd if the PTE is a swap entry. That can lead to data mismatch if the page that we are going to write protect is swapped out when sending the UFFDIO_WRITEPROTECT. This patch also applies/removes the uffd-wp bit even for the swap entries. Signed-off-by: Peter Xu --- include/linux/swapops.h | 2 ++ mm/huge_memory.c| 3 +++ mm/memory.c | 8 mm/migrate.c| 6 ++ mm/mprotect.c | 28 +--- mm/rmap.c | 6 ++ 6 files changed, 42 insertions(+), 11 deletions(-) diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 4d961668e5fc..0c2923b1cdb7 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -68,6 +68,8 @@ static inline swp_entry_t pte_to_swp_entry(pte_t pte) if (pte_swp_soft_dirty(pte)) pte = pte_swp_clear_soft_dirty(pte); + if (pte_swp_uffd_wp(pte)) + pte = pte_swp_clear_uffd_wp(pte); arch_entry = __pte_to_swp_entry(pte); return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index cf8f11d6e6cd..998a7e5d625e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2212,6 +2212,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, write = is_write_migration_entry(entry); young = false; soft_dirty = pmd_swp_soft_dirty(old_pmd); + uffd_wp = pmd_swp_uffd_wp(old_pmd); } else { page = pmd_page(old_pmd); if (pmd_dirty(old_pmd)) @@ -2244,6 +2245,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, 
pmd_t *pmd, entry = swp_entry_to_pte(swp_entry); if (soft_dirty) entry = pte_swp_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_swp_mkuffd_wp(entry); } else { entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot)); entry = maybe_mkwrite(entry, vma); diff --git a/mm/memory.c b/mm/memory.c index 2abf0934ad7f..f53f54592ddc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -737,6 +737,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte = swp_entry_to_pte(entry); if (pte_swp_soft_dirty(*src_pte)) pte = pte_swp_mksoft_dirty(pte); + if (pte_swp_uffd_wp(*src_pte)) + pte = pte_swp_mkuffd_wp(pte); set_pte_at(src_mm, addr, src_pte, pte); } } else if (is_device_private_entry(entry)) { @@ -766,6 +768,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, is_cow_mapping(vm_flags)) { make_device_private_entry_read(&entry); pte = swp_entry_to_pte(entry); + if (pte_swp_uffd_wp(*src_pte)) + pte = pte_swp_mkuffd_wp(pte); set_pte_at(src_mm, addr, src_pte, pte); } } @@ -2854,6 +2858,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) flush_icache_page(vma, page); if (pte_swp_soft_dirty(vmf->orig_pte)) pte = pte_mksoft_dirty(pte); + if (pte_swp_uffd_wp(vmf->orig_pte)) { + pte = pte_mkuffd_wp(pte); + pte = pte_wrprotect(pte); + } set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte); vmf->orig_pte = pte; diff --git a/mm/migrate.c b/mm/migrate.c index 663a5449367a..deff1f8c20af 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -241,11 +241,15 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, entry = pte_to_swp_entry(*pvmw.pte); if (is_write_migration_entry(entry)) pte = maybe_mkwrite(pte, vma); + else if (pte_swp_uffd_wp(*pvmw.pte)) + pte = pte_mkuffd_wp(pte); if (unlikely(is_zone_device_page(new))
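The point made in the commit message about _PAGE_RW can be shown with a tiny bit-flag model of the swap-in path. This is a sketch only: the bit positions and the sketch_* names are illustrative assumptions, not the architecture's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real values are arch-defined. */
#define SKETCH_PAGE_RW		(UINT64_C(1) << 1)
#define SKETCH_PAGE_UFFD_WP	(UINT64_C(1) << 10)

/*
 * When the swap entry carried the uffd-wp marker, the restored PTE
 * gets the uffd-wp bit and loses the write bit, so that the next
 * write still traps.
 */
static uint64_t sketch_swapin_pte(uint64_t pte, bool swp_uffd_wp)
{
	if (swp_uffd_wp) {
		pte |= SKETCH_PAGE_UFFD_WP;
		pte &= ~SKETCH_PAGE_RW;
	}
	return pte;
}

static void sketch_swapin_demo(void)
{
	uint64_t pte = sketch_swapin_pte(SKETCH_PAGE_RW, true);

	assert(pte & SKETCH_PAGE_UFFD_WP);
	assert(!(pte & SKETCH_PAGE_RW));
	assert(sketch_swapin_pte(SKETCH_PAGE_RW, false) == SKETCH_PAGE_RW);
}
```

Without the second step, a PTE with both bits set would be writable and the uffd-wp marker could never fire.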
[PATCH v4 16/27] userfaultfd: wp: add pmd_swp_*uffd_wp() helpers
Add the missing helpers for uffd-wp operations on pmd swap/migration
entries.

Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 arch/x86/include/asm/pgtable.h     | 15 +++
 include/asm-generic/pgtable_uffd.h | 15 +++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 6863236e8484..18a815d6f4ea 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1401,6 +1401,21 @@ static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 {
 	return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP);
 }
+
+static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd)
+{
+	return pmd_set_flags(pmd, _PAGE_SWP_UFFD_WP);
+}
+
+static inline int pmd_swp_uffd_wp(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_SWP_UFFD_WP;
+}
+
+static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
+{
+	return pmd_clear_flags(pmd, _PAGE_SWP_UFFD_WP);
+}
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

 #define PKRU_AD_BIT 0x1
diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
index 643d1bf559c2..828966d4c281 100644
--- a/include/asm-generic/pgtable_uffd.h
+++ b/include/asm-generic/pgtable_uffd.h
@@ -46,6 +46,21 @@ static __always_inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 {
 	return pte;
 }
+
+static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd)
+{
+	return pmd;
+}
+
+static inline int pmd_swp_uffd_wp(pmd_t pmd)
+{
+	return 0;
+}
+
+static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
+{
+	return pmd;
+}
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

 #endif /* _ASM_GENERIC_PGTABLE_UFFD_H */
--
2.17.1
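For readers who want to play with the mk/test/clear trio outside the kernel, here is a minimal userspace sketch. It assumes a 64-bit entry and places the marker at bit 2, which is where the series says the uffd-wp bit lives in swap entries; sketch_pmd_t and the sketch_* functions are hypothetical names, not the kernel's types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for pmd_t holding a non-present (swap) entry. */
typedef struct { uint64_t val; } sketch_pmd_t;

/* Bit 2 of the swap entry, as described for the uffd-wp marker. */
#define SKETCH_SWP_UFFD_WP (UINT64_C(1) << 2)

static inline sketch_pmd_t sketch_pmd_swp_mkuffd_wp(sketch_pmd_t pmd)
{
	pmd.val |= SKETCH_SWP_UFFD_WP;
	return pmd;
}

static inline int sketch_pmd_swp_uffd_wp(sketch_pmd_t pmd)
{
	return (pmd.val & SKETCH_SWP_UFFD_WP) != 0;
}

static inline sketch_pmd_t sketch_pmd_swp_clear_uffd_wp(sketch_pmd_t pmd)
{
	pmd.val &= ~SKETCH_SWP_UFFD_WP;
	return pmd;
}

static void sketch_pmd_swp_demo(void)
{
	sketch_pmd_t pmd = { 0x1000 };

	assert(!sketch_pmd_swp_uffd_wp(pmd));
	pmd = sketch_pmd_swp_mkuffd_wp(pmd);
	assert(sketch_pmd_swp_uffd_wp(pmd));
	pmd = sketch_pmd_swp_clear_uffd_wp(pmd);
	assert(pmd.val == 0x1000);
}
```

The helpers are value-in, value-out like their kernel counterparts, so the other bits of the entry are left untouched.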
[PATCH v4 14/27] userfaultfd: wp: handle COW properly for uffd-wp
This allows uffd-wp to support write-protected pages for COW. For example, the uffd write-protected PTE could also be write-protected by other usages like COW or zero pages. When that happens, we can't simply set the write bit in the PTE since otherwise it'll change the content of every single reference to the page. Instead, we should do the COW first if necessary, then handle the uffd-wp fault. To correctly copy the page, we'll also need to carry over the _PAGE_UFFD_WP bit if it was set in the original PTE. For huge PMDs, we just simply split the huge PMDs where we want to resolve an uffd-wp page fault always. That matches what we do with general huge PMD write protections. In that way, we resolved the huge PMD copy-on-write issue into PTE copy-on-write. Signed-off-by: Peter Xu --- mm/memory.c | 5 - mm/mprotect.c | 55 --- 2 files changed, 56 insertions(+), 4 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index ab98a1eb4702..965d974bb9bd 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2299,7 +2299,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) } flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); entry = mk_pte(new_page, vma->vm_page_prot); - entry = maybe_mkwrite(pte_mkdirty(entry), vma); + if (pte_uffd_wp(vmf->orig_pte)) + entry = pte_mkuffd_wp(entry); + else + entry = maybe_mkwrite(pte_mkdirty(entry), vma); /* * Clear the pte entry and flush it first, before updating the * pte with the new entry. This will avoid a race condition diff --git a/mm/mprotect.c b/mm/mprotect.c index 732d9b6d1d21..1f40662182f8 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -73,18 +73,18 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, flush_tlb_batched_pending(vma->vm_mm); arch_enter_lazy_mmu_mode(); do { +retry_pte: oldpte = *pte; if (pte_present(oldpte)) { pte_t ptent; bool preserve_write = prot_numa && pte_write(oldpte); + struct page *page; /* * Avoid trapping faults against the zero or KSM * pages. 
See similar comment in change_huge_pmd. */ if (prot_numa) { - struct page *page; - page = vm_normal_page(vma, addr, oldpte); if (!page || PageKsm(page)) continue; @@ -114,6 +114,45 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, continue; } + /* +* Detect whether we'll need to COW before +* resolving an uffd-wp fault. Note that this +* includes detection of the zero page (where +* page==NULL) +*/ + if (uffd_wp_resolve) { + struct vm_fault vmf = { + .vma = vma, + .address = addr & PAGE_MASK, + .orig_pte = oldpte, + .pmd = pmd, + .pte = pte, + .ptl = ptl, + }; + vm_fault_t ret; + + /* If the fault is resolved already, skip */ + if (!pte_uffd_wp(*pte)) + continue; + + arch_leave_lazy_mmu_mode(); + /* With PTE lock held */ + ret = do_wp_page_cont(&vmf); + if (ret != VM_FAULT_WRITE && ret != 0) + /* Probably OOM */ + return pages; + pte = pte_offset_map_lock(vma->vm_mm, pmd, + addr, &ptl); + arch_enter_lazy_mmu_mode(); + if (ret == 0 || !pte_present(*pte)) + /* +* This PTE could have been modified +* during or after COW before taking +* the lock; retry. +*/ + goto retry_pte; + } +
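The wp_page_copy() change above boils down to a two-way decision on the PTE built for the copied page. A rough userspace model, assuming a hypothetical struct sketch_cow_pte with three boolean properties instead of real PTE bits:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PTE model: only the properties this decision touches. */
struct sketch_cow_pte {
	bool write;
	bool dirty;
	bool uffd_wp;
};

/*
 * Build the PTE for the freshly copied page.  If the original PTE was
 * uffd write-protected, carry the marker over and leave the new PTE
 * read-only; otherwise mark it dirty and writable if the VMA allows it.
 */
static struct sketch_cow_pte
sketch_cow_new_pte(struct sketch_cow_pte orig, bool vma_writable)
{
	struct sketch_cow_pte entry = { false, false, false };

	if (orig.uffd_wp) {
		entry.uffd_wp = true;
	} else {
		entry.dirty = true;
		entry.write = vma_writable;
	}
	return entry;
}

static void sketch_cow_demo(void)
{
	const struct sketch_cow_pte armed = { false, false, true };
	const struct sketch_cow_pte plain = { false, false, false };

	assert(sketch_cow_new_pte(armed, true).uffd_wp);
	assert(!sketch_cow_new_pte(armed, true).write);
	assert(sketch_cow_new_pte(plain, true).write);
}
```

In the uffd-wp case the copy happens first and the write-protection fault is resolved afterwards, which is why the new PTE stays read-only here.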
[PATCH v4 12/27] userfaultfd: wp: apply _PAGE_UFFD_WP bit
Firstly, introduce two new flags MM_CP_UFFD_WP[_RESOLVE] for change_protection() when used with uffd-wp and make sure the two new flags are exclusively used. Then, - For MM_CP_UFFD_WP: apply the _PAGE_UFFD_WP bit and remove _PAGE_RW when a range of memory is write protected by uffd - For MM_CP_UFFD_WP_RESOLVE: remove the _PAGE_UFFD_WP bit and recover _PAGE_RW when write protection is resolved from userspace And use this new interface in mwriteprotect_range() to replace the old MM_CP_DIRTY_ACCT. Do this change for both PTEs and huge PMDs. Then we can start to identify which PTE/PMD is write protected by general (e.g., COW or soft dirty tracking), and which is for userfaultfd-wp. Since we should keep the _PAGE_UFFD_WP when doing pte_modify(), add it into _PAGE_CHG_MASK as well. Meanwhile, since we have this new bit, we can be even more strict when detecting uffd-wp page faults in either do_wp_page() or wp_huge_pmd(). Reviewed-by: Jerome Glisse Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- include/linux/mm.h | 5 + mm/huge_memory.c | 14 +- mm/memory.c| 4 ++-- mm/mprotect.c | 12 mm/userfaultfd.c | 8 ++-- 5 files changed, 38 insertions(+), 5 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 086e69d4439d..a5ac81188523 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1652,6 +1652,11 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, #define MM_CP_DIRTY_ACCT (1UL << 0) /* Whether this protection change is for NUMA hints */ #define MM_CP_PROT_NUMA (1UL << 1) +/* Whether this change is for write protecting */ +#define MM_CP_UFFD_WP (1UL << 2) /* do wp */ +#define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */ +#define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ + MM_CP_UFFD_WP_RESOLVE) extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start, unsigned long end, pgprot_t newprot, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 64d26b1989d2..3885747d4901 100644 --- a/mm/huge_memory.c 
+++ b/mm/huge_memory.c @@ -1907,6 +1907,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, bool preserve_write; int ret; bool prot_numa = cp_flags & MM_CP_PROT_NUMA; + bool uffd_wp = cp_flags & MM_CP_UFFD_WP; + bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; ptl = __pmd_trans_huge_lock(pmd, vma); if (!ptl) @@ -1973,6 +1975,13 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, entry = pmd_modify(entry, newprot); if (preserve_write) entry = pmd_mk_savedwrite(entry); + if (uffd_wp) { + entry = pmd_wrprotect(entry); + entry = pmd_mkuffd_wp(entry); + } else if (uffd_wp_resolve) { + entry = pmd_mkwrite(entry); + entry = pmd_clear_uffd_wp(entry); + } ret = HPAGE_PMD_NR; set_pmd_at(mm, addr, pmd, entry); BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry)); @@ -2120,7 +2129,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, struct page *page; pgtable_t pgtable; pmd_t old_pmd, _pmd; - bool young, write, soft_dirty, pmd_migration = false; + bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; unsigned long addr; int i; @@ -2202,6 +2211,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, write = pmd_write(old_pmd); young = pmd_young(old_pmd); soft_dirty = pmd_soft_dirty(old_pmd); + uffd_wp = pmd_uffd_wp(old_pmd); } VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); @@ -2235,6 +2245,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, entry = pte_mkold(entry); if (soft_dirty) entry = pte_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_mkuffd_wp(entry); } pte = pte_offset_map(&_pmd, addr); BUG_ON(!pte_none(*pte)); diff --git a/mm/memory.c b/mm/memory.c index 8ccd4927b58d..64bd8075f054 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2492,7 +2492,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; - if (userfaultfd_wp(vma)) { + if 
(userfaultfd_pte_wp(vma, *vmf->pte)) { pte_unmap_unlock(vmf->pte, vmf->ptl); return handle_userfault(vmf, VM_UFFD_WP); } @@ -3713,7 +3713,7 @@ static inline vm_fault_t create_huge_pmd(struct vm
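The effect of the two new flags on a page table entry can be sketched as plain bit manipulation. This is only a model: the MM_CP_* values are copied from the patch, but the PTE bit positions, the sketch_* names and the -EINVAL return for a caller that sets both flags are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as introduced by the patch. */
#define SKETCH_MM_CP_UFFD_WP		(1UL << 2)
#define SKETCH_MM_CP_UFFD_WP_RESOLVE	(1UL << 3)
#define SKETCH_MM_CP_UFFD_WP_ALL	(SKETCH_MM_CP_UFFD_WP | \
					 SKETCH_MM_CP_UFFD_WP_RESOLVE)

/* Illustrative PTE bit positions only. */
#define SKETCH_PAGE_RW		(UINT64_C(1) << 1)
#define SKETCH_PAGE_UFFD_WP	(UINT64_C(1) << 10)

static int sketch_change_pte(uint64_t *pte, unsigned long cp_flags)
{
	/* The two uffd-wp flags are mutually exclusive. */
	if ((cp_flags & SKETCH_MM_CP_UFFD_WP_ALL) == SKETCH_MM_CP_UFFD_WP_ALL)
		return -EINVAL;

	if (cp_flags & SKETCH_MM_CP_UFFD_WP) {
		*pte &= ~SKETCH_PAGE_RW;	/* write protect */
		*pte |= SKETCH_PAGE_UFFD_WP;	/* and remember why */
	} else if (cp_flags & SKETCH_MM_CP_UFFD_WP_RESOLVE) {
		*pte |= SKETCH_PAGE_RW;		/* recover write access */
		*pte &= ~SKETCH_PAGE_UFFD_WP;
	}
	return 0;
}

static void sketch_change_pte_demo(void)
{
	uint64_t pte = SKETCH_PAGE_RW;

	assert(sketch_change_pte(&pte, SKETCH_MM_CP_UFFD_WP) == 0);
	assert(pte == SKETCH_PAGE_UFFD_WP);
	assert(sketch_change_pte(&pte, SKETCH_MM_CP_UFFD_WP_RESOLVE) == 0);
	assert(pte == SKETCH_PAGE_RW);
}
```

Because the marker bit is separate from the write bit, a read-only PTE without the marker can be told apart from one that userfaultfd protected.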
[PATCH v4 10/27] userfaultfd: wp: add UFFDIO_COPY_MODE_WP
From: Andrea Arcangeli This allows UFFDIO_COPY to map pages write-protected. Signed-off-by: Andrea Arcangeli [peterx: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and commit messages] Reviewed-by: Jerome Glisse Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- fs/userfaultfd.c | 5 +++-- include/linux/userfaultfd_k.h| 2 +- include/uapi/linux/userfaultfd.h | 11 +- mm/userfaultfd.c | 36 ++-- 4 files changed, 35 insertions(+), 19 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index b397bc3b954d..3092885c9d2c 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1683,11 +1683,12 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, ret = -EINVAL; if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src) goto out; - if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE) + if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP)) goto out; if (mmget_not_zero(ctx->mm)) { ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, - uffdio_copy.len, &ctx->mmap_changing); + uffdio_copy.len, &ctx->mmap_changing, + uffdio_copy.mode); mmput(ctx->mm); } else { return -ESRCH; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index c6590c58ce28..765ce884cec0 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -34,7 +34,7 @@ extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason); extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - bool *mmap_changing); + bool *mmap_changing, __u64 mode); extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long len, diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 48f1a7c2f1f0..340f23bc251d 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -203,13 +203,14 @@ struct 
uffdio_copy { __u64 dst; __u64 src; __u64 len; +#define UFFDIO_COPY_MODE_DONTWAKE ((__u64)1<<0) /* -* There will be a wrprotection flag later that allows to map -* pages wrprotected on the fly. And such a flag will be -* available if the wrprotection ioctl are implemented for the -* range according to the uffdio_register.ioctls. +* UFFDIO_COPY_MODE_WP will map the page write protected on +* the fly. UFFDIO_COPY_MODE_WP is available only if the +* write protected ioctl is implemented for the range +* according to the uffdio_register.ioctls. */ -#define UFFDIO_COPY_MODE_DONTWAKE ((__u64)1<<0) +#define UFFDIO_COPY_MODE_WP((__u64)1<<1) __u64 mode; /* diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index d59b5a73dfb3..eaecc21806da 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -25,7 +25,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, - struct page **pagep) + struct page **pagep, + bool wp_copy) { struct mem_cgroup *memcg; pte_t _dst_pte, *dst_pte; @@ -71,9 +72,9 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg, false)) goto out_release; - _dst_pte = mk_pte(page, dst_vma->vm_page_prot); - if (dst_vma->vm_flags & VM_WRITE) - _dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte)); + _dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot)); + if ((dst_vma->vm_flags & VM_WRITE) && !wp_copy) + _dst_pte = pte_mkwrite(_dst_pte); dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); if (dst_vma->vm_file) { @@ -399,7 +400,8 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, unsigned long dst_addr, unsigned long src_addr, struct page **page, - bool zeropage) + bool zeropage, + bool wp_copy) { ssize_t
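The mode validation and the resulting writability decision can be modelled in a few lines. The two mode bit values are taken from the patch; the sketch_* function names are hypothetical and the model ignores everything else UFFDIO_COPY does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mode bits as defined in the patch. */
#define SKETCH_COPY_MODE_DONTWAKE	((uint64_t)1 << 0)
#define SKETCH_COPY_MODE_WP		((uint64_t)1 << 1)

/* Any bit outside the two known modes makes the request invalid. */
static bool sketch_copy_mode_valid(uint64_t mode)
{
	return !(mode & ~(SKETCH_COPY_MODE_DONTWAKE | SKETCH_COPY_MODE_WP));
}

/* The copied page is mapped writable only if the VMA allows writes
 * and the caller did not ask for a write-protected copy. */
static bool sketch_copy_pte_writable(bool vma_write, uint64_t mode)
{
	return vma_write && !(mode & SKETCH_COPY_MODE_WP);
}

static void sketch_copy_mode_demo(void)
{
	assert(sketch_copy_mode_valid(SKETCH_COPY_MODE_WP));
	assert(!sketch_copy_mode_valid((uint64_t)1 << 2));
	assert(!sketch_copy_pte_writable(true, SKETCH_COPY_MODE_WP));
}
```

Mapping the page read-only from the start avoids a window in which the freshly copied page is writable before a separate write-protect call arrives.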
[PATCH v4 11/27] mm: merge parameters for change_protection()
change_protection() was used by either the NUMA or mprotect() code, there's one parameter for each of the callers (dirty_accountable and prot_numa). Further, these parameters are passed along the calls: - change_protection_range() - change_p4d_range() - change_pud_range() - change_pmd_range() - ... Now we introduce a flag for change_protect() and all these helpers to replace these parameters. Then we can avoid passing multiple parameters multiple times along the way. More importantly, it'll greatly simplify the work if we want to introduce any new parameters to change_protection(). In the follow up patches, a new parameter for userfaultfd write protection will be introduced. No functional change at all. Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- include/linux/huge_mm.h | 2 +- include/linux/mm.h | 14 +- mm/huge_memory.c| 3 ++- mm/mempolicy.c | 2 +- mm/mprotect.c | 29 - 5 files changed, 33 insertions(+), 17 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 381e872bfde0..1550fb12dbd4 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -46,7 +46,7 @@ extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, pmd_t *old_pmd, pmd_t *new_pmd); extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, pgprot_t newprot, - int prot_numa); + unsigned long cp_flags); vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, pfn_t pfn, bool write); vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr, diff --git a/include/linux/mm.h b/include/linux/mm.h index bad93704abc8..086e69d4439d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1641,9 +1641,21 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len, bool need_rmap_locks); + +/* + * Flags used by change_protection(). 
For now we make it a bitmap so + * that we can pass in multiple flags just like parameters. However + * for now all the callers are only use one of the flags at the same + * time. + */ +/* Whether we should allow dirty bit accounting */ +#define MM_CP_DIRTY_ACCT (1UL << 0) +/* Whether this protection change is for NUMA hints */ +#define MM_CP_PROT_NUMA (1UL << 1) + extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start, unsigned long end, pgprot_t newprot, - int dirty_accountable, int prot_numa); + unsigned long cp_flags); extern int mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, unsigned long start, unsigned long end, unsigned long newflags); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 165ea46bf149..64d26b1989d2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1899,13 +1899,14 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, * - HPAGE_PMD_NR is protections changed and TLB flush necessary */ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long addr, pgprot_t newprot, int prot_numa) + unsigned long addr, pgprot_t newprot, unsigned long cp_flags) { struct mm_struct *mm = vma->vm_mm; spinlock_t *ptl; pmd_t entry; bool preserve_write; int ret; + bool prot_numa = cp_flags & MM_CP_PROT_NUMA; ptl = __pmd_trans_huge_lock(pmd, vma); if (!ptl) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 2219e747df49..825053818bcb 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -575,7 +575,7 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, { int nr_updated; - nr_updated = change_protection(vma, addr, end, PAGE_NONE, 0, 1); + nr_updated = change_protection(vma, addr, end, PAGE_NONE, MM_CP_PROT_NUMA); if (nr_updated) count_vm_numa_events(NUMA_PTE_UPDATES, nr_updated); diff --git a/mm/mprotect.c b/mm/mprotect.c index 028c724dcb1a..98091408bd11 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -37,13 +37,15 @@ static unsigned long 
change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, pgprot_t newprot, - int dirty_accountable, int prot_numa) + unsigned long cp_flags) { struct mm_struct *mm = vma->vm_mm; pte_t *pte, oldpte; spinlock_t *ptl; unsigned long pages = 0; int target_node = NUMA_NO_NODE; + bool dirty_accountable = cp_flag
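The conversion from the two old int parameters to the single cp_flags word, and back to booleans inside the helpers, can be sketched as follows. The flag values come from the patch; struct sketch_cp_args and the sketch_* functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patch. */
#define SKETCH_MM_CP_DIRTY_ACCT	(1UL << 0)
#define SKETCH_MM_CP_PROT_NUMA	(1UL << 1)

/* The pair of parameters the old interface passed down every level. */
struct sketch_cp_args {
	bool dirty_accountable;
	bool prot_numa;
};

/* What a caller of the old interface would now pass as cp_flags. */
static unsigned long sketch_cp_encode(struct sketch_cp_args args)
{
	unsigned long cp_flags = 0;

	if (args.dirty_accountable)
		cp_flags |= SKETCH_MM_CP_DIRTY_ACCT;
	if (args.prot_numa)
		cp_flags |= SKETCH_MM_CP_PROT_NUMA;
	return cp_flags;
}

/* What the leaf helpers do on entry to recover the old booleans. */
static struct sketch_cp_args sketch_cp_decode(unsigned long cp_flags)
{
	struct sketch_cp_args args = {
		.dirty_accountable = cp_flags & SKETCH_MM_CP_DIRTY_ACCT,
		.prot_numa = cp_flags & SKETCH_MM_CP_PROT_NUMA,
	};
	return args;
}

static void sketch_cp_demo(void)
{
	/* Old call style (..., 0, 1) becomes MM_CP_PROT_NUMA. */
	const struct sketch_cp_args numa = { false, true };

	assert(sketch_cp_encode(numa) == SKETCH_MM_CP_PROT_NUMA);
	assert(sketch_cp_decode(SKETCH_MM_CP_PROT_NUMA).prot_numa);
}
```

Adding a new behaviour later then only needs a new bit and a new decode line in the helpers that care about it, not a new parameter at every level.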
[PATCH v4 06/27] userfaultfd: wp: add helper for writeprotect check
From: Shaohua Li

Add a helper for the writeprotect check.  It will be used by later
patches.

Cc: Andrea Arcangeli
Cc: Pavel Emelyanov
Cc: Rik van Riel
Cc: Kirill A. Shutemov
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Johannes Weiner
Signed-off-by: Shaohua Li
Signed-off-by: Andrea Arcangeli
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 include/linux/userfaultfd_k.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 37c9eba75c98..38f748e7186e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -50,6 +50,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_MISSING;
 }

+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -94,6 +99,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return false;
 }

+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;
--
2.17.1
[PATCH v4 09/27] userfaultfd: wp: userfaultfd_pte/huge_pmd_wp() helpers
From: Andrea Arcangeli

Implement helper methods to invoke userfaultfd wp faults more
selectively: not whenever a wp fault triggers on a vma with
VM_UFFD_WP set in vma->vm_flags, but only if the _PAGE_UFFD_WP bit is
set in the pagetable too.

Signed-off-by: Andrea Arcangeli
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
 include/linux/userfaultfd_k.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 38f748e7186e..c6590c58ce28 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -14,6 +14,8 @@
 #include /* linux/include/uapi/linux/userfaultfd.h */

 #include
+#include
+#include

 /*
  * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
@@ -55,6 +57,18 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_WP;
 }

+static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
+				      pte_t pte)
+{
+	return userfaultfd_wp(vma) && pte_uffd_wp(pte);
+}
+
+static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
+					   pmd_t pmd)
+{
+	return userfaultfd_wp(vma) && pmd_uffd_wp(pmd);
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -104,6 +118,19 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return false;
 }

+static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
+				      pte_t pte)
+{
+	return false;
+}
+
+static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
+					   pmd_t pmd)
+{
+	return false;
+}
+
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;
--
2.17.1
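The two-level test these helpers perform (VMA registered for uffd-wp AND the per-entry marker set) can be sketched in userspace like this. The flag value and the sketch_* types are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the kernel defines VM_UFFD_WP itself. */
#define SKETCH_VM_UFFD_WP (1UL << 12)

struct sketch_vm_area { unsigned long vm_flags; };
struct sketch_pte_bits { bool uffd_wp; };

static bool sketch_userfaultfd_wp(const struct sketch_vm_area *vma)
{
	return (vma->vm_flags & SKETCH_VM_UFFD_WP) != 0;
}

/* A wp fault goes to userfaultfd only if both levels agree. */
static bool sketch_userfaultfd_pte_wp(const struct sketch_vm_area *vma,
				      struct sketch_pte_bits pte)
{
	return sketch_userfaultfd_wp(vma) && pte.uffd_wp;
}

static void sketch_pte_wp_demo(void)
{
	const struct sketch_vm_area registered = { SKETCH_VM_UFFD_WP };
	const struct sketch_pte_bits armed = { true };
	const struct sketch_pte_bits cow_only = { false };

	assert(sketch_userfaultfd_pte_wp(&registered, armed));
	/* Read-only for another reason (e.g. COW): not a uffd-wp fault. */
	assert(!sketch_userfaultfd_pte_wp(&registered, cow_only));
}
```

The second assertion is the case the helpers exist for: a write fault in a registered VMA that was caused by something other than userfaultfd write protection.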
[PATCH v4 08/27] userfaultfd: wp: add WP pagetable tracking to x86
From: Andrea Arcangeli Accurate userfaultfd WP tracking is possible by tracking exactly which virtual memory ranges were writeprotected by userland. We can't rely only on the RW bit of the mapped pagetable because that information is destroyed by fork() or KSM or swap. If we were to rely on that, we'd need to stay on the safe side and generate false positive wp faults for every swapped out page. Signed-off-by: Andrea Arcangeli [peterx: append _PAGE_UFFD_WP to _PAGE_CHG_MASK] Reviewed-by: Jerome Glisse Reviewed-by: Mike Rapoport Signed-off-by: Peter Xu --- arch/x86/Kconfig | 1 + arch/x86/include/asm/pgtable.h | 52 arch/x86/include/asm/pgtable_64.h | 8 - arch/x86/include/asm/pgtable_types.h | 11 +- include/asm-generic/pgtable.h | 1 + include/asm-generic/pgtable_uffd.h | 51 +++ init/Kconfig | 5 +++ 7 files changed, 127 insertions(+), 2 deletions(-) create mode 100644 include/asm-generic/pgtable_uffd.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 5ad92419be19..70d369fe08d7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -208,6 +208,7 @@ config X86 select USER_STACKTRACE_SUPPORT select VIRT_TO_BUS select X86_FEATURE_NAMES if PROC_FS + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD config INSTRUCTION_DECODER def_bool y diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 2779ace16d23..6863236e8484 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -23,6 +23,7 @@ #ifndef __ASSEMBLY__ #include +#include extern pgd_t early_top_pgt[PTRS_PER_PGD]; int __init __early_make_pgtable(unsigned long address, pmdval_t pmd); @@ -293,6 +294,23 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) return native_make_pte(v & ~clear); } +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline int pte_uffd_wp(pte_t pte) +{ + return pte_flags(pte) & _PAGE_UFFD_WP; +} + +static inline pte_t pte_mkuffd_wp(pte_t pte) +{ + return pte_set_flags(pte, _PAGE_UFFD_WP); +} + +static inline pte_t
pte_clear_uffd_wp(pte_t pte) +{ + return pte_clear_flags(pte, _PAGE_UFFD_WP); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + static inline pte_t pte_mkclean(pte_t pte) { return pte_clear_flags(pte, _PAGE_DIRTY); @@ -372,6 +390,23 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline int pmd_uffd_wp(pmd_t pmd) +{ + return pmd_flags(pmd) & _PAGE_UFFD_WP; +} + +static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +{ + return pmd_set_flags(pmd, _PAGE_UFFD_WP); +} + +static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +{ + return pmd_clear_flags(pmd, _PAGE_UFFD_WP); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + static inline pmd_t pmd_mkold(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_ACCESSED); @@ -1351,6 +1386,23 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd) #endif #endif +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +{ + return pte_set_flags(pte, _PAGE_SWP_UFFD_WP); +} + +static inline int pte_swp_uffd_wp(pte_t pte) +{ + return pte_flags(pte) & _PAGE_SWP_UFFD_WP; +} + +static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +{ + return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + #define PKRU_AD_BIT 0x1 #define PKRU_WD_BIT 0x2 #define PKRU_BITS_PER_PKEY 2 diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 0bb566315621..627666b1c3c0 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -189,7 +189,7 @@ extern void sync_global_pgds(unsigned long start, unsigned long end); * * | ...| 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number * | ...|SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names - * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry + * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| X|F|SD|0| <- swp entry * * G (8) is aliased and used as a PROT_NONE indicator for * !present 
ptes. We need to start storing swap entries above @@ -197,9 +197,15 @@ extern void sync_global_pgds(unsigned long start, unsigned long end); * erratum where they can be incorrectly set by hardware on * non-present PTEs. * + * SD Bits 1-4 are not used in non-present format and available for + * special use described below: + * * SD (1) in swp entry is used to store soft dirty bit, which helps us * remember soft dirty over page migration * + * F (2) in swp entry is used to record when a pagetable is + * writeprotected by userfaultfd WP support. + * * Bit 7 in swp entry should be 0 because pmd_present checks not only P, *
[PATCH v4 04/27] mm: allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We
achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was mainly used to avoid
unexpected starvation of the system by looping forever to handle the
page fault on a single page.  However that should hardly happen, and
after all, for each code path that returns VM_FAULT_RETRY we first
wait for a condition to happen (during which time we should possibly
yield the cpu) before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the
FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY.  It means
that the page fault handler can now retry the page fault multiple
times if necessary, without the need to generate another page fault
event.  Meanwhile we still keep the FAULT_FLAG_TRIED flag so the page
fault handler can still identify whether a page fault is the first
attempt or not.

Then we'll have these combinations of fault flags (only considering
the ALLOW_RETRY flag and the TRIED flag):

  - ALLOW_RETRY and !TRIED: the page fault is allowed to retry, and
    this is the first try

  - ALLOW_RETRY and TRIED: the page fault is allowed to retry, and
    this is not the first try

  - !ALLOW_RETRY and !TRIED: the page fault is not allowed to retry
    at all

  - !ALLOW_RETRY and TRIED: this is forbidden and should never be
    used

In existing code we have multiple places that take special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  Because even the 2nd try will now have
ALLOW_RETRY set, this patch introduces a simple helper that detects
the first try of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flags & FAULT_FLAG_TRIED), and
uses that helper in all existing special paths.  One example is
__lock_page_or_retry(): now we'll drop the mmap_sem only in the first
attempt of the page fault and keep it in follow-up retries, so the old
locking behavior is retained.

This will be a nice enhancement for current code [2] and at the same
time supporting material for the future userfaultfd-writeprotect
work, since in that work there will always be an explicit userfault
writeprotect retry for protected pages, and if that cannot resolve
the page fault (e.g., when userfaultfd-writeprotect is used in
conjunction with swapped pages) then we'll possibly need a 3rd retry
of the page fault.  It might also benefit other potential users with
requirements similar to userfault write-protection.

GUP code is not touched yet and will be covered in a follow-up patch.

Please read the thread below for more information.

[1] https://lkml.org/lkml/2017/11/2/833
[2] https://lkml.org/lkml/2018/12/30/64

Suggested-by: Linus Torvalds
Suggested-by: Andrea Arcangeli
Reviewed-by: Jerome Glisse
Signed-off-by: Peter Xu
---
 arch/alpha/mm/fault.c           |  2 +-
 arch/arc/mm/fault.c             |  1 -
 arch/arm/mm/fault.c             |  3 ---
 arch/arm64/mm/fault.c           |  5
 arch/hexagon/mm/vm_fault.c      |  1 -
 arch/ia64/mm/fault.c            |  1 -
 arch/m68k/mm/fault.c            |  3 ---
 arch/microblaze/mm/fault.c      |  1 -
 arch/mips/mm/fault.c            |  1 -
 arch/nds32/mm/fault.c           |  1 -
 arch/nios2/mm/fault.c           |  3 ---
 arch/openrisc/mm/fault.c        |  1 -
 arch/parisc/mm/fault.c          |  4 +---
 arch/powerpc/mm/fault.c         |  6 -
 arch/riscv/mm/fault.c           |  5
 arch/s390/mm/fault.c            |  5 +---
 arch/sh/mm/fault.c              |  1 -
 arch/sparc/mm/fault_32.c        |  1 -
 arch/sparc/mm/fault_64.c        |  1 -
 arch/um/kernel/trap.c           |  1 -
 arch/unicore32/mm/fault.c       |  4 +---
 arch/x86/mm/fault.c             |  2 --
 arch/xtensa/mm/fault.c          |  1 -
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 +++---
 include/linux/mm.h              | 41 -
 mm/filemap.c                    |  2 +-
 mm/shmem.c                      |  2 +-
 27 files changed, 55 insertions(+), 56 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 8a2ef90b4bfc..6a02c0fb36b9 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -169,7 +169,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 		else
 			current->min_flt++;
 		if (fault & VM_FAULT_RETRY) {
-			flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			flags |= FAULT_FLAG_TRIED;

 			/* No need to up_read(&mm->mmap_sem) as we would
 			 * have already released it in __lock_page_or_retry
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 9e9e6eb1f7d0..e7d2947ba72c 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/
[PATCH v4 07/27] userfaultfd: wp: hook userfault handler to write protection fault
From: Andrea Arcangeli

There are several cases in which a write protection fault happens. It could be a write to the zero page, a swapped page or a userfault write-protected page. When the fault happens, there is no way to know whether userfault write-protected the page before. Here we just blindly issue a userfault notification for a vma with VM_UFFD_WP, regardless of whether the application has write-protected it yet. The application should be ready to handle such a wp fault.

v1: From: Shaohua Li

v2: Handle the userfault in the common do_wp_page. If we get there a pagetable is present and readonly, so there is no need to do further processing until we solve the userfault.

In the swapin case, always swapin as readonly. This will cause false positive userfaults. We need to decide later whether to eliminate them with a flag like soft-dirty in the swap entry (see _PAGE_SWP_SOFT_DIRTY).

hugetlbfs wouldn't need to worry about swapouts, but tmpfs would be handled by a swap entry bit like anonymous memory.

The main problem, with no easy solution to eliminate the false positives, will be if/when userfaultfd is extended to real filesystem pagecache. When the pagecache is freed by reclaim we can't leave the radix tree pinned if the inode, and in turn the radix tree, is reclaimed as well.

The estimation is that full accuracy and lack of false positives could be easily provided only to anonymous memory (as long as there's no fork or as long as MADV_DONTFORK is used on the userfaultfd anonymous range), tmpfs and hugetlbfs. It is most certainly worth achieving, but in a later incremental patch.

v3: Add hooking point for THP wrprotect faults.
CC: Shaohua Li Signed-off-by: Andrea Arcangeli [peterx: don't conditionally drop FAULT_FLAG_WRITE in do_swap_page] Reviewed-by: Mike Rapoport Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- mm/memory.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index ab650c21bccd..8ccd4927b58d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2492,6 +2492,11 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; + if (userfaultfd_wp(vma)) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return handle_userfault(vmf, VM_UFFD_WP); + } + vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); if (!vmf->page) { /* @@ -3707,8 +3712,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf) /* `inline' is required to avoid gcc 4.1.2 build error */ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd) { - if (vma_is_anonymous(vmf->vma)) + if (vma_is_anonymous(vmf->vma)) { + if (userfaultfd_wp(vmf->vma)) + return handle_userfault(vmf, VM_UFFD_WP); return do_huge_pmd_wp_page(vmf, orig_pmd); + } if (vmf->vma->vm_ops->huge_fault) return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD); -- 2.17.1
[PATCH v4 02/27] mm: userfault: return VM_FAULT_RETRY on signals
The idea comes from the upstream discussion between Linus and Andrea:

https://lkml.org/lkml/2017/10/30/560

A summary of the issue: in the past there was a special path in handle_userfault() that returned VM_FAULT_NOPAGE when we detected non-fatal signals while waiting for userfault handling. We did that by reacquiring the mmap_sem before returning. However, that brings a risk: the vmas might have changed by the time we retake the mmap_sem, and we could even be holding an invalid vma structure.

This patch removes the special path and we'll return VM_FAULT_RETRY via the common path even if we have got such signals. Then, for all the architectures that pass FAULT_FLAG_ALLOW_RETRY into handle_mm_fault(), we check not only for SIGKILL but for all the other pending userspace signals right after we return from handle_mm_fault(). This allows userspace to handle non-fatal signals faster than before.

This patch is preparation work for the next patch, which finally removes the special code path mentioned above in handle_userfault().
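The per-architecture check described above can be summarised as a small decision function. The sketch below is a userspace model only, loosely following the shape of the arm hunk in this patch; the enum, the function name and the VM_FAULT_RETRY value are all hypothetical and chosen for illustration.

```c
#include <stdbool.h>

/* Placeholder value, for illustration only (assumed). */
#define VM_FAULT_RETRY 0x0400u

/* Hypothetical outcomes of the check done right after handle_mm_fault(). */
enum fault_next {
	FAULT_CONTINUE,   /* no early exit, carry on with normal handling */
	FAULT_RETURN,     /* return so the pending signal is handled first */
	FAULT_NO_CONTEXT, /* fatal signal while the fault came from kernel mode */
};

/*
 * Before the change only a fatal signal caused an early exit on
 * VM_FAULT_RETRY. In this model any pending signal does, and the
 * fatal/kernel-mode case is singled out inside the branch.
 */
static enum fault_next after_handle_mm_fault(unsigned int fault,
					     bool sig_pending,
					     bool fatal_sig_pending,
					     bool user_mode)
{
	if ((fault & VM_FAULT_RETRY) && sig_pending) {
		if (fatal_sig_pending && !user_mode)
			return FAULT_NO_CONTEXT;
		return FAULT_RETURN;
	}
	return FAULT_CONTINUE;
}
```

Individual architectures differ in the details, so this should be read as the common pattern rather than as any one handler.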
Suggested-by: Linus Torvalds Suggested-by: Andrea Arcangeli Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- arch/alpha/mm/fault.c | 2 +- arch/arc/mm/fault.c| 11 --- arch/arm/mm/fault.c| 6 +++--- arch/arm64/mm/fault.c | 6 +++--- arch/hexagon/mm/vm_fault.c | 2 +- arch/ia64/mm/fault.c | 2 +- arch/m68k/mm/fault.c | 2 +- arch/microblaze/mm/fault.c | 2 +- arch/mips/mm/fault.c | 2 +- arch/nds32/mm/fault.c | 6 +++--- arch/nios2/mm/fault.c | 2 +- arch/openrisc/mm/fault.c | 2 +- arch/parisc/mm/fault.c | 2 +- arch/powerpc/mm/fault.c| 2 ++ arch/riscv/mm/fault.c | 4 ++-- arch/s390/mm/fault.c | 9 ++--- arch/sh/mm/fault.c | 4 arch/sparc/mm/fault_32.c | 3 +++ arch/sparc/mm/fault_64.c | 3 +++ arch/um/kernel/trap.c | 5 - arch/unicore32/mm/fault.c | 4 ++-- arch/x86/mm/fault.c| 6 +- arch/xtensa/mm/fault.c | 3 +++ 23 files changed, 56 insertions(+), 34 deletions(-) diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 188fc9256baf..8a2ef90b4bfc 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -150,7 +150,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr, the fault. 
*/ fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && signal_pending(current)) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index 8df1638259f3..9e9e6eb1f7d0 100644 --- a/arch/arc/mm/fault.c +++ b/arch/arc/mm/fault.c @@ -141,17 +141,14 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) */ fault = handle_mm_fault(vma, address, flags); - if (fatal_signal_pending(current)) { - + if (unlikely((fault & VM_FAULT_RETRY) && signal_pending(current))) { + if (fatal_signal_pending(current) && !user_mode(regs)) + goto no_context; /* * if fault retry, mmap_sem already relinquished by core mm * so OK to return to user mode (with signal handled first) */ - if (fault & VM_FAULT_RETRY) { - if (!user_mode(regs)) - goto no_context; - return; - } + return; } perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 58f69fa07df9..c41c021bbe40 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -314,12 +314,12 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) fault = __do_page_fault(mm, addr, fsr, flags, tsk); - /* If we need to retry but a fatal signal is pending, handle the + /* If we need to retry but a signal is pending, handle the * signal first. We do not need to release the mmap_sem because * it would already be released in __lock_page_or_retry in * mm/filemap.c. 
*/ - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) { - if (!user_mode(regs)) + if (unlikely(fault & VM_FAULT_RETRY && signal_pending(current))) { + if (fatal_signal_pending(current) && !user_mode(regs)) goto no_context; return 0; } diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 1a7e92ab69eb..46c32d639fbf 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -512,13 +512,13 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr, if (fault & VM_FAULT_RETRY) { /* -* If we need to retry but a fatal signal is pending, +
[PATCH v4 01/27] mm: gup: rename "nonblocking" to "locked" where proper
There are plenty of places around __get_user_pages() that have a parameter "nonblocking" which does not really mean that "it won't block" (because it can really block). Instead, it shows whether the mmap_sem is released by up_read() during the page fault handling, mostly when VM_FAULT_RETRY is returned.

We have the correct naming in e.g. get_user_pages_locked() or get_user_pages_remote() as "locked", however there are still many places that use "nonblocking" as the name. Rename those places to "locked" where proper, to better suit the functionality of the variable. While at it, fix up some of the comments accordingly.

Reviewed-by: Mike Rapoport Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- mm/gup.c | 44 +--- mm/hugetlb.c | 8 2 files changed, 25 insertions(+), 27 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index f84e22685aaa..a78d252d6358 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -509,12 +509,12 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, } /* - * mmap_sem must be held on entry. If @nonblocking != NULL and - * *@flags does not include FOLL_NOWAIT, the mmap_sem may be released. - * If it is, *@nonblocking will be set to 0 and -EBUSY returned. + * mmap_sem must be held on entry. If @locked != NULL and *@flags + * does not include FOLL_NOWAIT, the mmap_sem may be released. If it + * is, *@locked will be set to 0 and -EBUSY returned. 
*/ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, - unsigned long address, unsigned int *flags, int *nonblocking) + unsigned long address, unsigned int *flags, int *locked) { unsigned int fault_flags = 0; vm_fault_t ret; @@ -526,7 +526,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_WRITE; if (*flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; - if (nonblocking) + if (locked) fault_flags |= FAULT_FLAG_ALLOW_RETRY; if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; @@ -552,8 +552,8 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, } if (ret & VM_FAULT_RETRY) { - if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT)) - *nonblocking = 0; + if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT)) + *locked = 0; return -EBUSY; } @@ -630,7 +630,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) * only intends to ensure the pages are faulted in. * @vmas: array of pointers to vmas corresponding to each page. * Or NULL if the caller does not require them. - * @nonblocking: whether waiting for disk IO or mmap_sem contention + * @locked: whether we're still with the mmap_sem held * * Returns number of pages pinned. This may be fewer than the number * requested. If nr_pages is 0 or negative, returns 0. If no pages @@ -659,13 +659,11 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) * appropriate) must be called after the page is finished with, and * before put_page is called. * - * If @nonblocking != NULL, __get_user_pages will not wait for disk IO - * or mmap_sem contention, and if waiting is needed to pin all pages, - * *@nonblocking will be set to 0. Further, if @gup_flags does not - * include FOLL_NOWAIT, the mmap_sem will be released via up_read() in - * this case. 
+ * If @locked != NULL, *@locked will be set to 0 when mmap_sem is + * released by an up_read(). That can happen if @gup_flags does not + * have FOLL_NOWAIT. * - * A caller using such a combination of @nonblocking and @gup_flags + * A caller using such a combination of @locked and @gup_flags * must therefore hold the mmap_sem for reading only, and recognize * when it's been released. Otherwise, it must be held for either * reading or writing and will not be released. @@ -677,7 +675,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas, int *nonblocking) + struct vm_area_struct **vmas, int *locked) { long ret = 0, i = 0; struct vm_area_struct *vma = NULL; @@ -721,7 +719,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, if (is_vm_hugetlb_page(vma)) { i = follow_hugetlb_page(mm, vma, pages, vmas, &start, &nr_pages, i, -
[PATCH v4 03/27] userfaultfd: don't retake mmap_sem to emulate NOPAGE
The idea comes from the upstream discussion between Linus and Andrea:

https://lkml.org/lkml/2017/10/30/560

A summary of the issue: in the past there was a special path in handle_userfault() that returned VM_FAULT_NOPAGE when we detected non-fatal signals while waiting for userfault handling. We did that by reacquiring the mmap_sem before returning. However, that brings a risk: the vmas might have changed by the time we retake the mmap_sem, and we could even be holding an invalid vma structure.

This patch removes the risky path in handle_userfault(), so we can be sure that the callers of handle_mm_fault() know that the VMAs might have changed. Meanwhile, with the previous patch we don't lose responsiveness either, since the core mm code can now handle non-fatal userspace signals quickly even if we return VM_FAULT_RETRY.

Suggested-by: Andrea Arcangeli Suggested-by: Linus Torvalds Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- fs/userfaultfd.c | 24 1 file changed, 24 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 89800fc7dc9d..b397bc3b954d 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -514,30 +514,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) __set_current_state(TASK_RUNNING); - if (return_to_userland) { - if (signal_pending(current) && - !fatal_signal_pending(current)) { - /* -* If we got a SIGSTOP or SIGCONT and this is -* a normal userland page fault, just let -* userland return so the signal will be -* handled and gdb debugging works. The page -* fault code immediately after we return from -* this function is going to release the -* mmap_sem and it's not depending on it -* (unlike gup would if we were not to return -* VM_FAULT_RETRY). -* -* If a fatal signal is pending we still take -* the streamlined VM_FAULT_RETRY failure path -* and there's no need to retake the mmap_sem -* in such case. 
-*/ - down_read(&mm->mmap_sem); - ret = VM_FAULT_NOPAGE; - } - } - /* * Here we race with the list_del; list_add in * userfaultfd_ctx_read(), however because we don't ever run -- 2.17.1
[PATCH v4 05/27] mm: gup: allow VM_FAULT_RETRY for multiple times
This is the gup counterpart of the change that allows the VM_FAULT_RETRY to happen for more than once. Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- mm/gup.c | 17 + mm/hugetlb.c | 6 -- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index a78d252d6358..46b1d1412364 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -531,7 +531,10 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { - VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY); + /* +* Note: FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_TRIED +* can co-exist +*/ fault_flags |= FAULT_FLAG_TRIED; } @@ -946,17 +949,23 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, /* VM_FAULT_RETRY triggered, so seek to the faulting offset */ pages += ret; start += ret << PAGE_SHIFT; + lock_dropped = true; +retry: /* * Repeat on the address that fired VM_FAULT_RETRY -* without FAULT_FLAG_ALLOW_RETRY but with +* with both FAULT_FLAG_ALLOW_RETRY and * FAULT_FLAG_TRIED. */ *locked = 1; - lock_dropped = true; down_read(&mm->mmap_sem); ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, - pages, NULL, NULL); + pages, NULL, locked); + if (!*locked) { + /* Continue to retry until we succeeded */ + BUG_ON(ret != 0); + goto retry; + } if (ret != 1) { BUG_ON(ret > 1); if (!pages_done) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e77b56141f0c..d14e2cc6f7c1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4268,8 +4268,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (flags & FOLL_TRIED) { - VM_WARN_ON_ONCE(fault_flags & - FAULT_FLAG_ALLOW_RETRY); + /* +* Note: FAULT_FLAG_ALLOW_RETRY and +* FAULT_FLAG_TRIED can co-exist +*/ fault_flags |= FAULT_FLAG_TRIED; } ret = hugetlb_fault(mm, vma, vaddr, fault_flags); -- 2.17.1
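The retry loop that the gup hunk above adds can be illustrated with a small userspace model. Everything in it is a stand-in: struct fake_fault, fake_get_one_page(), pin_with_retries() and attempts_for() are hypothetical names, and the "lock" is just an int that the fake fault path clears, the way *locked is cleared when mmap_sem has been dropped.

```c
/* Hypothetical stand-in for a fault that needs several retries. */
struct fake_fault {
	int retries_left; /* how many more times the fault asks for a retry */
	int attempts;     /* how many times the fault path was entered */
};

/*
 * Models __get_user_pages() for one page: returns 1 once the page is
 * available; returns 0 and clears *locked when the modelled mmap_sem
 * was dropped because the fault wants to be retried.
 */
static long fake_get_one_page(struct fake_fault *f, int *locked)
{
	f->attempts++;
	if (f->retries_left > 0) {
		f->retries_left--;
		*locked = 0;
		return 0;
	}
	return 1;
}

/* Models the new loop: retake the lock and retry until it is kept. */
static long pin_with_retries(struct fake_fault *f)
{
	int locked;
	long ret;

	do {
		locked = 1; /* stands in for down_read(&mm->mmap_sem) */
		ret = fake_get_one_page(f, &locked);
	} while (!locked); /* lock was dropped: go around again */

	return ret;
}

/* Convenience wrapper: number of attempts needed, or -1 on failure. */
static int attempts_for(int retries)
{
	struct fake_fault f = { retries, 0 };

	return pin_with_retries(&f) == 1 ? f.attempts : -1;
}
```

In this model a fault that asks for two retries is entered three times, which is exactly what the old code, passing NULL instead of locked on the second call, could not express.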
[PATCH v4 00/27] userfaultfd: write protection support
This series implements initial write protection support for userfaultfd. Currently only anonymous memory is supported; shmem and hugetlbfs are not supported yet. This is the 4th version of it.

The latest code can also be found at:

https://github.com/xzpeter/linux/tree/uffd-wp-merged

v4 changelog:
- add r-bs
- use kernel-doc format for fault_flag_allow_retry_first [Jerome]
- drop "export wp_page_copy", add new patch to split do_wp_page(), use it in change_pte_range() to replace the wp_page_copy(). [Jerome] (I thought about different ways to do this but I still can't find a 100% good way for all... in this version I still used the do_wp_page_cont naming. We can still discuss this and how we should split do_wp_page)
- make sure uffd-wp will also apply to device private entries which HMM uses [Jerome]

v3 changelog:
- take r-bs
- patch 1: fix typo [Jerome]
- patch 2: use brackets where proper around (flags & VM_FAULT_RETRY) (there're three places to change, not four...) [Jerome]
- patch 4: make sure TRIED is applied correctly on all archs, add more comment to explain the new page fault mechanism [Jerome]
- patch 7: in do_swap_page() remove the two lines to remove FAULT_FLAG_WRITE flag [Jerome]
- patch 10: another brackets change like above, and in mfill_atomic_pte return -EINVAL when detected wp_copy==1 upon shared memories [Jerome]
- patch 12: move _PAGE_CHG_MASK change to patch 8 [Jerome]
- patch 14: wp_page_copy() - fix write bit; change_pte_range() - detect PTE change after COW [Jerome]
- patch 17: remove last paragraph of commit message, no need to drop the two lines in do_swap_page() since they've been directly dropped in patch 7; touch up remove_migration_pte() to only detect uffd-wp bit if it's read migration entry [Jerome]
- add patch: "userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally", which removes the _UFFDIO_WRITEPROTECT bit if non-anonymous memory is detected during REGISTER; meanwhile fix up the test case for shmem too for expected ioctls returned from REGISTER [Mike]
- add patch: "userfaultfd: wp: fixup swap entries in change_pte_range", the new patch allows applying the uffd-wp bits to swap entries directly (e.g., when the page is under migration or the page was swapped out). Please see the patch for detailed information.

v2 changelog:
- add some r-bs
- split the patch "mm: userfault: return VM_FAULT_RETRY on signals" into two: one to focus on the signal behavior change, the other to remove the NOPAGE special path in handle_userfault(). Remove the ARC specific change and that part of the commit message since it's fixed in 4d447455e73b already [Jerome]
- return -ENOENT when VMA is invalid for UFFDIO_WRITEPROTECT to match UFFDIO_COPY errno [Mike]
- add a new patch to introduce helper to find valid VMA for uffd [Mike]
- check against VM_MAYWRITE instead of VM_WRITE when registering UFFD WP [Mike]
- MM_CP_DIRTY_ACCT is used incorrectly, fix it up [Jerome]
- make sure the lock_page behavior will not be changed [Jerome]
- reorder the whole series, introduce the new ioctl last. [Jerome]
- fix up the uffdio_writeprotect() following commit df2cc96e77011cf79 to return -EAGAIN when detected mm layout changes [Mike]

v1 can be found at: https://lkml.org/lkml/2019/1/21/130

Any comment would be greatly welcomed. Thanks.

Overview

The uffd-wp work was initiated by Shaohua Li [1], and later continued by Andrea [2]. This series is based upon Andrea's latest userfaultfd tree, and it is a continuation of the work of both Shaohua and Andrea. Many of the follow up ideas come from Andrea too.

Besides the old MISSING register mode of userfaultfd, the new uffd-wp support provides an alternative register mode called UFFDIO_REGISTER_MODE_WP that can be used to listen for write protection page faults in addition to missing page faults; the two modes can also be registered together. At the same time, the new feature also provides a new userfaultfd ioctl called UFFDIO_WRITEPROTECT which allows userspace to write protect a range of memory or fix up the write permission of faulted pages. Please refer to the document patch "userfaultfd: wp: UFFDIO_REGISTER_MODE_WP documentation update" for more information on the new interface and what it can do.

The major workflow of an uffd-wp program should be:

1. Register a memory region with WP mode using UFFDIO_REGISTER_MODE_WP

2. Write protect part of the whole registered region using UFFDIO_WRITEPROTECT, passing in UFFDIO_WRITEPROTECT_MODE_WP to show that we want to write protect the range.

3. Start a working thread that modifies the protected pages, meanwhile listening to UFFD messages.

4. When a write is detected upon the protected range, a page fault happens, and a UFFD message will be generated and reported to the page fault handling thread

5. The page fault handler thread resolves the page fault using the new UFFDIO_WRITEPROTECT ioctl, b
Re: [PATCH 2/2] pinctrl: tegra: Add Tegra194 pinmux driver
On 4/26/2019 8:26 AM, Krishna Yarlagadda wrote: Tegra194 has PCIE L5 rst and clkreq pins which need to be controlled dynamically at runtime. This driver supports change pinmux for these pins. Pinmux for rest of the pins is set statically by bootloader and will not be changed by this driver Signed-off-by: Krishna Yarlagadda Signed-off-by: Suresh Mangipudi --- drivers/pinctrl/tegra/Kconfig| 4 + drivers/pinctrl/tegra/Makefile | 1 + drivers/pinctrl/tegra/pinctrl-tegra.c| 8 +- drivers/pinctrl/tegra/pinctrl-tegra.h| 8 +- drivers/pinctrl/tegra/pinctrl-tegra194.c | 175 +++ drivers/soc/tegra/Kconfig| 1 + 6 files changed, 189 insertions(+), 8 deletions(-) create mode 100644 drivers/pinctrl/tegra/pinctrl-tegra194.c diff --git a/drivers/pinctrl/tegra/Kconfig b/drivers/pinctrl/tegra/Kconfig index 24e20cc..6f79f1f 100644 --- a/drivers/pinctrl/tegra/Kconfig +++ b/drivers/pinctrl/tegra/Kconfig @@ -23,6 +23,10 @@ config PINCTRL_TEGRA210 bool select PINCTRL_TEGRA +config PINCTRL_TEGRA194 + bool + select PINCTRL_TEGRA + config PINCTRL_TEGRA_XUSB def_bool y if ARCH_TEGRA select GENERIC_PHY diff --git a/drivers/pinctrl/tegra/Makefile b/drivers/pinctrl/tegra/Makefile index bbcb043..ead4e10 100644 --- a/drivers/pinctrl/tegra/Makefile +++ b/drivers/pinctrl/tegra/Makefile @@ -5,4 +5,5 @@ obj-$(CONFIG_PINCTRL_TEGRA30) += pinctrl-tegra30.o obj-$(CONFIG_PINCTRL_TEGRA114)+= pinctrl-tegra114.o obj-$(CONFIG_PINCTRL_TEGRA124)+= pinctrl-tegra124.o obj-$(CONFIG_PINCTRL_TEGRA210)+= pinctrl-tegra210.o +obj-$(CONFIG_PINCTRL_TEGRA194) += pinctrl-tegra194.o obj-$(CONFIG_PINCTRL_TEGRA_XUSB) += pinctrl-tegra-xusb.o diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.c b/drivers/pinctrl/tegra/pinctrl-tegra.c index a5008c0..76e88c4 100644 --- a/drivers/pinctrl/tegra/pinctrl-tegra.c +++ b/drivers/pinctrl/tegra/pinctrl-tegra.c @@ -292,7 +292,7 @@ static int tegra_pinconf_reg(struct tegra_pmx *pmx, const struct tegra_pingroup *g, enum tegra_pinconf_param param, bool report_err, -s8 *bank, s16 *reg, s8 *bit, s8 
*width) +s8 *bank, s32 *reg, s8 *bit, s8 *width) { switch (param) { case TEGRA_PINCONF_PARAM_PULL: @@ -451,7 +451,7 @@ static int tegra_pinconf_group_get(struct pinctrl_dev *pctldev, const struct tegra_pingroup *g; int ret; s8 bank, bit, width; - s16 reg; + s32 reg; u32 val, mask; g = &pmx->soc->groups[group]; @@ -480,7 +480,7 @@ static int tegra_pinconf_group_set(struct pinctrl_dev *pctldev, const struct tegra_pingroup *g; int ret, i; s8 bank, bit, width; - s16 reg; + s32 reg; u32 val, mask; g = &pmx->soc->groups[group]; @@ -548,7 +548,7 @@ static void tegra_pinconf_group_dbg_show(struct pinctrl_dev *pctldev, const struct tegra_pingroup *g; int i, ret; s8 bank, bit, width; - s16 reg; + s32 reg; u32 val; g = &pmx->soc->groups[group]; diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.h b/drivers/pinctrl/tegra/pinctrl-tegra.h index 44c7194..82cd947 100644 --- a/drivers/pinctrl/tegra/pinctrl-tegra.h +++ b/drivers/pinctrl/tegra/pinctrl-tegra.h @@ -143,10 +143,10 @@ struct tegra_pingroup { const unsigned *pins; u8 npins; u8 funcs[4]; - s16 mux_reg; - s16 pupd_reg; - s16 tri_reg; - s16 drv_reg; + s32 mux_reg; + s32 pupd_reg; + s32 tri_reg; + s32 drv_reg; u32 mux_bank:2; u32 pupd_bank:2; u32 tri_bank:2; diff --git a/drivers/pinctrl/tegra/pinctrl-tegra194.c b/drivers/pinctrl/tegra/pinctrl-tegra194.c new file mode 100644 index 000..9172a8c --- /dev/null +++ b/drivers/pinctrl/tegra/pinctrl-tegra194.c @@ -0,0 +1,175 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Pinctrl data for the NVIDIA Tegra210 pinmux + * + * Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for + * more details. + */ + +#include +#include +#include +#include +#include + +#include "pinctrl-tegra.h" + +#define _GPIO(offset) (offset) +#define NUM_GPIOS (TEGRA_PIN_PEX_L5_RST_N_PGG1 + 1) + +/* Define unique ID for each pins */ +enum pin_id { + TEGRA_PIN_PEX_L5
Re: [PATCH] tty: Don't force RISCV SBI console as preferred console
On 4/25/19 6:35 AM, Anup Patel wrote: The Linux kernel will auto-disables all boot consoles whenever it gets a preferred real console. Currently on RISC-V systems, if we have a real console which is not RISCV SBI console then boot consoles (such as earlycon=sbi) are not auto-disabled when a real console (ttyS0 or ttySIF0) is available. This results in duplicate prints at boot-time after kernel starts using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel parameter was passed by bootloader. The reason for above issue is that RISCV SBI console always adds itself as preferred console which is causing other real consoles to be not used as preferred console. Do we even need HVC_SBI console to be enabled by default? Disabling CONFIG_HVC_RISCV_SBI seems to be fine while running in QEMU. If we don't need it, I suggest we should remove the config option from defconfig in addition to this patch. Regards, Atish Ideally "console=" kernel parameter passed by bootloaders should be the one selecting a preferred real console. This patch fixes above issue by not forcing RISCV SBI console as preferred console. Fixes: afa6b1ccfad5 ("tty: New RISC-V SBI console driver") Cc: sta...@vger.kernel.org Signed-off-by: Anup Patel --- drivers/tty/hvc/hvc_riscv_sbi.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/tty/hvc/hvc_riscv_sbi.c b/drivers/tty/hvc/hvc_riscv_sbi.c index 75155bde2b88..31f53fa77e4a 100644 --- a/drivers/tty/hvc/hvc_riscv_sbi.c +++ b/drivers/tty/hvc/hvc_riscv_sbi.c @@ -53,7 +53,6 @@ device_initcall(hvc_sbi_init); static int __init hvc_sbi_console_init(void) { hvc_instantiate(0, 0, &hvc_sbi_ops); - add_preferred_console("hvc", 0, NULL); return 0; }
Re: [PATCH] kernel/sched: run nohz idle load balancer on HK_FLAG_MISC CPUs
Peter Zijlstra's on April 25, 2019 9:56 pm: > On Fri, Apr 12, 2019 at 02:26:13PM +1000, Nicholas Piggin wrote: >> The nohz idle balancer runs on the lowest idle CPU. This can >> interfere with isolated CPUs, so confine it to HK_FLAG_MISC >> housekeeping CPUs. >> >> HK_FLAG_SCHED is not used for this because it is not set anywhere >> at the moment. This could be folded into HK_FLAG_SCHED once that >> option is fixed. > > Frederic? Anyway, I thnk I'll take this patch as is. That would be great, thanks. We've been testing it in a staging environment (this is where they noticed the noise in the first place), and results have been as expected: I've been able to test Nick's idle-loop load balancer (ILB) patch, with and without the TEO cpuidle governor. With the ILB patch (and nohz_full) I get a very quiet noise profile with either cpuidle governor (menu or teo). For my tests, I don't see a meaningful difference between the two governors. [...] Bottom line: Nick's patch that constrains the ILB to run on non-nohz cores has a noticeable noise-reduction effect. For this type of workload, the choice of cpuidle governor, menu or teo, is immaterial. This is against a slightly backported RHEL kernel they are using, but no significant differences from upstream in these areas. Thanks, Nick
Re: [PATCH] staging: most: protect potential string overflow
On Wed, Apr 24, 2019 at 10:55 PM Dan Carpenter wrote: > > On Mon, Apr 22, 2019 at 10:20:18PM -0400, Bo YU wrote: > > There maybe cause potential string overflow issue due to use > > strcpy without checking the length > > > > Detected By CoversityScan CID# 1444760 > > > > Fixes: 131ac62253dba:(staging: most: core: use device description as name) > > It doesn't really fix anything, it just silences a static checker > warning. > > > Signed-off-by: Bo YU > > --- > > drivers/staging/most/core.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/staging/most/core.c b/drivers/staging/most/core.c > > index 956daf8c3bd2..0f26cebac91a 100644 > > --- a/drivers/staging/most/core.c > > +++ b/drivers/staging/most/core.c > > @@ -1431,7 +1431,7 @@ int most_register_interface(struct most_interface > > *iface) > > > > INIT_LIST_HEAD(&iface->p->channel_list); > > iface->p->dev_id = id; > > - strcpy(iface->p->name, iface->description); > > + strlcpy(iface->p->name, iface->description, sizeof(iface->p->name)); > > We prefer strscpy() more than strlcpy() these days. Ok,will try it. Thanks, > > regards, > dan carpenter >
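For readers wondering what the strscpy() suggestion changes in practice: strlcpy() returns the length of the source string, while strscpy() reports truncation with a negative error code. The sketch below is a userspace stand-in written for illustration only. bounded_copy() is a hypothetical name, not a kernel function, and it only imitates the return convention (number of characters copied, or -E2BIG when the source did not fit).

```c
#include <stddef.h>
#include <string.h>
#include <errno.h>

/*
 * Hypothetical userspace stand-in modelled on the strscpy() return
 * convention: copies at most size - 1 characters, always NUL-terminates
 * when size > 0, and returns -E2BIG if src had to be truncated.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	/* Look at no more than size bytes of the source. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {
		/* Source does not fit together with its terminator. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dst, src, len + 1); /* includes the terminating NUL */
	return (long)len;
}
```

A caller can therefore detect truncation with a simple "< 0" check instead of comparing the return value against the buffer size.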
Re: [PATCH v2 4/9] powerpc/powernv/npu: use helper pci_dev_id
On 25/04/2019 05:14, Heiner Kallweit wrote: > Use new helper pci_dev_id() to simplify the code. > > Signed-off-by: Heiner Kallweit Reviewed-by: Alexey Kardashevskiy > --- > arch/powerpc/platforms/powernv/npu-dma.c | 14 ++ > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > b/arch/powerpc/platforms/powernv/npu-dma.c > index dc23d9d2a..495550432 100644 > --- a/arch/powerpc/platforms/powernv/npu-dma.c > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > @@ -1213,9 +1213,8 @@ int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, > unsigned int lparid, >* Currently we only support radix and non-zero LPCR only makes sense >* for hash tables so skiboot expects the LPCR parameter to be a zero. >*/ > - ret = opal_npu_map_lpar(nphb->opal_id, > - PCI_DEVID(gpdev->bus->number, gpdev->devfn), lparid, > - 0 /* LPCR bits */); > + ret = opal_npu_map_lpar(nphb->opal_id, pci_dev_id(gpdev), lparid, > + 0 /* LPCR bits */); > if (ret) { > dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret); > return ret; > @@ -1224,7 +1223,7 @@ int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, > unsigned int lparid, > dev_dbg(&gpdev->dev, "init context opalid=%llu msr=%lx\n", > nphb->opal_id, msr); > ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr, > - PCI_DEVID(gpdev->bus->number, gpdev->devfn)); > + pci_dev_id(gpdev)); > if (ret < 0) > dev_err(&gpdev->dev, "Failed to init context: %d\n", ret); > else > @@ -1258,7 +1257,7 @@ int pnv_npu2_unmap_lpar_dev(struct pci_dev *gpdev) > dev_dbg(&gpdev->dev, "destroy context opalid=%llu\n", > nphb->opal_id); > ret = opal_npu_destroy_context(nphb->opal_id, 0/*__unused*/, > - PCI_DEVID(gpdev->bus->number, gpdev->devfn)); > +pci_dev_id(gpdev)); > if (ret < 0) { > dev_err(&gpdev->dev, "Failed to destroy context: %d\n", ret); > return ret; > @@ -1266,9 +1265,8 @@ int pnv_npu2_unmap_lpar_dev(struct pci_dev *gpdev) > > /* Set LPID to 0 anyway, just to be safe */ > dev_dbg(&gpdev->dev, "Map LPAR 
opalid=%llu lparid=0\n", nphb->opal_id); > - ret = opal_npu_map_lpar(nphb->opal_id, > - PCI_DEVID(gpdev->bus->number, gpdev->devfn), 0 /*LPID*/, > - 0 /* LPCR bits */); > + ret = opal_npu_map_lpar(nphb->opal_id, pci_dev_id(gpdev), 0 /*LPID*/, > + 0 /* LPCR bits */); > if (ret) > dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret); > > -- Alexey
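As a reading aid for the conversion above, a small userspace sketch of what the helper computes. The struct definitions are simplified stand-ins that carry only the fields used here (the real kernel structures are much larger); the macro and helper follow the shape of PCI_DEVID() and pci_dev_id() as I understand them from include/linux/pci.h.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; only the fields used here. */
struct pci_bus {
	uint8_t number;
};

struct pci_dev {
	struct pci_bus *bus;
	unsigned int devfn;	/* encoded device (slot) and function number */
};

/* Bus number in bits 15:8, device/function in bits 7:0. */
#define PCI_DEVID(bus, devfn)	((((uint16_t)(bus)) << 8) | (devfn))

/* One call replaces the open-coded PCI_DEVID(dev->bus->number, dev->devfn). */
static inline uint16_t pci_dev_id(const struct pci_dev *dev)
{
	return PCI_DEVID(dev->bus->number, dev->devfn);
}
```

The patch is therefore behaviour-neutral: each call site passes the same 16-bit value to OPAL as before, only spelled more compactly.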
Re: [PATCH 1/2] clk: imx7ulp: update nic1_bus_clk parent info
On Thu, Apr 25, 2019 at 05:03:31PM -0700, Stephen Boyd wrote: > Quoting Anson Huang (2019-04-24 22:19:07) > > Since i.MX7ULP B0 chip, nic1_bus_clk's parent is changed to > > from nic0_clk directly, update it accordingly. > > > > Signed-off-by: Anson Huang > > Looks ok. Shawn, will you pick it up? Stephen, I would prefer that you directly pick up any i.MX clock patches that look good, since I have already sent you my PR. I will start collecting again for the next cycle around -rc1. Shawn
Re: [PATCH v2 03/12] arm64: dts: tegra210: set thermtrip
Hi Thierry, Eduardo has picked up this series into his branch, except for the dts patches. Please check "git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git" on the linus branch. They will be merged in the next major kernel release. Could you please take these three dts changes? Here is the list: [PATCH v2 03/12] arm64: dts: tegra210: set thermtrip [PATCH v2 06/12] arm64: dts: tegra210: set gpu hw throttle level [PATCH v2 10/12] arm64: dts: tegra210: set EDP interrupt line Thanks. Wei. On 21/2/2019 6:18 PM, Wei Ni wrote: > Set "nvidia,thermtrips" property, it used to set > HW shutdown temperatures. > > Signed-off-by: Wei Ni > --- > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 15 +-- > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi > b/arch/arm64/boot/dts/nvidia/tegra210.dtsi > index 6574396d2257..582d56820bbb 100644 > --- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi > +++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi > @@ -1410,6 +1410,9 @@ > reset-names = "soctherm"; > #thermal-sensor-cells = <1>; > > + nvidia,thermtrips = + TEGRA124_SOCTHERM_SENSOR_GPU 103000>; > + > throttle-cfgs { > throttle_heavy: heavy { > nvidia,priority = <100>; > @@ -1429,8 +1432,8 @@ > <&soctherm TEGRA124_SOCTHERM_SENSOR_CPU>; > > trips { > - cpu-shutdown-trip { > - temperature = <102500>; > + cpu-critical-trip { > + temperature = <102000>; > hysteresis = <0>; > type = "critical"; > }; > @@ -1457,7 +1460,7 @@ > <&soctherm TEGRA124_SOCTHERM_SENSOR_MEM>; > > trips { > - mem-shutdown-trip { > + mem-critical-trip { > temperature = <103000>; > hysteresis = <0>; > type = "critical"; > @@ -1479,8 +1482,8 @@ > <&soctherm TEGRA124_SOCTHERM_SENSOR_GPU>; > > trips { > - gpu-shutdown-trip { > - temperature = <103000>; > + gpu-critical-trip { > + temperature = <102500>; > hysteresis = <0>; > type = "critical"; > }; > @@ -1507,7 +1510,7 @@ > <&soctherm TEGRA124_SOCTHERM_SENSOR_PLLX>; > > trips { > - pllx-shutdown-trip { > + 
pllx-critical-trip { > temperature = <103000>; > hysteresis = <0>; > type = "critical"; >
Re: [RFC PATCH v5 4/4] x86/acrn: Add hypercall for ACRN guest
On 2019-04-25 19:00, Borislav Petkov wrote: On Thu, Apr 25, 2019 at 06:16:02PM +0800, Zhao, Yakui wrote: The parameter register for the VMCALL is predefined in ACRN hypervisor. Now the R8 is used to pass the hcall_id. It seems that there is no special constraint for R8~R15. So the explicit register variable is used so that the R8 can be passed. If you're going to use the constraint "D" for param1, you can just as well do "=a" (result) everywhere since you have the letter constraint for %rax instead of declaring it with "register". Also, you can completely get rid of those "register" declarations and let gcc have all the freedom to pass in hcall_id and the other parameters: Thanks Borislav for providing the code. It seems that it is seldom used in kernel although the explicit register variable is supported by GCC and makes the code look simpler. And it seems that the explicit register variable is not supported by Clang. So the explicit register variable will be removed. I will follow the asm code from Borislav. Of course one minor change is that the "movq" is used instead of "mov". Is this OK? Thanks unsigned long result; asm volatile("mov %[hcall_id], %%r8\n\t" "vmcall\n\t" : "=a" (result) : [hcall_id] "g" (hcall_id) : "r8"); return result; and %r8 will be in the clobber list so gcc will reload it if needed. gcc turns it into 1040 : 1040: 4c 8b 05 e1 2f 00 00 mov 0x2fe1(%rip),%r8 # 4028 1047: 0f 01 c1 vmcall 104a: c3 retq 104b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) here.
[PATCH] ASoC: fsl_sai: Add missing return 0 in remove()
Build warning being reported: sound/soc/fsl/fsl_sai.c: In function 'fsl_sai_remove': sound/soc/fsl/fsl_sai.c:921:1: warning: no return statement in function returning non-void [-Wreturn-type] So this patch just adds a "return 0" to fix it. Fixes: 812ad463e089 ("ASoC: fsl_sai: Add support for runtime pm") Reported-by: Stephen Rothwell Signed-off-by: Nicolin Chen --- sound/soc/fsl/fsl_sai.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c index 26c27dc..8593269 100644 --- a/sound/soc/fsl/fsl_sai.c +++ b/sound/soc/fsl/fsl_sai.c @@ -918,6 +918,8 @@ static int fsl_sai_probe(struct platform_device *pdev) static int fsl_sai_remove(struct platform_device *pdev) { pm_runtime_disable(&pdev->dev); + + return 0; } static const struct of_device_id fsl_sai_ids[] = { -- 2.7.4
[PATCH v2] KVM: x86: Add Intel CPUID.1F cpuid emulation support
Some new systems have multiple software-visible die within each package. Add support to expose Intel V2 Extended Topology Enumeration Leaf CPUID.1F. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Like Xu --- ==changelog== v2: - Apply cpuid.1f check rule on Intel SDM page 3-222 Vol.2A - Add comment to handle 0x1f and 0xb in common code - Reduce check time in a descending-break style v1: https://lkml.org/lkml/2019/4/22/28 arch/x86/kvm/cpuid.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index fd39516..f9b529e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -425,6 +425,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, switch (function) { case 0: + /* Check if the cpuid leaf 0x1f is actually implemented */ + if (entry->eax >= 0x1f && (cpuid_ebx(0x1f) & 0x)) { + entry->eax = 0x1f; + break; + } entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd)); break; case 1: @@ -544,7 +549,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, entry->edx = edx.full; break; } - /* function 0xb has additional index. */ + /* +* Intel documentation states that 0x1f and 0xb have +* identical formats and thus can be handled by common code. +* (Intel SDM Vol. 2A - Instruction Set Reference - CPUID) +*/ + case 0x1f: case 0xb: { int i, level_type; -- 1.8.3.1
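The "case 0" hunk above decides which maximum basic leaf the guest is told about. A standalone sketch of that decision follows, under stated assumptions: clamp_max_leaf() and its boolean parameters are hypothetical names for illustration, and leaf_1f_valid stands in for the patch's EBX check on leaf 0x1f (whose mask is not legible in this copy of the patch and is deliberately not reproduced).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the "case 0" logic of the patch.
 *   host_max      - maximum basic leaf reported by the host (CPUID.0:EAX)
 *   leaf_1f_valid - result of the patch's "is leaf 0x1f implemented" check
 *   intel_pt      - whether Intel PT is exposed (raises the cap to 0x14)
 */
static uint32_t clamp_max_leaf(uint32_t host_max, bool leaf_1f_valid,
			       bool intel_pt)
{
	uint32_t cap = intel_pt ? 0x14 : 0xd;

	/* Advertise 0x1f only when the host has it and it is populated. */
	if (host_max >= 0x1f && leaf_1f_valid)
		return 0x1f;

	/* Otherwise keep the previous behaviour: min(host_max, cap). */
	return host_max < cap ? host_max : cap;
}
```

Hosts without a usable leaf 0x1f keep exactly the old clamp, so the change only affects what newer multi-die systems report to guests.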
[PATCH v3] sound: isa: gus: fix misuse of %x
Pointers should be printed with %p or %px rather than cast to long type and printed with %lx. Drop the address printing. Signed-off-by: Fuqian Huang --- sound/isa/gus/gus_mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/isa/gus/gus_mem.c b/sound/isa/gus/gus_mem.c index 4ac76f46dd76..d708ae1525e4 100644 --- a/sound/isa/gus/gus_mem.c +++ b/sound/isa/gus/gus_mem.c @@ -306,7 +306,7 @@ static void snd_gf1_mem_info_read(struct snd_info_entry *entry, used = 0; for (block = alloc->first, i = 0; block; block = block->next, i++) { used += block->size; - snd_iprintf(buffer, "Block %i at 0x%lx onboard 0x%x size %i (0x%x):\n", i, (long) block, block->ptr, block->size, block->size); + snd_iprintf(buffer, "Block %i onboard 0x%x size %i (0x%x):\n", i, block->ptr, block->size, block->size); if (block->share || block->share_id[0] || block->share_id[1] || block->share_id[2] || block->share_id[3]) -- 2.11.0
Re: linux-next: build warning after merge of the sound-asoc tree
On Fri, Apr 26, 2019 at 01:05:49PM +1000, Stephen Rothwell wrote: > Hi all, > > After merging the sound-asoc tree, today's linux-next build (arm > multi_v7_defconfig) produced this warning: > > sound/soc/fsl/fsl_sai.c: In function 'fsl_sai_remove': > sound/soc/fsl/fsl_sai.c:921:1: warning: no return statement in function > returning non-void [-Wreturn-type] > } > ^ > > Introduced by commit > > 812ad463e089 ("ASoC: fsl_sai: Add support for runtime pm") Thanks. I am submitting a fix.
linux-next: build warning after merge of the sound-asoc tree
Hi all, After merging the sound-asoc tree, today's linux-next build (arm multi_v7_defconfig) produced this warning: sound/soc/fsl/fsl_sai.c: In function 'fsl_sai_remove': sound/soc/fsl/fsl_sai.c:921:1: warning: no return statement in function returning non-void [-Wreturn-type] } ^ Introduced by commit 812ad463e089 ("ASoC: fsl_sai: Add support for runtime pm") -- Cheers, Stephen Rothwell
Re: Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
> -Original Message- > From: "Peter Zijlstra" > Sent: 2019-04-25 21:31:05 (Thursday) > To: huang...@loongson.cn > Cc: "Paul Burton" , "st...@rowland.harvard.edu" > , "aki...@gmail.com" , > "andrea.pa...@amarulasolutions.com" , > "boqun.f...@gmail.com" , "dlus...@nvidia.com" > , "dhowe...@redhat.com" , > "j.algl...@ucl.ac.uk" , "luc.maran...@inria.fr" > , "npig...@gmail.com" , > "paul...@linux.ibm.com" , "will.dea...@arm.com" > , "linux-kernel@vger.kernel.org" > , "torva...@linux-foundation.org" > , "Huacai Chen" > Subject: Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() > wreckage > > On Thu, Apr 25, 2019 at 08:51:17PM +0800, huang...@loongson.cn wrote: > > > > So basically the initial value of @v is set to 1. > > > > > > Then CPU-1 does atomic_add_unless(v, 1, 0) > > > CPU-2 does atomic_set(v, 0) > > > > > > If CPU1 goes first, it will see 1, which is not 0 and thus add 1 to 1 > > > and obtains 2. Then CPU2 goes and writes 0, so the exist clause sees > > > v==0 and doesn't observe 2. > > > > > > The other way around, CPU-2 goes first, writes a 0, then CPU-1 goes and > > > observes the 0, finds it matches 0 and doesn't add. Again, the exist > > > clause will find 0 doesn't match 2. > > > > > > This all goes unstuck if interleaved like: > > > > > > > > > CPU-1 CPU-2 > > > > > > xor t0, t0 > > > 1:ll t0, v > > > bez t0, 2f > > > sw t0, v > > > add t0, t1 > > > sc t0, v > > > beqz t0, 1b > > > > > > (sorry if I got the MIPS asm wrong; it's not something I normally write) > > > > > > And the store-word from CPU-2 doesn't make the SC from CPU-1 fail. > > > > > > > loongson's llsc bug DOES NOT fail this litmus( we will not get V=2); > > > > only speculative memory access from CPU-1 can "blind" CPU-1(here blind > > means do ll/sc > > wrong), this speculative memory access can be observed correctly by CPU2. > > In this > > case, sw from CPU-2 can get I , which can be observed by CPU-1, and clear > > llbit,then > > failed sc. 
> > I'm not following, suppose CPU-1 happens as a speculation (imagine > whatever code is required to make that happen before). CPU-2 sw will > cause I on CPU-1's ll but, as in the previous email, CPU-1 will continue > as if it still has E and complete the SC. > > That is; I'm just not seeing why this case would be different from two > competing LL/SCs. > I get your point. I kept my eye on the sw from CPU-2, but forgot the speculative memory access from CPU-1. There is no difference between this one and the former case. V = 1 CPU-1 CPU-2 xor t0, t0 1: ll t0, V beqz t0, 2f /* If a speculative memory access kicks the cache line of V out, it can blind CPU-1: CPU-1 believes it still holds E on V and can NOT see that the sw from CPU-2 actually invalidated V, which should have cleared CPU-1's LLBit but did not. */ sw t0, V // just after sw, V = 0 addiu t0, t0, 1 sc t0, V /* Oops, sc writes t0 (2) into V with LLBit still set. */ /* get V=2 */ beqz t0, 1b nop 2: If the speculative memory access *does not* kick out the cache line of V, CPU-1 can see the sw from CPU-2 and clear LLBit, which causes the sc to fail and retry. That's OK.
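For reference while reading the litmus discussion above, here is the semantic contract of atomic_add_unless() sketched in portable C11. This is an illustration only, not the kernel's MIPS implementation: the compare-and-swap loop plays the role of the LL/SC pair, and add_unless() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add 'a' to *v unless *v currently equals 'u'.
 * Returns true if the addition was performed.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);		/* corresponds to the LL */

	while (old != u) {
		/*
		 * Corresponds to the SC: it must fail if *v changed since
		 * 'old' was read. On failure 'old' is refreshed and the
		 * "unless" test is repeated.
		 */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}
```

With V initially 1 and a concurrent store of 0, the only legal final values are 0 (store last, or store first so the add is skipped). The V=2 outcome described above corresponds to the conditional store succeeding even though the other CPU's store intervened, which is exactly what a correct SC (or CAS) must not allow.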
[PATCH 1/2] dt-binding: Tegra194 pinctrl support
Add new compatible string and other fields used in pinctrl driver for Tegra194 in nvidia,tegra210-pinmux.txt Signed-off-by: Krishna Yarlagadda --- .../bindings/pinctrl/nvidia,tegra210-pinmux.txt| 43 +++--- 1 file changed, 38 insertions(+), 5 deletions(-) diff --git a/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt b/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt index 85f2114..c4e802d 100644 --- a/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt +++ b/Documentation/devicetree/bindings/pinctrl/nvidia,tegra210-pinmux.txt @@ -1,7 +1,7 @@ -NVIDIA Tegra210 pinmux controller +NVIDIA Tegra210/194 pinmux controller Required properties: -- compatible: "nvidia,tegra210-pinmux" +- compatible: "nvidia,tegra210-pinmux" or "nvidia,tegra194-pinmux" - reg: Should contain a list of base address and size pairs for: - first entry: The APB_MISC_GP_*_PADCTRL registers (pad control) - second entry: The PINMUX_AUX_* registers (pinmux) @@ -83,6 +83,10 @@ Valid values for pin and group names (nvidia,pin) are: These correspond to Tegra PINMUX_AUX_* (pinmux) registers. Any property that exists in those registers may be set for the following pin names. + Tegra194: +pex_l5_clkreq_n_pgg0, pex_l5_rst_n_pgg1 + + Tegra210: In Tegra210, many pins also have a dedicated APB_MISC_GP_*_PADCTRL register. Where that is true, and property that exists in that register may also be set on the following pin names. @@ -127,12 +131,15 @@ Valid values for pin and group names (nvidia,pin) are: registers. Note that where one of these registers controls a single pin for which a PINMUX_AUX_* exists, see the list above for the pin name to use when configuring the pinmux. 
- + Tegra210: pa6, pcc7, pe6, pe7, ph6, pk0, pk1, pk2, pk3, pk4, pk5, pk6, pk7, pl0, pl1, pz0, pz1, pz2, pz3, pz4, pz5, sdmmc1, sdmmc2, sdmmc3, sdmmc4 + Tegra194: +pex_l5_clkreq_n_pgg0, pex_l5_rst_n_pgg1 Valid values for nvidia,functions are: + Tegra210: aud, bcl, blink, ccla, cec, cldvfs, clk, core, cpu, displaya, displayb, dmic1, dmic2, dmic3, dp, dtv, extperiph3, i2c1, i2c2, i2c3, i2cpmu, i2cvi, i2s1, i2s2, i2s3, i2s4a, i2s4b, i2s5a, i2s5b, iqc0, iqc1, jtag, pe, pe0, @@ -140,9 +147,12 @@ Valid values for nvidia,functions are: sdmmc1, sdmmc3, shutdown, soc, sor0, sor1, spdif, spi1, spi2, spi3, spi4, sys, touch, uart, uarta, uartb, uartc, uartd, usb, vgp1, vgp2, vgp3, vgp4, vgp5, vgp6, vimclk, vimclk2 + Tegra194: +pe5 -Example: +Examples: + Tegra210: pinmux: pinmux@7800 { compatible = "nvidia,tegra210-pinmux"; reg = <0x0 0x78d4 0x0 0x2a8>, /* Pad control registers */ @@ -163,4 +173,27 @@ Example: }; }; }; -}; + + Tegra194: + tegra_pinctrl: pinmux: pinmux@243 { + compatible = "nvidia,tegra194-pinmux"; + reg = <0x243 0x17000 + 0xc30 0x4000>; + #gpio-range-cells = <2>; + pex_rst_c5_out_state: pex_rst_c5_out { + pex_rst { + nvidia,pins = "pex_l5_rst_n_pgg1"; + nvidia,schmitt = ; + nvidia,lpdr = ; + nvidia,enable-input = ; + nvidia,io-high-voltage = ; + nvidia,tristate = ; + nvidia,pull = ; + }; + }; + }; + pinmuxtest@0 { + compatible = "nvidia,tegra194-pinmux-test"; + pinctrl-names = "pex_rst"; + pinctrl-0 = <&pex_rst_c5_out_state>; + }; -- 2.7.4
[PATCH 2/2] pinctrl: tegra: Add Tegra194 pinmux driver
Tegra194 has PCIe L5 rst and clkreq pins which need to be controlled dynamically at runtime. This driver supports changing the pinmux for these pins. Pinmux for the rest of the pins is set statically by the bootloader and will not be changed by this driver. Signed-off-by: Krishna Yarlagadda Signed-off-by: Suresh Mangipudi --- drivers/pinctrl/tegra/Kconfig| 4 + drivers/pinctrl/tegra/Makefile | 1 + drivers/pinctrl/tegra/pinctrl-tegra.c| 8 +- drivers/pinctrl/tegra/pinctrl-tegra.h| 8 +- drivers/pinctrl/tegra/pinctrl-tegra194.c | 175 +++ drivers/soc/tegra/Kconfig| 1 + 6 files changed, 189 insertions(+), 8 deletions(-) create mode 100644 drivers/pinctrl/tegra/pinctrl-tegra194.c diff --git a/drivers/pinctrl/tegra/Kconfig b/drivers/pinctrl/tegra/Kconfig index 24e20cc..6f79f1f 100644 --- a/drivers/pinctrl/tegra/Kconfig +++ b/drivers/pinctrl/tegra/Kconfig @@ -23,6 +23,10 @@ config PINCTRL_TEGRA210 bool select PINCTRL_TEGRA +config PINCTRL_TEGRA194 + bool + select PINCTRL_TEGRA + config PINCTRL_TEGRA_XUSB def_bool y if ARCH_TEGRA select GENERIC_PHY diff --git a/drivers/pinctrl/tegra/Makefile b/drivers/pinctrl/tegra/Makefile index bbcb043..ead4e10 100644 --- a/drivers/pinctrl/tegra/Makefile +++ b/drivers/pinctrl/tegra/Makefile @@ -5,4 +5,5 @@ obj-$(CONFIG_PINCTRL_TEGRA30) += pinctrl-tegra30.o obj-$(CONFIG_PINCTRL_TEGRA114) += pinctrl-tegra114.o obj-$(CONFIG_PINCTRL_TEGRA124) += pinctrl-tegra124.o obj-$(CONFIG_PINCTRL_TEGRA210) += pinctrl-tegra210.o +obj-$(CONFIG_PINCTRL_TEGRA194) += pinctrl-tegra194.o obj-$(CONFIG_PINCTRL_TEGRA_XUSB) += pinctrl-tegra-xusb.o diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.c b/drivers/pinctrl/tegra/pinctrl-tegra.c index a5008c0..76e88c4 100644 --- a/drivers/pinctrl/tegra/pinctrl-tegra.c +++ b/drivers/pinctrl/tegra/pinctrl-tegra.c @@ -292,7 +292,7 @@ static int tegra_pinconf_reg(struct tegra_pmx *pmx, const struct tegra_pingroup *g, enum tegra_pinconf_param param, bool report_err, -s8 *bank, s16 *reg, s8 *bit, s8 *width) +s8 *bank, 
s32 *reg, s8 *bit, s8 
*width) { switch (param) { case TEGRA_PINCONF_PARAM_PULL: @@ -451,7 +451,7 @@ static int tegra_pinconf_group_get(struct pinctrl_dev *pctldev, const struct tegra_pingroup *g; int ret; s8 bank, bit, width; - s16 reg; + s32 reg; u32 val, mask; g = &pmx->soc->groups[group]; @@ -480,7 +480,7 @@ static int tegra_pinconf_group_set(struct pinctrl_dev *pctldev, const struct tegra_pingroup *g; int ret, i; s8 bank, bit, width; - s16 reg; + s32 reg; u32 val, mask; g = &pmx->soc->groups[group]; @@ -548,7 +548,7 @@ static void tegra_pinconf_group_dbg_show(struct pinctrl_dev *pctldev, const struct tegra_pingroup *g; int i, ret; s8 bank, bit, width; - s16 reg; + s32 reg; u32 val; g = &pmx->soc->groups[group]; diff --git a/drivers/pinctrl/tegra/pinctrl-tegra.h b/drivers/pinctrl/tegra/pinctrl-tegra.h index 44c7194..82cd947 100644 --- a/drivers/pinctrl/tegra/pinctrl-tegra.h +++ b/drivers/pinctrl/tegra/pinctrl-tegra.h @@ -143,10 +143,10 @@ struct tegra_pingroup { const unsigned *pins; u8 npins; u8 funcs[4]; - s16 mux_reg; - s16 pupd_reg; - s16 tri_reg; - s16 drv_reg; + s32 mux_reg; + s32 pupd_reg; + s32 tri_reg; + s32 drv_reg; u32 mux_bank:2; u32 pupd_bank:2; u32 tri_bank:2; diff --git a/drivers/pinctrl/tegra/pinctrl-tegra194.c b/drivers/pinctrl/tegra/pinctrl-tegra194.c new file mode 100644 index 000..9172a8c --- /dev/null +++ b/drivers/pinctrl/tegra/pinctrl-tegra194.c @@ -0,0 +1,175 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Pinctrl data for the NVIDIA Tegra210 pinmux + * + * Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for + * more details. + */ + +#include +#include +#include +#include +#include + +#include "pinctrl-tegra.h" + +#define _GPIO(offset) (offset) +#define NUM_GPIOS (TEGRA_PIN_PEX_L5_RST_N_PGG1 + 1) + +/* Define unique ID for each pins */ +enum pin_id { + TEGRA_PIN_PEX_L5_CLKREQ_N_PGG0 = _GPIO(256), + TEGRA_PIN_PEX_L5_RST_N_PGG1 = _GPIO(257)
Re: [PATCH] bcache: avoid clang -Wunintialized warning
On 2019/4/26 2:08 AM, Nathan Chancellor wrote: > On Fri, Mar 22, 2019 at 03:35:00PM +0100, Arnd Bergmann wrote: >> clang has identified a code path in which it thinks a >> variable may be unused: >> >> drivers/md/bcache/alloc.c:333:4: error: variable 'bucket' is used >> uninitialized whenever 'if' condition is false >> [-Werror,-Wsometimes-uninitialized] >> fifo_pop(&ca->free_inc, bucket); >> ^~~ >> drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop' >> #define fifo_pop(fifo, i) fifo_pop_front(fifo, (i)) >> ^ >> drivers/md/bcache/util.h:189:6: note: expanded from macro 'fifo_pop_front' >> if (_r) { \ >> ^~ >> drivers/md/bcache/alloc.c:343:46: note: uninitialized use occurs here >> allocator_wait(ca, bch_allocator_push(ca, bucket)); >> ^~ >> drivers/md/bcache/alloc.c:287:7: note: expanded from macro 'allocator_wait' >> if (cond) \ >> ^~~~ >> drivers/md/bcache/alloc.c:333:4: note: remove the 'if' if its condition is >> always true >> fifo_pop(&ca->free_inc, bucket); >> ^ >> drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop' >> #define fifo_pop(fifo, i) fifo_pop_front(fifo, (i)) >> ^ >> drivers/md/bcache/util.h:189:2: note: expanded from macro 'fifo_pop_front' >> if (_r) { \ >> ^ >> drivers/md/bcache/alloc.c:331:15: note: initialize the variable 'bucket' to >> silence this warning >> long bucket; >>^ >> >> This cannot happen in practice because we only enter the loop >> if there is at least one element in the list. >> >> Slightly rearranging the code makes this clearer to both the >> reader and the compiler, which avoids the warning. 
>> >> Signed-off-by: Arnd Bergmann >> --- >> drivers/md/bcache/alloc.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c >> index 5002838ea476..f8986effcb50 100644 >> --- a/drivers/md/bcache/alloc.c >> +++ b/drivers/md/bcache/alloc.c >> @@ -327,10 +327,11 @@ static int bch_allocator_thread(void *arg) >> * possibly issue discards to them, then we add the bucket to >> * the free list: >> */ >> -while (!fifo_empty(&ca->free_inc)) { >> +while (1) { >> long bucket; >> >> -fifo_pop(&ca->free_inc, bucket); >> +if (!fifo_pop(&ca->free_inc, bucket)) >> +break; >> >> if (ca->discard) { >> mutex_unlock(&ca->set->bucket_lock); >> -- >> 2.20.0 >> > > Hi all, > > Could someone please review/pick this up? This is one of two remaining > -Wsometimes-uninitialized warnings among arm, arm64, and x86_64 > all{yes,mod}config and I'd like to get it turned on as soon as possible > to catch more bugs. Hi Nathan, It is in Jens' block tree for-next branch already, for Linux v5.2 merge window. Thanks. -- Coly Li
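The loop rearrangement in the quoted patch can be illustrated outside the kernel. The sketch below uses a toy ring buffer with hypothetical names (struct fifo, fifo_push(), fifo_pop(), drain_sum()); it is not bcache's macro-based fifo, only a model of the same control flow: the success of the pop is the loop's exit test, so the popped variable is never read on a path where it was not written.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 8

/* Toy ring buffer; a hypothetical stand-in for bcache's fifo macros. */
struct fifo {
	long data[FIFO_CAP];
	size_t front;	/* index of the oldest element */
	size_t used;	/* number of stored elements */
};

static bool fifo_push(struct fifo *f, long v)
{
	if (f->used == FIFO_CAP)
		return false;
	f->data[(f->front + f->used) % FIFO_CAP] = v;
	f->used++;
	return true;
}

/* Returns false, leaving *out untouched, when the fifo is empty. */
static bool fifo_pop(struct fifo *f, long *out)
{
	if (f->used == 0)
		return false;
	*out = f->data[f->front];
	f->front = (f->front + 1) % FIFO_CAP;
	f->used--;
	return true;
}

/* Drain loop in the shape the patch adopts. */
static long drain_sum(struct fifo *f)
{
	long sum = 0;

	while (1) {
		long bucket;

		if (!fifo_pop(f, &bucket))
			break;
		/* Reached only after a successful pop wrote 'bucket'. */
		sum += bucket;
	}
	return sum;
}
```

Compared with testing emptiness first and popping second, this form ties the initialization of the variable to the same condition that guards its use, which is what lets the compiler drop the -Wsometimes-uninitialized diagnostic.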
RE: [RFC PATCH 0/5] New fallback workflow for heterogeneous memory system
>-Original Message- >From: Dan Williams [mailto:dan.j.willi...@intel.com] >Sent: Thursday, April 25, 2019 11:43 PM >To: Du, Fan >Cc: Michal Hocko ; a...@linux-foundation.org; Wu, >Fengguang ; Hansen, Dave >; xishi.qiuxi...@alibaba-inc.com; Huang, Ying >; linux...@kvack.org; linux-kernel@vger.kernel.org >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous >memory system > >On Thu, Apr 25, 2019 at 1:05 AM Du, Fan wrote: >> >> >> >> >-Original Message- >> >From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On >> >Behalf Of Michal Hocko >> >Sent: Thursday, April 25, 2019 3:54 PM >> >To: Du, Fan >> >Cc: a...@linux-foundation.org; Wu, Fengguang >; >> >Williams, Dan J ; Hansen, Dave >> >; xishi.qiuxi...@alibaba-inc.com; Huang, Ying >> >; linux...@kvack.org; >linux-kernel@vger.kernel.org >> >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous >> >memory system >> > >> >On Thu 25-04-19 07:41:40, Du, Fan wrote: >> >> >> >> >> >> >-Original Message- >> >> >From: Michal Hocko [mailto:mho...@kernel.org] >> >> >Sent: Thursday, April 25, 2019 2:37 PM >> >> >To: Du, Fan >> >> >Cc: a...@linux-foundation.org; Wu, Fengguang >> >; >> >> >Williams, Dan J ; Hansen, Dave >> >> >; xishi.qiuxi...@alibaba-inc.com; Huang, Ying >> >> >; linux...@kvack.org; >> >linux-kernel@vger.kernel.org >> >> >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous >> >> >memory system >> >> > >> >> >On Thu 25-04-19 09:21:30, Fan Du wrote: >> >> >[...] >> >> >> However PMEM has different characteristics from DRAM, >> >> >> the more reasonable or desirable fallback style would be: >> >> >> DRAM node 0 -> DRAM node 1 -> PMEM node 2 -> PMEM node 3. >> >> >> When DRAM is exhausted, try PMEM then. >> >> > >> >> >Why and who does care? NUMA is fundamentally about memory nodes >> >with >> >> >different access characteristics so why is PMEM any special? >> >> >> >> Michal, thanks for your comments! 
>> >> >> >> The "different" lies in the local or remote access, usually the underlying >> >> memory is the same type, i.e. DRAM. >> >> >> >> By "special", PMEM is usually in gigantic capacity than DRAM per dimm, >> >> while with different read/write access latency than DRAM. >> > >> >You are describing a NUMA in general here. Yes access to different NUMA >> >nodes has a different read/write latency. But that doesn't make PMEM >> >really special from a regular DRAM. >> >> Not the numa distance b/w cpu and PMEM node make PMEM different >than >> DRAM. The difference lies in the physical layer. The access latency >characteristics >> comes from media level. > >No, there is no such thing as a "PMEM node". I've pushed back on this >broken concept in the past [1] [2]. Consider that PMEM could be as >fast as DRAM for technologies like NVDIMM-N or in emulation >environments. These attempts to look at persistence as an attribute of >performance are entirely missing the point that the system can have >multiple varied memory types and the platform firmware needs to >enumerate these performance properties in the HMAT on ACPI platforms. >Any scheme that only considers a binary DRAM and not-DRAM property is >immediately invalidated the moment the OS needs to consider a 3rd or >4th memory type, or a more varied connection topology. Dan, thanks for your comments! I have understood your point since your earlier posts. Below is what I have in mind, speaking as a [standalone personal contributor] only: a. I fully recognize what HMAT is designed for. b. I understand your point that the "type" distinction is temporary, and I think you are right. A generic approach is indeed required; however, I want to elaborate on the problem I'm trying to solve for customers, not on how we or other people solve it one way or another. Customers require full utilization of system memory, whether DRAM, 1st generation PMEM, or a future xth generation PMEM that beats DRAM. 
Customers require explicit [coarse grained] control over memory allocation for different latency/bandwidth. Maybe it's more worthwhile to think about what is essentially needed to solve the problem, and make sure it scales well enough for some period. a. Build the fallback list for a heterogeneous system. I prefer to build it per HMAT, because HMAT exposes latency/bandwidth from the local node's perspective and is already standardized in the ACPI spec. NUMA node distance from SLIT is no longer accurate enough to be helpful for a heterogeneous memory system. b. Provide an explicit page allocation option for requests for frequently read pages. This requirement is well justified as well. All scenarios, in kernel or at user level, that don't care about write latency should leverage this option to achieve overall optimal performance. c. NUMA balancing for a heterogeneous system. I'm aware of this topic, but it's not what I have in mind (a. and b.) right now. >[1]: >https://lore.kernel.org/lkml/CAPcyv4heiUbZvP7Ewoy-Hy=-mPrdjCj