Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification
Hi James, thanks for this mail. On 2018/4/10 22:15, James Morse wrote: > Hi Dongjiu Geng, > > On 09/04/18 22:36, Dongjiu Geng wrote: >> 1. Detect whether KVM can set set guest SError syndrome >> 2. Support to Set VSESR_EL2 and inject SError by user space. >> 3. Support live migration to keep SError pending state and VSESR_EL2 value. >> 4. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so >> support this >>notification in software, KVM or kernel ARCH code call handle_guest_sei() >> to let ACP driver >>to handle this notification. > > Please don't post code during the merge-window, will this apply to v4.17-rc1? > We > can't know until its tagged. I do not know when it is merge-window. About the apply version, it does not have limited. > > > This series is doing two separate things, please split it into two series. OK, thanks! > > But on the ACPI front: I don't see how any OS can support your NOTIFY_SEI when > firmware is ignoring the normal world's PSTATE.A. > > The latest lobe of that discussion was on the list here: > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html I have replied the mail. I still have some questions that need to clarify with you. After clarification, we will follow that. The question is in the reply of this mail "https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html"; > > > As it is, we would need to spot SError being delivered while SError is masked, > spray nasty messages about firmware being horrifically buggy, then panic(). > For > a corrected error, this looks bad, but its preferable to letting firmware > silently overwrite the exception registers, causing linux to spin through the > vectors 'eret' with all exceptions masked. > I still think its best to wait for firmware that does the right thing. Let us discuss that in another mail. In a summary, I think firmware follow below rule can be OK, right? 1. The exception came from the EL that SError should be routed to(according to hcr_EL2.{AMO, TGE}),but PSTATE.A was set, EL3 firmware can't deliver SError; 2. The exception came from the EL that SError should not be routed to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3 firmware still deliver SError > > > Thanks, > > James > > . > -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8
Dear James, Thanks for this mail and sorry for my late response. 2018-02-16 1:55 GMT+08:00 James Morse : > Hi gengdongjiu, liu jun > > On 05/02/18 11:24, gengdongjiu wrote: [] >> >>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO, >>> TGE}? >> >> Yes, it is. > > ... and yet ... > > >>> What does your firmware do when it wants to emulate SError but its masked? >>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had >>> PSTATE.A set. >>> e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the >>> emulated SError should go to EL1. This effectively masks SError.) >> >> Currently we does not consider much about the mask status(SPSR). > > .. this is a problem. > > If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception > interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret > to > EL2. This should never happen, SError is effectively masked if you are running > at an EL higher than the one its routed to. > > More obviously: if the exception came from the EL that SError should be routed > to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only James, I summarized the masking and routing rules for SError to confirm with you for the firmware first solution, 1. If the HCR_EL2.{AMO,TGE} is set, which means the SError should route to EL2, When system happens SError and trap to EL3, If EL3 find HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set, and find this SError come from EL2, it will not deliver an SError: store the RAS error in the BERT and 'reboot'; but if it find that this SError come from EL1 or EL0, it also need to deliver an SError, right? 2. If the HCR_EL2.{AMO,TGE} is not set, which means the SError should route to EL1, When system happens SError and trap to EL3, If EL3 find HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set, and find this SError come from EL1, it will not deliver an SError: store the RAS error in the BERT and 'reboot'; but if it find that this SError come from EL0, it also need to deliver an SError, right? > way the OS has to indicate it can't take an exception right now. VBAR_EL1 may > be > 'wrong' if we're doing some power-management, the FAR/ELR/ESR/SPSR registers > may > contain live values that the OS would lose if you deliver another exception > over > the top. > > If you deliver an emulated-SError as the OS eret's, your new ELR will point at > the eret instruction and the CPU will spin on this instruction forever. > > You have to honour the masking and routing rules for SError, otherwise no OS > can > run safely with this firmware. > > >> I remember that you ever suggested firmware should reboot if the mask status >> is set(SPSR), right? > > Yes, this is my suggestion of what to do if you can't deliver an SError: store > the RAS error in the BERT and 'reboot'. > > >> I CC "liu jun" who is our UEFI firmware Architect, >> if you have firmware requirements, you can raise again. > > (UEFI? I didn't think there was any of that at EL3, but I'm not familiar with > all the 'PI' bits). > > The requirement is your emulated-SError from EL3 looks exactly like a > physical-SError as if EL3 wasn't implemented. > Your CPU has to handle cases where it can't deliver an SError, your emulation > has to do the same. > > This is not something any OS can work around. > > >>> Answers to these let us determine whether a bug is in the firmware or the >>> kernel. If firmware is expecting the OS to do something special, I'd like >>> to know >>> about it from the beginning! >> >> I know your meaning, thanks for raising it again. > > > Happy new year, > > James > ___ > kvmarm mailing list > kvm...@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
On 04/11/2018 07:51 PM, Jae Hyun Yoo wrote: On 4/11/2018 5:34 PM, Guenter Roeck wrote: On 04/11/2018 02:59 PM, Jae Hyun Yoo wrote: Hi Guenter, Thanks a lot for sharing your time. Please see my inline answers. On 4/10/2018 3:28 PM, Guenter Roeck wrote: On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote: This commit adds PECI cputemp and dimmtemp hwmon drivers. Signed-off-by: Jae Hyun Yoo Reviewed-by: Haiyue Wang Reviewed-by: James Feist Reviewed-by: Vernon Mauery Cc: Alan Cox Cc: Andrew Jeffery Cc: Andrew Lunn Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Fengguang Wu Cc: Greg KH Cc: Guenter Roeck Cc: Jason M Biils Cc: Jean Delvare Cc: Joel Stanley Cc: Julia Cartwright Cc: Miguel Ojeda Cc: Milton Miller II Cc: Pavel Machek Cc: Randy Dunlap Cc: Stef van Os Cc: Sumeet R Pawnikar --- drivers/hwmon/Kconfig | 28 ++ drivers/hwmon/Makefile | 2 + drivers/hwmon/peci-cputemp.c | 783 ++ drivers/hwmon/peci-dimmtemp.c | 432 +++ 4 files changed, 1245 insertions(+) create mode 100644 drivers/hwmon/peci-cputemp.c create mode 100644 drivers/hwmon/peci-dimmtemp.c diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index f249a4428458..c52f610f81d0 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1259,6 +1259,34 @@ config SENSORS_NCT7904 This driver can also be built as a module. If so, the module will be called nct7904. +config SENSORS_PECI_CPUTEMP + tristate "PECI CPU temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI + cputemp driver which provides Digital Thermal Sensor (DTS) thermal + readings of the CPU package and CPU cores that are accessible using + the PECI Client Command Suite via the processor PECI client. + Check Documentation/hwmon/peci-cputemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-cputemp. + +config SENSORS_PECI_DIMMTEMP + tristate "PECI DIMM temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI hwmon + driver which provides Digital Thermal Sensor (DTS) thermal readings of + DIMM components that are accessible using the PECI Client Command + Suite via the processor PECI client. + Check Documentation/hwmon/peci-dimmtemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-dimmtemp. + config SENSORS_NSA320 tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors" depends on GPIOLIB && OF diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile index e7d52a36e6c4..48d9598fcd3a 100644 --- a/drivers/hwmon/Makefile +++ b/drivers/hwmon/Makefile @@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802) += nct7802.o obj-$(CONFIG_SENSORS_NCT7904) += nct7904.o obj-$(CONFIG_SENSORS_NSA320) += nsa320-hwmon.o obj-$(CONFIG_SENSORS_NTC_THERMISTOR) += ntc_thermistor.o +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o obj-$(CONFIG_SENSORS_PC87360) += pc87360.o obj-$(CONFIG_SENSORS_PC87427) += pc87427.o obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o diff --git a/drivers/hwmon/peci-cputemp.c b/drivers/hwmon/peci-cputemp.c new file mode 100644 index ..f0bc92687512 --- /dev/null +++ b/drivers/hwmon/peci-cputemp.c @@ -0,0 +1,783 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2018 Intel Corporation + +#include +#include +#include Is this include needed ? No it isn't. Will drop the line. +#include +#include +#include +#include + +#define TEMP_TYPE_PECI 6 /* Sensor type 6: Intel PECI */ + +#define CORE_MAX_ON_HSX 18 /* Max number of cores on Haswell */ +#define CORE_MAX_ON_BDX 24 /* Max number of cores on Broadwell */ +#define CORE_MAX_ON_SKX 28 /* Max number of cores on Skylake */ + +#define DEFAULT_CHANNEL_NUMS 5 +#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS) + +#define CLIENT_CPU_ID_MASK 0xf0ff0 /* Mask for Family / Model info */ + +#define UPDATE_INTERVAL_MIN HZ + +enum cpu_gens { + CPU_GEN_HSX, /* Haswell Xeon */ + CPU_GEN_BRX, /* Broadwell Xeon */ + CPU_GEN_SKX, /* Skylake Xeon */ + CPU_GEN_MAX +}; + +struct cpu_gen_info { + u32 type; + u32 cpu_id; + u32 core_max; +}; + +struct temp_data { + bool valid; + s32 value; + unsigned long last_updated; +}; + +struct temp_group { + struct temp_data die; + struct temp_data dts_margin; + struct temp_data tcontrol; + struct temp_data tthrottle; + struct temp_data tjmax; + struct temp_data core[CORETEMP_CHANNEL_NUMS]
Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
On 4/11/2018 5:34 PM, Guenter Roeck wrote: On 04/11/2018 02:59 PM, Jae Hyun Yoo wrote: Hi Guenter, Thanks a lot for sharing your time. Please see my inline answers. On 4/10/2018 3:28 PM, Guenter Roeck wrote: On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote: This commit adds PECI cputemp and dimmtemp hwmon drivers. Signed-off-by: Jae Hyun Yoo Reviewed-by: Haiyue Wang Reviewed-by: James Feist Reviewed-by: Vernon Mauery Cc: Alan Cox Cc: Andrew Jeffery Cc: Andrew Lunn Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Fengguang Wu Cc: Greg KH Cc: Guenter Roeck Cc: Jason M Biils Cc: Jean Delvare Cc: Joel Stanley Cc: Julia Cartwright Cc: Miguel Ojeda Cc: Milton Miller II Cc: Pavel Machek Cc: Randy Dunlap Cc: Stef van Os Cc: Sumeet R Pawnikar --- drivers/hwmon/Kconfig | 28 ++ drivers/hwmon/Makefile | 2 + drivers/hwmon/peci-cputemp.c | 783 ++ drivers/hwmon/peci-dimmtemp.c | 432 +++ 4 files changed, 1245 insertions(+) create mode 100644 drivers/hwmon/peci-cputemp.c create mode 100644 drivers/hwmon/peci-dimmtemp.c diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index f249a4428458..c52f610f81d0 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1259,6 +1259,34 @@ config SENSORS_NCT7904 This driver can also be built as a module. If so, the module will be called nct7904. +config SENSORS_PECI_CPUTEMP + tristate "PECI CPU temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI + cputemp driver which provides Digital Thermal Sensor (DTS) thermal + readings of the CPU package and CPU cores that are accessible using + the PECI Client Command Suite via the processor PECI client. + Check Documentation/hwmon/peci-cputemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-cputemp. + +config SENSORS_PECI_DIMMTEMP + tristate "PECI DIMM temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI hwmon + driver which provides Digital Thermal Sensor (DTS) thermal readings of + DIMM components that are accessible using the PECI Client Command + Suite via the processor PECI client. + Check Documentation/hwmon/peci-dimmtemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-dimmtemp. + config SENSORS_NSA320 tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors" depends on GPIOLIB && OF diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile index e7d52a36e6c4..48d9598fcd3a 100644 --- a/drivers/hwmon/Makefile +++ b/drivers/hwmon/Makefile @@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802) += nct7802.o obj-$(CONFIG_SENSORS_NCT7904) += nct7904.o obj-$(CONFIG_SENSORS_NSA320) += nsa320-hwmon.o obj-$(CONFIG_SENSORS_NTC_THERMISTOR) += ntc_thermistor.o +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o obj-$(CONFIG_SENSORS_PC87360) += pc87360.o obj-$(CONFIG_SENSORS_PC87427) += pc87427.o obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o diff --git a/drivers/hwmon/peci-cputemp.c b/drivers/hwmon/peci-cputemp.c new file mode 100644 index ..f0bc92687512 --- /dev/null +++ b/drivers/hwmon/peci-cputemp.c @@ -0,0 +1,783 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2018 Intel Corporation + +#include +#include +#include Is this include needed ? No it isn't. Will drop the line. +#include +#include +#include +#include + +#define TEMP_TYPE_PECI 6 /* Sensor type 6: Intel PECI */ + +#define CORE_MAX_ON_HSX 18 /* Max number of cores on Haswell */ +#define CORE_MAX_ON_BDX 24 /* Max number of cores on Broadwell */ +#define CORE_MAX_ON_SKX 28 /* Max number of cores on Skylake */ + +#define DEFAULT_CHANNEL_NUMS 5 +#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS) + +#define CLIENT_CPU_ID_MASK 0xf0ff0 /* Mask for Family / Model info */ + +#define UPDATE_INTERVAL_MIN HZ + +enum cpu_gens { + CPU_GEN_HSX, /* Haswell Xeon */ + CPU_GEN_BRX, /* Broadwell Xeon */ + CPU_GEN_SKX, /* Skylake Xeon */ + CPU_GEN_MAX +}; + +struct cpu_gen_info { + u32 type; + u32 cpu_id; + u32 core_max; +}; + +struct temp_data { + bool valid; + s32 value; + unsigned long last_updated; +}; + +struct temp_group { + struct temp_data die; + struct temp_data dts_margin; + struct temp_data tcontrol; + struct temp_data tthrottle; + struct temp_data tjmax; + struct temp_data core[CORETEMP_CHANNEL_NUMS]; +}; + +struct peci_cpute
Re: [PATCH v3 05/10] ARM: dts: aspeed: peci: Add PECI node
On 4/11/2018 4:52 AM, Joel Stanley wrote: On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: This commit adds PECI bus/adapter node of AST24xx/AST25xx into aspeed-g4 and aspeed-g5. The patches to the device trees get merged by the ASPEED maintainer (me). Once you have the bindings reviewed you can send the patches to me and the linux-aspeed list (I've got a pending patch to maintainers that will ensure get_maintainers.pl does the right thing as far as email addresses go). I'd suggest dropping it from your series and re-sending once the bindings and driver are reviewed. Cheers, Joel Do you mean that bindings and driver of ASPEED peci adapter driver including documents? Thanks, -Jae -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 04/10] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
Hi Joel, On 4/11/2018 4:52 AM, Joel Stanley wrote: On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: This commit adds a dt-bindings document of PECI adapter driver for Aspeed We try to capitalise ASPEED. Got it. Will capitalize all Aspeed words. AST24xx/25xx SoCs. --- .../devicetree/bindings/peci/peci-aspeed.txt | 60 ++ 1 file changed, 60 insertions(+) create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt new file mode 100644 index ..4598bb8c20fa --- /dev/null +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt @@ -0,0 +1,60 @@ +Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs. + +Required properties: +- compatible: Should be "aspeed,ast2400-peci" or "aspeed,ast2500-peci" + - aspeed,ast2400-peci: Aspeed AST2400 family PECI +controller + - aspeed,ast2500-peci: Aspeed AST2500 family PECI +controller +- reg : Should contain PECI controller registers location and + length. +- #address-cells: Should be <1>. +- #size-cells : Should be <0>. +- interrupts: Should contain PECI controller interrupt. +- clocks: Should contain clock source for PECI controller. + Should reference clkin. Are you sure that this is driven by clkin? Most peripherals on the Aspeed are attached to the apb, so should reference that clock. According to the datasheet, PECI controller module is attached to apb but its clock source is the 24MHz external clock. +- clock_frequency : Should contain the operation frequency of PECI controller + in units of Hz. + 187500 ~ 2400 Can you explain why you need both the parent clock and this frequency to be specified? Based on this setting, driver code makes clock divisor value to set operation clock of PECI controller which is adjustable. + +Optional properties: +- msg-timing-nego : Message timing negotiation period. This value will Perhaps msg-timing-period? Or just msg-timing? Will use msg-timing instead. + determine the period of message timing negotiation to be + issued by PECI controller. The unit of the programmed + value is four times of PECI clock period. + 0 ~ 255 (default: 1) +- addr-timing-nego : Address timing negotiation period. This value will + determine the period of address timing negotiation to be + issued by PECI controller. The unit of the programmed + value is four times of PECI clock period. + 0 ~ 255 (default: 1) +- rd-sampling-point : Read sampling point selection. The whole period of a bit + time will be divided into 16 time frames. This value will + determine the time frame in which the controller will + sample PECI signal for data read back. Usually in the + middle of a bit time is the best. + 0 ~ 15 (default: 8) +- cmd_timeout_ms: Command timeout in units of ms. + 1 ~ 6 (default: 1000) + +Example: + peci: peci@1e78b000 { + compatible = "simple-bus"; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x1e78b000 0x60>; + + peci0: peci-bus@0 { + compatible = "aspeed,ast2500-peci"; + reg = <0x0 0x60>; + #address-cells = <1>; + #size-cells = <0>; + interrupts = <15>; + clocks = <&clk_clkin>; + clock-frequency = <2400>; + msg-timing-nego = <1>; + addr-timing-nego = <1>; + rd-sampling-point = <8>; + cmd-timeout-ms = <1000>; + }; + }; -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 01/10] Documentations: dt-bindings: Add documents of generic PECI bus, adapter and client drivers
Hi Joel, On 4/11/2018 4:52 AM, Joel Stanley wrote: Hi Jae, On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: This commit adds documents of generic PECI bus, adapter and client drivers. Signed-off-by: Jae Hyun Yoo Reviewed-by: Haiyue Wang Reviewed-by: James Feist Reviewed-by: Vernon Mauery Cc: Alan Cox Cc: Andrew Jeffery Cc: Andrew Lunn Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Fengguang Wu Cc: Greg KH Cc: Guenter Roeck Cc: Jason M Biils Cc: Jean Delvare Cc: Joel Stanley Cc: Julia Cartwright Cc: Miguel Ojeda Cc: Milton Miller II Cc: Pavel Machek Cc: Randy Dunlap Cc: Stef van Os Cc: Sumeet R Pawnikar That's a hefty cc list. I can't see Rob Herring though, and he's usually the person who you need to convince to get your bindings accepted. I recommend using ./scripts/get_maintainers.pl to build your CC list, and then add others you think are relevant. I'm not sure what the guidelines are for generic bindings, so I'll defer to Rob for this patch. Cheers, Joel Thanks a lot for letting me know that. I'll do as you suggested. -Jae --- .../devicetree/bindings/peci/peci-adapter.txt | 23 .../devicetree/bindings/peci/peci-bus.txt | 15 + .../devicetree/bindings/peci/peci-client.txt | 25 ++ 3 files changed, 63 insertions(+) create mode 100644 Documentation/devicetree/bindings/peci/peci-adapter.txt create mode 100644 Documentation/devicetree/bindings/peci/peci-bus.txt create mode 100644 Documentation/devicetree/bindings/peci/peci-client.txt diff --git a/Documentation/devicetree/bindings/peci/peci-adapter.txt b/Documentation/devicetree/bindings/peci/peci-adapter.txt new file mode 100644 index ..9221374f6b11 --- /dev/null +++ b/Documentation/devicetree/bindings/peci/peci-adapter.txt @@ -0,0 +1,23 @@ +Generic device tree configuration for PECI adapters. + +Required properties: +- compatible : Should contain hardware specific definition strings that can + match an adapter driver implementation. +- reg: Should contain PECI controller registers location and length. +- #address-cells : Should be <1>. +- #size-cells: Should be <0>. + +Example: + peci: peci@1000 { + compatible = "simple-bus"; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x1000 0x1000>; + + peci0: peci-bus@0 { + compatible = "soc,soc-peci"; + reg = <0x0 0x1000>; + #address-cells = <1>; + #size-cells = <0>; + }; + }; diff --git a/Documentation/devicetree/bindings/peci/peci-bus.txt b/Documentation/devicetree/bindings/peci/peci-bus.txt new file mode 100644 index ..90bcc791ccb0 --- /dev/null +++ b/Documentation/devicetree/bindings/peci/peci-bus.txt @@ -0,0 +1,15 @@ +Generic device tree configuration for PECI buses. + +Required properties: +- compatible : Should be "simple-bus". +- #address-cells : Should be <1>. +- #size-cells: Should be <1>. +- ranges : Should contain PECI controller registers ranges. + +Example: + peci: peci@1000 { + compatible = "simple-bus"; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x1000 0x1000>; + }; diff --git a/Documentation/devicetree/bindings/peci/peci-client.txt b/Documentation/devicetree/bindings/peci/peci-client.txt new file mode 100644 index ..8e2bfd8532f6 --- /dev/null +++ b/Documentation/devicetree/bindings/peci/peci-client.txt @@ -0,0 +1,25 @@ +Generic device tree configuration for PECI clients. + +Required properties: +- compatible : Should contain target device specific definition strings that can + match a client driver implementation. +- reg: Should contain address of a client CPU. Address range of CPU + clients is starting from 0x30 based on PECI specification. + <0x30> .. <0x37> (depends on the PECI_OFFSET_MAX definition) + +Example: + peci-bus@0 { + #address-cells = <1>; + #size-cells = <0>; + < more properties > + + function@cpu0 { + compatible = "device,function"; + reg = <0x30>; + }; + + function@cpu1 { + compatible = "device,function"; + reg = <0x31>; + }; + }; -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 06/10] drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx
Hello Joel, Thanks for sharing your time. Please see my answers inline. On 4/11/2018 4:51 AM, Joel Stanley wrote: Hello Jae, On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: This commit adds PECI adapter driver implementation for Aspeed AST24xx/AST25xx. The driver is looking good! It looks like you've done some kind of review that we weren't allowed to see, which is a double edged sword - I might be asking about things that you've already spoken about with someone else. I'm only just learning about PECI, but I do have some general comments below. Yes, it took a hidden review process between v2 and v3. I know it's an unusual process but it was requested. Hopefully, change logs in cover letter could roughly provide the details. Thanks for your comments. --- drivers/peci/Kconfig | 28 +++ drivers/peci/Makefile | 3 + drivers/peci/peci-aspeed.c | 504 + 3 files changed, 535 insertions(+) create mode 100644 drivers/peci/peci-aspeed.c diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig index 1fbc13f9e6c2..0e33420365de 100644 --- a/drivers/peci/Kconfig +++ b/drivers/peci/Kconfig @@ -14,4 +14,32 @@ config PECI processors and chipset components to external monitoring or control devices. + If you want PECI support, you should say Y here and also to the + specific driver for your bus adapter(s) below. + +if PECI + +# +# PECI hardware bus configuration +# + +menu "PECI Hardware Bus support" + +config PECI_ASPEED + tristate "Aspeed AST24xx/AST25xx PECI support" I think just saying ASPEED PECI support is enough. That way if the next ASPEED SoC happens to have PECI we don't need to update all of the help text :) Agreed. I'll change the description. + select REGMAP_MMIO + depends on OF + depends on ARCH_ASPEED || COMPILE_TEST + help + Say Y here if you want support for the Platform Environment Control + Interface (PECI) bus adapter driver on the Aspeed AST24XX and AST25XX + SoCs. + + This support is also available as a module. If so, the module + will be called peci-aspeed. + +endmenu + +endif # PECI + endmenu diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile index 9e8615e0d3ff..886285e69765 100644 --- a/drivers/peci/Makefile +++ b/drivers/peci/Makefile @@ -4,3 +4,6 @@ # Core functionality obj-$(CONFIG_PECI) += peci-core.o + +# Hardware specific bus drivers +obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o diff --git a/drivers/peci/peci-aspeed.c b/drivers/peci/peci-aspeed.c new file mode 100644 index ..be2a1f327eb1 --- /dev/null +++ b/drivers/peci/peci-aspeed.c @@ -0,0 +1,504 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2012-2017 ASPEED Technology Inc. +// Copyright (c) 2018 Intel Corporation + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define DUMP_DEBUG 0 + +/* Aspeed PECI Registers */ +#define AST_PECI_CTRL 0x00 Nit: we use ASPEED instead of AST in the upstream kernel to distingush from the aspeed sdk drivers. If you feel strongly about this then I won't insist you change. Okay then, better change it now than later. Will change all defines. +#define AST_PECI_TIMING 0x04 +#define AST_PECI_CMD 0x08 +#define AST_PECI_CMD_CTRL 0x0c +#define AST_PECI_EXP_FCS 0x10 +#define AST_PECI_CAP_FCS 0x14 +#define AST_PECI_INT_CTRL 0x18 +#define AST_PECI_INT_STS 0x1c +#define AST_PECI_W_DATA0 0x20 +#define AST_PECI_W_DATA1 0x24 +#define AST_PECI_W_DATA2 0x28 +#define AST_PECI_W_DATA3 0x2c +#define AST_PECI_R_DATA0 0x30 +#define AST_PECI_R_DATA1 0x34 +#define AST_PECI_R_DATA2 0x38 +#define AST_PECI_R_DATA3 0x3c +#define AST_PECI_W_DATA4 0x40 +#define AST_PECI_W_DATA5 0x44 +#define AST_PECI_W_DATA6 0x48 +#define AST_PECI_W_DATA7 0x4c +#define AST_PECI_R_DATA4 0x50 +#define AST_PECI_R_DATA5 0x54 +#define AST_PECI_R_DATA6 0x58 +#define AST_PECI_R_DATA7 0x5c + +/* AST_PECI_CTRL - 0x00 : Control Register */ +#define PECI_CTRL_SAMPLING_MASK GENMASK(19, 16) +#define PECI_CTRL_SAMPLING(x) (((x) << 16) & PECI_CTRL_SAMPLING_MASK) +#define PECI_CTRL_SAMPLING_GET(x) (((x) & PECI_CTRL_SAMPLING_MASK) >> 16) +#define PECI_CTRL_READ_MODE_MASKGENMASK(13, 12) +#define PECI_CTRL_READ_MODE(x) (((x) << 12) & PECI_CTRL_READ_MODE_MASK) +#define PECI_CTRL_READ_MODE_GET(x) (((x) & PECI_CTRL_READ_MODE_MASK) >> 12) +#define PECI_CTRL_READ_MODE_COUNT BIT(12) +#define PECI_CTRL_READ_MODE_DBG BIT(13) +#define PECI_CTRL_CLK_SOURCE_MASK BIT(11) +#define PECI_CTRL_CLK_SOURCE(x) (((x) << 11) & PECI_CTRL_CLK_SOURCE_MASK) +#define PECI_CTRL_CLK_SOURCE_GET(x) (((x) & PECI_CTRL_CLK_SOURCE_MASK) >> 11) +#define PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8) +#define PECI_CTRL_CLK_DIV(x)(((x) << 8) & PECI_CTRL_CLK_DIV_MASK) +#define PECI_CTRL_CLK_DIV_GET(x)(((x) & PECI_CTRL_CLK_DIV_MASK)
Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
On 04/11/2018 02:59 PM, Jae Hyun Yoo wrote: Hi Guenter, Thanks a lot for sharing your time. Please see my inline answers. On 4/10/2018 3:28 PM, Guenter Roeck wrote: On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote: This commit adds PECI cputemp and dimmtemp hwmon drivers. Signed-off-by: Jae Hyun Yoo Reviewed-by: Haiyue Wang Reviewed-by: James Feist Reviewed-by: Vernon Mauery Cc: Alan Cox Cc: Andrew Jeffery Cc: Andrew Lunn Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Fengguang Wu Cc: Greg KH Cc: Guenter Roeck Cc: Jason M Biils Cc: Jean Delvare Cc: Joel Stanley Cc: Julia Cartwright Cc: Miguel Ojeda Cc: Milton Miller II Cc: Pavel Machek Cc: Randy Dunlap Cc: Stef van Os Cc: Sumeet R Pawnikar --- drivers/hwmon/Kconfig | 28 ++ drivers/hwmon/Makefile | 2 + drivers/hwmon/peci-cputemp.c | 783 ++ drivers/hwmon/peci-dimmtemp.c | 432 +++ 4 files changed, 1245 insertions(+) create mode 100644 drivers/hwmon/peci-cputemp.c create mode 100644 drivers/hwmon/peci-dimmtemp.c diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index f249a4428458..c52f610f81d0 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1259,6 +1259,34 @@ config SENSORS_NCT7904 This driver can also be built as a module. If so, the module will be called nct7904. +config SENSORS_PECI_CPUTEMP + tristate "PECI CPU temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI + cputemp driver which provides Digital Thermal Sensor (DTS) thermal + readings of the CPU package and CPU cores that are accessible using + the PECI Client Command Suite via the processor PECI client. + Check Documentation/hwmon/peci-cputemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-cputemp. + +config SENSORS_PECI_DIMMTEMP + tristate "PECI DIMM temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI hwmon + driver which provides Digital Thermal Sensor (DTS) thermal readings of + DIMM components that are accessible using the PECI Client Command + Suite via the processor PECI client. + Check Documentation/hwmon/peci-dimmtemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-dimmtemp. + config SENSORS_NSA320 tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors" depends on GPIOLIB && OF diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile index e7d52a36e6c4..48d9598fcd3a 100644 --- a/drivers/hwmon/Makefile +++ b/drivers/hwmon/Makefile @@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802) += nct7802.o obj-$(CONFIG_SENSORS_NCT7904) += nct7904.o obj-$(CONFIG_SENSORS_NSA320) += nsa320-hwmon.o obj-$(CONFIG_SENSORS_NTC_THERMISTOR) += ntc_thermistor.o +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o obj-$(CONFIG_SENSORS_PC87360) += pc87360.o obj-$(CONFIG_SENSORS_PC87427) += pc87427.o obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o diff --git a/drivers/hwmon/peci-cputemp.c b/drivers/hwmon/peci-cputemp.c new file mode 100644 index ..f0bc92687512 --- /dev/null +++ b/drivers/hwmon/peci-cputemp.c @@ -0,0 +1,783 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2018 Intel Corporation + +#include +#include +#include Is this include needed ? No it isn't. Will drop the line. +#include +#include +#include +#include + +#define TEMP_TYPE_PECI 6 /* Sensor type 6: Intel PECI */ + +#define CORE_MAX_ON_HSX 18 /* Max number of cores on Haswell */ +#define CORE_MAX_ON_BDX 24 /* Max number of cores on Broadwell */ +#define CORE_MAX_ON_SKX 28 /* Max number of cores on Skylake */ + +#define DEFAULT_CHANNEL_NUMS 5 +#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS) + +#define CLIENT_CPU_ID_MASK 0xf0ff0 /* Mask for Family / Model info */ + +#define UPDATE_INTERVAL_MIN HZ + +enum cpu_gens { + CPU_GEN_HSX, /* Haswell Xeon */ + CPU_GEN_BRX, /* Broadwell Xeon */ + CPU_GEN_SKX, /* Skylake Xeon */ + CPU_GEN_MAX +}; + +struct cpu_gen_info { + u32 type; + u32 cpu_id; + u32 core_max; +}; + +struct temp_data { + bool valid; + s32 value; + unsigned long last_updated; +}; + +struct temp_group { + struct temp_data die; + struct temp_data dts_margin; + struct temp_data tcontrol; + struct temp_data tthrottle; + struct temp_data tjmax; + struct temp_data core[CORETEMP_CHANNEL_NUMS]; +}; + +struct peci_cputemp { + struct peci_client *client; + struct device *dev;
Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
Hi Guenter, Thanks a lot for sharing your time. Please see my inline answers. On 4/10/2018 3:28 PM, Guenter Roeck wrote: On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote: This commit adds PECI cputemp and dimmtemp hwmon drivers. Signed-off-by: Jae Hyun Yoo Reviewed-by: Haiyue Wang Reviewed-by: James Feist Reviewed-by: Vernon Mauery Cc: Alan Cox Cc: Andrew Jeffery Cc: Andrew Lunn Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Fengguang Wu Cc: Greg KH Cc: Guenter Roeck Cc: Jason M Biils Cc: Jean Delvare Cc: Joel Stanley Cc: Julia Cartwright Cc: Miguel Ojeda Cc: Milton Miller II Cc: Pavel Machek Cc: Randy Dunlap Cc: Stef van Os Cc: Sumeet R Pawnikar --- drivers/hwmon/Kconfig | 28 ++ drivers/hwmon/Makefile| 2 + drivers/hwmon/peci-cputemp.c | 783 ++ drivers/hwmon/peci-dimmtemp.c | 432 +++ 4 files changed, 1245 insertions(+) create mode 100644 drivers/hwmon/peci-cputemp.c create mode 100644 drivers/hwmon/peci-dimmtemp.c diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index f249a4428458..c52f610f81d0 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1259,6 +1259,34 @@ config SENSORS_NCT7904 This driver can also be built as a module. If so, the module will be called nct7904. +config SENSORS_PECI_CPUTEMP + tristate "PECI CPU temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI + cputemp driver which provides Digital Thermal Sensor (DTS) thermal + readings of the CPU package and CPU cores that are accessible using + the PECI Client Command Suite via the processor PECI client. + Check Documentation/hwmon/peci-cputemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-cputemp. + +config SENSORS_PECI_DIMMTEMP + tristate "PECI DIMM temperature monitoring support" + depends on OF + depends on PECI + help + If you say yes here you get support for the generic Intel PECI hwmon + driver which provides Digital Thermal Sensor (DTS) thermal readings of + DIMM components that are accessible using the PECI Client Command + Suite via the processor PECI client. + Check Documentation/hwmon/peci-dimmtemp for details. + + This driver can also be built as a module. If so, the module + will be called peci-dimmtemp. + config SENSORS_NSA320 tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors" depends on GPIOLIB && OF diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile index e7d52a36e6c4..48d9598fcd3a 100644 --- a/drivers/hwmon/Makefile +++ b/drivers/hwmon/Makefile @@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802) += nct7802.o obj-$(CONFIG_SENSORS_NCT7904) += nct7904.o obj-$(CONFIG_SENSORS_NSA320) += nsa320-hwmon.o obj-$(CONFIG_SENSORS_NTC_THERMISTOR) += ntc_thermistor.o +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP)+= peci-dimmtemp.o obj-$(CONFIG_SENSORS_PC87360) += pc87360.o obj-$(CONFIG_SENSORS_PC87427) += pc87427.o obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o diff --git a/drivers/hwmon/peci-cputemp.c b/drivers/hwmon/peci-cputemp.c new file mode 100644 index ..f0bc92687512 --- /dev/null +++ b/drivers/hwmon/peci-cputemp.c @@ -0,0 +1,783 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2018 Intel Corporation + +#include +#include +#include Is this include needed ? No it isn't. Will drop the line. +#include +#include +#include +#include + +#define TEMP_TYPE_PECI6 /* Sensor type 6: Intel PECI */ + +#define CORE_MAX_ON_HSX 18 /* Max number of cores on Haswell */ +#define CORE_MAX_ON_BDX 24 /* Max number of cores on Broadwell */ +#define CORE_MAX_ON_SKX 28 /* Max number of cores on Skylake */ + +#define DEFAULT_CHANNEL_NUMS 5 +#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS) + +#define CLIENT_CPU_ID_MASK0xf0ff0 /* Mask for Family / Model info */ + +#define UPDATE_INTERVAL_MIN HZ + +enum cpu_gens { + CPU_GEN_HSX, /* Haswell Xeon */ + CPU_GEN_BRX, /* Broadwell Xeon */ + CPU_GEN_SKX, /* Skylake Xeon */ + CPU_GEN_MAX +}; + +struct cpu_gen_info { + u32 type; + u32 cpu_id; + u32 core_max; +}; + +struct temp_data { + bool valid; + s32 value; + unsigned long last_updated; +}; + +struct temp_group { + struct temp_data die; + struct temp_data dts_margin; + struct temp_data tcontrol; + struct temp_data tthrottle; + struct temp_data tjmax; + struct temp_data core[CORETEMP_CHANNEL_NUMS]; +}; + +struct peci
Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)
2018-04-11 16:44 UTC+0100 ~ Quentin Monnet > 2018-04-10 09:58 UTC-0700 ~ Yonghong Song >> On 4/10/18 7:41 AM, Quentin Monnet wrote: >>> Add documentation for eBPF helper functions to bpf.h user header file. >>> This documentation can be parsed with the Python script provided in >>> another commit of the patch series, in order to provide a RST document >>> that can later be converted into a man page. >>> >>> The objective is to make the documentation easily understandable and >>> accessible to all eBPF developers, including beginners. >>> >>> This patch contains descriptions for the following helper functions: >>> >>> Helpers from Lawrence: >>> - bpf_setsockopt() >>> - bpf_getsockopt() >>> - bpf_sock_ops_cb_flags_set() >>> >>> Helpers from Yonghong: >>> - bpf_perf_event_read_value() >>> - bpf_perf_prog_read_value() >>> >>> Helper from Josef: >>> - bpf_override_return() >>> >>> Helper from Andrey: >>> - bpf_bind() >>> >>> Cc: Lawrence Brakmo >>> Cc: Yonghong Song >>> Cc: Josef Bacik >>> Cc: Andrey Ignatov >>> Signed-off-by: Quentin Monnet >>> --- > [...] > > Thanks Yonghong for the review! > > I have a favor to ask of you. I got a bounce for Kaixu Xia's email > address, and I don't know what alternative email address I could use. I > CC-ed to have a review for helper bpf_perf_event_read() (in patch 6 of > this series), which is rather close to bpf_perf_event_read_value(). > Would you mind having a look at that one too, please? The description is > not long. Well I read again the description I wrote, and actually the one for bpf_perf_evnet_read() is nearly a subset of the one for perf_event_read_value(). So the same comments that you raised earlier apply, there's probably nothing more to review. But if you notice that some important info is missing for bpf_perf_event_read(), I'm interested too! Quentin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC bpf-next v2 8/8] bpf: add documentation for eBPF helpers (58-64)
2018-04-11 12:17 UTC+0200 ~ Jesper Dangaard Brouer > On Tue, 10 Apr 2018 15:41:57 +0100 > Quentin Monnet wrote: > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 7343af4196c8..db090ad03626 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -1250,6 +1250,51 @@ union bpf_attr { >> * Return >> * 0 on success, or a negative error in case of failure. >> * >> + * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags) >> + * Description >> + * Redirect the packet to the endpoint referenced by *map* at >> + * index *key*. Depending on its type, his *map* can contain >> + * references to net devices (for forwarding packets through other >> + * ports), or to CPUs (for redirecting XDP frames to another CPU; >> + * but this is not fully implemented as of this writing). > > Stating that CPUMAP redirect "is not fully implemented" is confusing. > The issue is that CPUMAP only works for "native" XDP. > > What about saying: > > "[...] or to CPUs (for redirecting XDP frames to another CPU; > but this is only implemented for native XDP as of this writing)" > Fine by me, I will change it. Thank you Jesper for the review! Quentin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)
2018-04-10 10:50 UTC-0700 ~ Andrey Ignatov > Quentin Monnet [Tue, 2018-04-10 07:43 -0700]: >> + * int bpf_bind(struct bpf_sock_addr_kern *ctx, struct sockaddr *addr, int >> addr_len) >> + * Description >> + * Bind the socket associated to *ctx* to the address pointed by >> + * *addr*, of length *addr_len*. This allows for making outgoing >> + * connection from the desired IP address, which can be useful for >> + * example when all processes inside a cgroup should use one >> + * single IP address on a host that has multiple IP configured. >> + * >> + * This helper works for IPv4 and IPv6, TCP and UDP sockets. The >> + * domain (*addr*\ **->sa_family**) must be **AF_INET** (or >> + * **AF_INET6**). Looking for a free port to bind to can be >> + * expensive, therefore binding to port is not permitted by the >> + * helper: *addr*\ **->sin_port** (or **sin6_port**, respectively) >> + * must be set to zero. >> + * >> + * As for the remote end, both parts of it can be overridden, >> + * remote IP and remote port. This can be useful if an application >> + * inside a cgroup wants to connect to another application inside >> + * the same cgroup or to itself, but knows nothing about the IP >> + * address assigned to the cgroup. > > The last paragraph ("As for the remote end ...") is not relevant to > bpf_bind() and should be removed. It's about sys_connect hook itself > that can call to bpf_bind() but also has other functionality (and that > other functionality is described by this paragraph). Thanks Andrey, I will remove this paragraph. Quentin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)
2018-04-10 09:58 UTC-0700 ~ Yonghong Song > On 4/10/18 7:41 AM, Quentin Monnet wrote: >> Add documentation for eBPF helper functions to bpf.h user header file. >> This documentation can be parsed with the Python script provided in >> another commit of the patch series, in order to provide a RST document >> that can later be converted into a man page. >> >> The objective is to make the documentation easily understandable and >> accessible to all eBPF developers, including beginners. >> >> This patch contains descriptions for the following helper functions: >> >> Helpers from Lawrence: >> - bpf_setsockopt() >> - bpf_getsockopt() >> - bpf_sock_ops_cb_flags_set() >> >> Helpers from Yonghong: >> - bpf_perf_event_read_value() >> - bpf_perf_prog_read_value() >> >> Helper from Josef: >> - bpf_override_return() >> >> Helper from Andrey: >> - bpf_bind() >> >> Cc: Lawrence Brakmo >> Cc: Yonghong Song >> Cc: Josef Bacik >> Cc: Andrey Ignatov >> Signed-off-by: Quentin Monnet >> --- >> include/uapi/linux/bpf.h | 184 >> +++ >> 1 file changed, 184 insertions(+) >> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 15d9ccafebbe..7343af4196c8 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h [...] >> @@ -1255,6 +1277,168 @@ union bpf_attr { >> * performed again. >> * Return >> * 0 on success, or a negative error in case of failure. >> + * >> + * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, >> struct bpf_perf_event_value *buf, u32 buf_size) >> + * Description >> + * Read the value of a perf event counter, and store it into >> *buf* >> + * of size *buf_size*. This helper relies on a *map* of type >> + * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf >> + * event counter is selected at the creation of the *map*. The > > The nature of the perf event counter is selected when *map* is updated > with perf_event fd's. > Thanks, I will fix it. >> + * *map* is an array whose size is the number of available CPU >> + * cores, and each cell contains a value relative to one >> core. The > > It is confusing to mix core/cpu here. Maybe just use perf_event > convention, always using cpu? > Right, I'll remove occurrences of "core". >> + * value to retrieve is indicated by *flags*, that contains the >> + * index of the core to look up, masked with >> + * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to >> + * **BPF_F_CURRENT_CPU** to indicate that the value for the >> + * current CPU core should be retrieved. >> + * >> + * This helper behaves in a way close to >> + * **bpf_perf_event_read**\ () helper, save that instead of >> + * just returning the value observed, it fills the *buf* >> + * structure. This allows for additional data to be >> retrieved: in >> + * particular, the enabled and running times (in *buf*\ >> + * **->enabled** and *buf*\ **->running**, respectively) are >> + * copied. >> + * >> + * These values are interesting, because hardware PMU >> (Performance >> + * Monitoring Unit) counters are limited resources. When >> there are >> + * more PMU based perf events opened than available counters, >> + * kernel will multiplex these events so each event gets certain >> + * percentage (but not all) of the PMU time. In case that >> + * multiplexing happens, the number of samples or counter value >> + * will not reflect the case compared to when no multiplexing >> + * occurs. This makes comparison between different runs >> difficult. >> + * Typically, the counter value should be normalized before >> + * comparing to other experiments. The usual normalization is >> done >> + * as follows. >> + * >> + * :: >> + * >> + * normalized_counter = counter * t_enabled / t_running >> + * >> + * Where t_enabled is the time enabled for event and >> t_running is >> + * the time running for event since last normalization. The >> + * enabled and running times are accumulated since the perf >> event >> + * open. To achieve scaling factor between two invocations of an >> + * eBPF program, users can can use CPU id as the key (which is >> + * typical for perf array usage model) to remember the previous >> + * value and do the calculation inside the eBPF program. >> + * Return >> + * 0 on success, or a negative error in case of failure. >> + * [...] Thanks Yonghong for the review! I have a favor to ask of you. I got a bounce for Kaixu Xia's email address, and I don't know what alternative email address I could use. I CC-ed to have a review for helper bpf_perf_event_read() (in patch 6 of this series), which is rather close to bpf_perf_event_read_value(). Wou
[RFC bpf-next v2 3/8] bpf: add documentation for eBPF helpers (12-22)
2018-04-10 15:43 UTC-0700 ~ Alexei Starovoitov > On Tue, Apr 10, 2018 at 03:41:52PM +0100, Quentin Monnet wrote: >> Add documentation for eBPF helper functions to bpf.h user header file. >> This documentation can be parsed with the Python script provided in >> another commit of the patch series, in order to provide a RST document >> that can later be converted into a man page. >> >> The objective is to make the documentation easily understandable and >> accessible to all eBPF developers, including beginners. >> >> This patch contains descriptions for the following helper functions, all >> writter by Alexei: >> >> - bpf_get_current_pid_tgid() >> - bpf_get_current_uid_gid() >> - bpf_get_current_comm() >> - bpf_skb_vlan_push() >> - bpf_skb_vlan_pop() >> - bpf_skb_get_tunnel_key() >> - bpf_skb_set_tunnel_key() >> - bpf_redirect() >> - bpf_perf_event_output() >> - bpf_get_stackid() >> - bpf_get_current_task() >> >> Cc: Alexei Starovoitov >> Signed-off-by: Quentin Monnet >> --- >> include/uapi/linux/bpf.h | 237 >> +++ >> 1 file changed, 237 insertions(+) >> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 2bc653a3a20f..f3ea8824efbc 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -580,6 +580,243 @@ union bpf_attr { >> * performed again. >> * Return >> * 0 on success, or a negative error in case of failure. >> + * >> + * u64 bpf_get_current_pid_tgid(void) >> + * Return >> + * A 64-bit integer containing the current tgid and pid, and >> + * created as such: >> + * *current_task*\ **->tgid << 32 \|** >> + * *current_task*\ **->pid**. >> + * >> + * u64 bpf_get_current_uid_gid(void) >> + * Return >> + * A 64-bit integer containing the current GID and UID, and >> + * created as such: *current_gid* **<< 32 \|** *current_uid*. >> + * >> + * int bpf_get_current_comm(char *buf, u32 size_of_buf) >> + * Description >> + * Copy the **comm** attribute of the current task into *buf* of >> + * *size_of_buf*. The **comm** attribute contains the name of >> + * the executable (excluding the path) for the current task. The >> + * *size_of_buf* must be strictly positive. On success, the > > that reminds me that we probably should relax it to ARG_CONST_SIZE_OR_ZERO. > The programs won't be passing an actual zero into it, but it helps > a lot to tell verifier that zero is also valid, since programs > become much simpler. > Ok. No change to helper description for now, we will update here when your patch lands. >> + * helper makes sure that the *buf* is NUL-terminated. On failure, >> + * it is filled with zeroes. >> + * Return >> + * 0 on success, or a negative error in case of failure. >> + * >> + * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 >> vlan_tci) >> + * Description >> + * Push a *vlan_tci* (VLAN tag control information) of protocol >> + * *vlan_proto* to the packet associated to *skb*, then update >> + * the checksum. Note that if *vlan_proto* is different from >> + * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to >> + * be **ETH_P_8021Q**. >> + * >> + * A call to this helper is susceptible to change data from the >> + * packet. Therefore, at load time, all checks on pointers >> + * previously done by the verifier are invalidated and must be >> + * performed again. >> + * Return >> + * 0 on success, or a negative error in case of failure. >> + * >> + * int bpf_skb_vlan_pop(struct sk_buff *skb) >> + * Description >> + * Pop a VLAN header from the packet associated to *skb*. >> + * >> + * A call to this helper is susceptible to change data from the >> + * packet. Therefore, at load time, all checks on pointers >> + * previously done by the verifier are invalidated and must be >> + * performed again. >> + * Return >> + * 0 on success, or a negative error in case of failure. >> + * >> + * int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key >> *key, u32 size, u64 flags) >> + * Description >> + * Get tunnel metadata. This helper takes a pointer *key* to an >> + * empty **struct bpf_tunnel_key** of **size**, that will be >> + * filled with tunnel metadata for the packet associated to *skb*. >> + * The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which >> + * indicates that the tunnel is based on IPv6 protocol instead of >> + * IPv4. >> + * >> + * This is typically used on the receive path to perform a lookup >> + * or a packet redirection based on the value of *key*: > > above is correct, but feels a bit cryptic. > May be give more concrete example for particular tunneling protocol like gre > and say that tu
[RFC bpf-next v2 2/8] bpf: add documentation for eBPF helpers (01-11)
2018-04-10 10:56 UTC-0700 ~ Alexei Starovoitov > On Tue, Apr 10, 2018 at 03:41:51PM +0100, Quentin Monnet wrote: >> Add documentation for eBPF helper functions to bpf.h user header file. >> This documentation can be parsed with the Python script provided in >> another commit of the patch series, in order to provide a RST document >> that can later be converted into a man page. >> >> The objective is to make the documentation easily understandable and >> accessible to all eBPF developers, including beginners. >> >> This patch contains descriptions for the following helper functions, all >> written by Alexei: >> >> - bpf_map_lookup_elem() >> - bpf_map_update_elem() >> - bpf_map_delete_elem() >> - bpf_probe_read() >> - bpf_ktime_get_ns() >> - bpf_trace_printk() >> - bpf_skb_store_bytes() >> - bpf_l3_csum_replace() >> - bpf_l4_csum_replace() >> - bpf_tail_call() >> - bpf_clone_redirect() >> >> Cc: Alexei Starovoitov >> Signed-off-by: Quentin Monnet >> --- >> include/uapi/linux/bpf.h | 199 >> +++ >> 1 file changed, 199 insertions(+) >> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 45f77f01e672..2bc653a3a20f 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -381,6 +381,205 @@ union bpf_attr { >> * intentional, removing them would break paragraphs for rst2man. >> * >> * Start of BPF helper function descriptions: >> + * >> + * void *bpf_map_lookup_elem(struct bpf_map *map, void *key) >> + * Description >> + * Perform a lookup in *map* for an entry associated to *key*. >> + * Return >> + * Map value associated to *key*, or **NULL** if no entry was >> + * found. >> + * >> + * int bpf_map_update_elem(struct bpf_map *map, void *key, void *value, u64 >> flags) >> + * Description >> + * Add or update the value of the entry associated to *key* in >> + * *map* with *value*. *flags* is one of: >> + * >> + * **BPF_NOEXIST** >> + * The entry for *key* must not exist in the map. >> + * **BPF_EXIST** >> + * The entry for *key* must already exist in the map. >> + * **BPF_ANY** >> + * No condition on the existence of the entry for *key*. >> + * >> + * These flags are only useful for maps of type >> + * **BPF_MAP_TYPE_HASH**. For all other map types, **BPF_ANY** >> + * should be used. > > I think that's not entirely accurate. > The flags work as expected for all other map types as well > and for lru map, sockmap, map in map the flags have practical use cases. > Ok, I missed that. I have to go back and check how the flags are used for those maps. I will cook up something cleaner for the next version of the set. >> + * Return >> + * 0 on success, or a negative error in case of failure. >> + * [...] >> + * >> + * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...) >> + * Description >> + * This helper is a "printk()-like" facility for debugging. It >> + * prints a message defined by format *fmt* (of size *fmt_size*) >> + * to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if >> + * available. It can take up to three additional **u64** >> + * arguments (as an eBPF helpers, the total number of arguments is >> + * limited to five). Each time the helper is called, it appends a >> + * line that looks like the following: >> + * >> + * :: >> + * >> + * telnet-470 [001] .N.. 419421.045894: 0x0001: BPF >> command: 2 >> + * >> + * In the above: >> + * >> + * * ``telnet`` is the name of the current task. >> + * * ``470`` is the PID of the current task. >> + * * ``001`` is the CPU number on which the task is >> + *running. >> + * * In ``.N..``, each character refers to a set of >> + *options (whether irqs are enabled, scheduling >> + *options, whether hard/softirqs are running, level of >> + *preempt_disabled respectively). **N** means that >> + ***TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED** >> + *are set. >> + * * ``419421.045894`` is a timestamp. >> + * * ``0x0001`` is a fake value used by BPF for the >> + *instruction pointer register. >> + * * ``BPF command: 2`` is the message formatted with >> + **fmt*. > > the above depends on how trace_pipe was configured. It's a default > configuration for many, but would be good to explain this a bit better. > I did not know about that. Would you have a pointer about how to configure trace_pipe, please? >> + * >> + * The conversion specifiers supported by *fmt* are similar, but >> + * more limited tha
[RFC bpf-next v2 1/8] bpf: add script and prepare bpf.h for new helpers documentation
2018-04-10 11:16 UTC-0700 ~ Alexei Starovoitov > On Tue, Apr 10, 2018 at 03:41:50PM +0100, Quentin Monnet wrote: >> Remove previous "overview" of eBPF helpers from user bpf.h header. >> Replace it by a comment explaining how to process the new documentation >> (to come in following patches) with a Python script to produce RST, then >> man page documentation. >> >> Also add the aforementioned Python script under scripts/. It is used to >> process include/uapi/linux/bpf.h and to extract helper descriptions, to >> turn it into a RST document that can further be processed with rst2man >> to produce a man page. The script takes one "--filename " >> option. If the script is launched from scripts/ in the kernel root >> directory, it should be able to find the location of the header to >> parse, and "--filename " is then optional. If it cannot >> find the file, then the option becomes mandatory. RST-formatted >> documentation is printed to standard output. >> >> Typical workflow for producing the final man page would be: >> >> $ ./scripts/bpf_helpers_doc.py \ >> --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst >> $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7 >> $ man /tmp/bpf-helpers.7 >> >> Note that the tool kernel-doc cannot be used to document eBPF helpers, >> whose signatures are not available directly in the header files >> (pre-processor directives are used to produce them at the beginning of >> the compilation process). >> >> Signed-off-by: Quentin Monnet >> --- >> include/uapi/linux/bpf.h | 406 >> ++-- >> scripts/bpf_helpers_doc.py | 414 >> + >> 2 files changed, 430 insertions(+), 390 deletions(-) >> create mode 100755 scripts/bpf_helpers_doc.py >> [...] >> diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py >> new file mode 100755 >> index ..3a15ba3f0a83 >> --- /dev/null >> +++ b/scripts/bpf_helpers_doc.py >> @@ -0,0 +1,414 @@ >> +#!/usr/bin/python3 >> +# >> +# Copyright (C) 2018 Netronome Systems, Inc. >> +# >> +# This software is licensed under the GNU General License Version 2, >> +# June 1991 as shown in the file COPYING in the top-level directory of this >> +# source tree. > > please use SPDX instead. > Same as for bpftool, our layers remain a bit cautious about it. I'd be happy to change it for SPDX as a follow-up when I get the green light. >> +# >> +# THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" >> +# WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, >> +# BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS >> +# FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND >> PERFORMANCE >> +# OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME >> +# THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. >> + [...] >> +class PrinterRST(Printer): >> +""" >> +A printer for dumping collected information about helpers as a >> ReStructured >> +Text page compatible with the rst2man program, which can be used to >> +generate a manual page for the helpers. >> +@helpers: array of Helper objects to print to standard output >> +""" >> +def print_header(self): >> +header = '''\ >> +.. Copyright (C) 2018 Netronome Systems, Inc. > > I think would be good to capture copyrights of all authors that added > the helpers being documented. Since a lot of text was copied from commit > logs it's only fair to preserve the copyrights. > Such man page file is automatically generated by the python script > and script itself is copyrighted by Netronome. That's fine, but the text > of man page is not netronome only. > I'm not sure what would be the solution. May be something like: > " > Copyright (C) All BPF authors and contributors from 2011 to present > See git log include/uapi/linux/bpf.h for details > " > ? Seems fair indeed. I do not have a better suggestion myself, so I will stick to yours. Out of curiosity, why 2011 for the year? I thought you introduced eBPF in the kernel in 2014 (bd4cf0ed331a), and I do not believe helpers have any link with cBPF? >> +.. >> +.. %%%LICENSE_START(VERBATIM) >> +.. Permission is granted to make and distribute verbatim copies of this >> +.. manual provided the copyright notice and this permission notice are >> +.. preserved on all copies. >> +.. >> +.. Permission is granted to copy and distribute modified versions of this >> +.. manual under the conditions for verbatim copying, provided that the >> +.. entire resulting derived work is distributed under the terms of a >> +.. permission notice identical to this one. >> +.. >> +.. Since the Linux kernel and libraries are constantly changing, this >> +.. manual page may be incorrect or out-of-date. The author(s) assume no >> +.. responsibility for errors or omissions, or for damages resulting from >> +.. the use of the information c
[RFC tip/locking/lockdep v6 01/20] lockdep/Documention: Recursive read lock detection reasoning
This patch add the documentation piece for the reasoning of deadlock detection related to recursive read lock. The following sections are added: * Explain what is a recursive read lock, and what deadlock cases they could introduce. * Introduce the notations for different types of dependencies, and the definition of strong paths. * Proof for a closed strong path is both sufficient and necessary for deadlock detections with recursive read locks involved. The proof could also explain why we call the path "strong" Signed-off-by: Boqun Feng --- Documentation/locking/lockdep-design.txt | 178 +++ 1 file changed, 178 insertions(+) diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.txt index 9de1c158d44c..6bb9e90e2c4f 100644 --- a/Documentation/locking/lockdep-design.txt +++ b/Documentation/locking/lockdep-design.txt @@ -284,3 +284,181 @@ Run the command and save the output, then compare against the output from a later run of this command to identify the leakers. This same output can also help you find situations where runtime lock initialization has been omitted. + +Recursive read locks: +- + +Lockdep now is equipped with deadlock detection for recursive read locks. + +Recursive read locks, as their name indicates, are the locks able to be +acquired recursively. Unlike non-recursive read locks, recursive read locks +only get blocked by current write lock *holders* other than write lock +*waiters*, for example: + + TASK A: TASK B: + + read_lock(X); + + write_lock(X); + + read_lock(X); + +is not a deadlock for recursive read locks, as while the task B is waiting for +the lock X, the second read_lock() doesn't need to wait because it's a recursive +read lock. However if the read_lock() is non-recursive read lock, then the above +case is a deadlock, because even if the write_lock() in TASK B can not get the +lock, but it can block the second read_lock() in TASK A. + +Note that a lock can be a write lock (exclusive lock), a non-recursive read +lock (non-recursive shared lock) or a recursive read lock (recursive shared +lock), depending on the lock operations used to acquire it (more specifically, +the value of the 'read' parameter for lock_acquire()). In other words, a single +lock instance has three types of acquisition depending on the acquisition +functions: exclusive, non-recursive read, and recursive read. + +To be concise, we call that write locks and non-recursive read locks as +"non-recursive" locks and recursive read locks as "recursive" locks. + +Recursive locks don't block each other, while non-recursive locks do (this is +even true for two non-recursive read locks). A non-recursive lock can block the +corresponding recursive lock, and vice versa. + +A deadlock case with recursive locks involved is as follow: + + TASK A: TASK B: + + read_lock(X); + read_lock(Y); + write_lock(Y); + write_lock(X); + +Task A is waiting for task B to read_unlock() Y and task B is waiting for task +A to read_unlock() X. + +Dependency types and strong dependency paths: +- +In order to detect deadlocks as above, lockdep needs to track different dependencies. +There are 4 categories for dependency edges in the lockdep graph: + +1) -(NN)->: non-recursive to non-recursive dependency. "X -(NN)-> Y" means +X -> Y and both X and Y are non-recursive locks. + +2) -(RN)->: recursive to non-recursive dependency. "X -(RN)-> Y" means +X -> Y and X is recursive read lock and Y is non-recursive lock. + +3) -(NR)->: non-recursive to recursive dependency, "X -(NR)-> Y" means +X -> Y and X is non-recursive lock and Y is recursive lock. + +4) -(RR)->: recursive to recursive dependency, "X -(RR)-> Y" means +X -> Y and both X and Y are recursive locks. + +Note that given two locks, they may have multiple dependencies between them, for example: + + TASK A: + + read_lock(X); + write_lock(Y); + ... + + TASK B: + + write_lock(X); + write_lock(Y); + +, we have both X -(RN)-> Y and X -(NN)-> Y in the dependency graph. + +We use -(*N)-> for edges that is either -(RN)-> or -(NN)->, the similar for -(N*)->, +-(*R)-> and -(R*)-> + +A "path" is a series of conjunct dependency edges in the graph. And we define a +"strong" path, which indicates the strong dependency throughout each dependency +in the path, as the path that doesn't have two conjunct edges (dependencies) as +-(*R)-> and -(R*)->. In other words, a "strong" path is a path from a lock +walking to another through the lock dependencies, and if X -> Y -> Z in the +path (where X, Y, Z are locks), if the walk from X to Y is through a -(NR)-> or +-(RR)-> dependency,
[RFC tip/locking/lockdep v6 04/20] lockdep: Redefine LOCK_*_STATE* bits
There are three types of lock acquisitions: write, non-recursive read and recursive read, among which write locks and non-recursive read locks have no difference from a viewpoint for deadlock detections, because a write acquisition of the corresponding lock on an independent CPU or task makes a non-recursive read lock act as a write lock in the sense of deadlock. So we could treat them as the same type (named as "non-recursive lock") in lockdep. As in the irq lock inversion detection (safe->unsafe deadlock detection), we used to differ write lock with read lock (non-recursive and recursive ones), such a classification could be improved as non-recursive read lock behaves the same as write lock, so this patch redefines the meanings of LOCK_{USED_IN, ENABLED}_STATE*. old: LOCK_* : stands for write lock LOCK_*_READ: stands for read lock(non-recursive and recursive) new: LOCK_* : stands for non-recursive(write lock and non-recursive read lock) LOCK_*_RR: stands for recursive read lock Such a change is needed for a future improvement on recursive read related irq inversion deadlock detection. Signed-off-by: Boqun Feng --- Documentation/locking/lockdep-design.txt | 6 +++--- kernel/locking/lockdep.c | 28 ++-- kernel/locking/lockdep_internals.h | 16 kernel/locking/lockdep_proc.c| 12 ++-- 4 files changed, 31 insertions(+), 31 deletions(-) diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.txt index 6bb9e90e2c4f..53ede30ce16d 100644 --- a/Documentation/locking/lockdep-design.txt +++ b/Documentation/locking/lockdep-design.txt @@ -30,9 +30,9 @@ State The validator tracks lock-class usage history into 4n + 1 separate state bits: - 'ever held in STATE context' -- 'ever held as readlock in STATE context' +- 'ever held as recursive readlock in STATE context' - 'ever held with STATE enabled' -- 'ever held as readlock with STATE enabled' +- 'ever held as recurisve readlock with STATE enabled' Where STATE can be either one of (kernel/locking/lockdep_states.h) - hardirq @@ -51,7 +51,7 @@ locking error messages, inside curlies. A contrived example: (&sio_locks[i].lock){-.-...}, at: [] mutex_lock+0x21/0x24 -The bit position indicates STATE, STATE-read, for each of the states listed +The bit position indicates STATE, STATE-RR, for each of the states listed above, and the character displayed in each indicates: '.' acquired while irqs disabled and not in irq context diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index f39a071ef0a8..14af2327b52a 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -448,10 +448,10 @@ DEFINE_PER_CPU(struct lockdep_stats, lockdep_stats); */ #define __USAGE(__STATE) \ - [LOCK_USED_IN_##__STATE] = "IN-"__stringify(__STATE)"-W", \ - [LOCK_ENABLED_##__STATE] = __stringify(__STATE)"-ON-W", \ - [LOCK_USED_IN_##__STATE##_READ] = "IN-"__stringify(__STATE)"-R",\ - [LOCK_ENABLED_##__STATE##_READ] = __stringify(__STATE)"-ON-R", + [LOCK_USED_IN_##__STATE] = "IN-"__stringify(__STATE), \ + [LOCK_ENABLED_##__STATE] = __stringify(__STATE)"-ON", \ + [LOCK_USED_IN_##__STATE##_RR] = "IN-"__stringify(__STATE)"-RR", \ + [LOCK_ENABLED_##__STATE##_RR] = __stringify(__STATE)"-ON-RR", static const char *usage_str[] = { @@ -492,7 +492,7 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS]) #define LOCKDEP_STATE(__STATE) \ usage[i++] = get_usage_char(class, LOCK_USED_IN_##__STATE); \ - usage[i++] = get_usage_char(class, LOCK_USED_IN_##__STATE##_READ); + usage[i++] = get_usage_char(class, LOCK_USED_IN_##__STATE##_RR); #include "lockdep_states.h" #undef LOCKDEP_STATE @@ -1645,7 +1645,7 @@ static const char *state_names[] = { static const char *state_rnames[] = { #define LOCKDEP_STATE(__STATE) \ - __stringify(__STATE)"-READ", + __stringify(__STATE)"-RR", #include "lockdep_states.h" #undef LOCKDEP_STATE }; @@ -3039,14 +3039,14 @@ static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock) * mark the lock as used in these contexts: */ if (!hlock->trylock) { - if (hlock->read) { + if (hlock->read == 2) { if (curr->hardirq_context) if (!mark_lock(curr, hlock, - LOCK_USED_IN_HARDIRQ_READ)) + LOCK_USED_IN_HARDIRQ_RR)) return 0; if (curr->softirq_context) if (!mark_lock(curr, hlock, - LOCK
Re: [PATCH v3 06/10] drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx
Hello Jae, On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: > This commit adds PECI adapter driver implementation for Aspeed > AST24xx/AST25xx. The driver is looking good! It looks like you've done some kind of review that we weren't allowed to see, which is a double edged sword - I might be asking about things that you've already spoken about with someone else. I'm only just learning about PECI, but I do have some general comments below. > --- > drivers/peci/Kconfig | 28 +++ > drivers/peci/Makefile | 3 + > drivers/peci/peci-aspeed.c | 504 > + > 3 files changed, 535 insertions(+) > create mode 100644 drivers/peci/peci-aspeed.c > > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig > index 1fbc13f9e6c2..0e33420365de 100644 > --- a/drivers/peci/Kconfig > +++ b/drivers/peci/Kconfig > @@ -14,4 +14,32 @@ config PECI > processors and chipset components to external monitoring or control > devices. > > + If you want PECI support, you should say Y here and also to the > + specific driver for your bus adapter(s) below. > + > +if PECI > + > +# > +# PECI hardware bus configuration > +# > + > +menu "PECI Hardware Bus support" > + > +config PECI_ASPEED > + tristate "Aspeed AST24xx/AST25xx PECI support" I think just saying ASPEED PECI support is enough. That way if the next ASPEED SoC happens to have PECI we don't need to update all of the help text :) > + select REGMAP_MMIO > + depends on OF > + depends on ARCH_ASPEED || COMPILE_TEST > + help > + Say Y here if you want support for the Platform Environment Control > + Interface (PECI) bus adapter driver on the Aspeed AST24XX and > AST25XX > + SoCs. > + > + This support is also available as a module. If so, the module > + will be called peci-aspeed. > + > +endmenu > + > +endif # PECI > + > endmenu > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile > index 9e8615e0d3ff..886285e69765 100644 > --- a/drivers/peci/Makefile > +++ b/drivers/peci/Makefile > @@ -4,3 +4,6 @@ > > # Core functionality > obj-$(CONFIG_PECI) += peci-core.o > + > +# Hardware specific bus drivers > +obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o > diff --git a/drivers/peci/peci-aspeed.c b/drivers/peci/peci-aspeed.c > new file mode 100644 > index ..be2a1f327eb1 > --- /dev/null > +++ b/drivers/peci/peci-aspeed.c > @@ -0,0 +1,504 @@ > +// SPDX-License-Identifier: GPL-2.0 > +// Copyright (C) 2012-2017 ASPEED Technology Inc. > +// Copyright (c) 2018 Intel Corporation > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define DUMP_DEBUG 0 > + > +/* Aspeed PECI Registers */ > +#define AST_PECI_CTRL 0x00 Nit: we use ASPEED instead of AST in the upstream kernel to distingush from the aspeed sdk drivers. If you feel strongly about this then I won't insist you change. > +#define AST_PECI_TIMING 0x04 > +#define AST_PECI_CMD 0x08 > +#define AST_PECI_CMD_CTRL 0x0c > +#define AST_PECI_EXP_FCS 0x10 > +#define AST_PECI_CAP_FCS 0x14 > +#define AST_PECI_INT_CTRL 0x18 > +#define AST_PECI_INT_STS 0x1c > +#define AST_PECI_W_DATA0 0x20 > +#define AST_PECI_W_DATA1 0x24 > +#define AST_PECI_W_DATA2 0x28 > +#define AST_PECI_W_DATA3 0x2c > +#define AST_PECI_R_DATA0 0x30 > +#define AST_PECI_R_DATA1 0x34 > +#define AST_PECI_R_DATA2 0x38 > +#define AST_PECI_R_DATA3 0x3c > +#define AST_PECI_W_DATA4 0x40 > +#define AST_PECI_W_DATA5 0x44 > +#define AST_PECI_W_DATA6 0x48 > +#define AST_PECI_W_DATA7 0x4c > +#define AST_PECI_R_DATA4 0x50 > +#define AST_PECI_R_DATA5 0x54 > +#define AST_PECI_R_DATA6 0x58 > +#define AST_PECI_R_DATA7 0x5c > + > +/* AST_PECI_CTRL - 0x00 : Control Register */ > +#define PECI_CTRL_SAMPLING_MASK GENMASK(19, 16) > +#define PECI_CTRL_SAMPLING(x) (((x) << 16) & PECI_CTRL_SAMPLING_MASK) > +#define PECI_CTRL_SAMPLING_GET(x) (((x) & PECI_CTRL_SAMPLING_MASK) >> 16) > +#define PECI_CTRL_READ_MODE_MASKGENMASK(13, 12) > +#define PECI_CTRL_READ_MODE(x) (((x) << 12) & PECI_CTRL_READ_MODE_MASK) > +#define PECI_CTRL_READ_MODE_GET(x) (((x) & PECI_CTRL_READ_MODE_MASK) >> 12) > +#define PECI_CTRL_READ_MODE_COUNT BIT(12) > +#define PECI_CTRL_READ_MODE_DBG BIT(13) > +#define PECI_CTRL_CLK_SOURCE_MASK BIT(11) > +#define PECI_CTRL_CLK_SOURCE(x) (((x) << 11) & PECI_CTRL_CLK_SOURCE_MASK) > +#define PECI_CTRL_CLK_SOURCE_GET(x) (((x) & PECI_CTRL_CLK_SOURCE_MASK) >> 11) > +#define PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8) > +#define PECI_CTRL_CLK_DIV(x)(((x) << 8) & PECI_CTRL_CLK_DIV_MASK) > +#define PECI_CTRL_CLK_DIV_GET(x)(((x) & PECI_CTRL_CLK_DIV_MASK) >> 8) > +#define PECI_CTRL_INVERT_OUTBIT(7) > +#define PECI_CTRL_INVERT_IN BIT(6) > +#define PECI_CTRL_BUS_CONTENT_ENBIT(5) > +#define PECI_CTRL_PECI_EN BIT(4) > +#define PECI_CTRL_PECI_CLK
Re: [PATCH v3 01/10] Documentations: dt-bindings: Add documents of generic PECI bus, adapter and client drivers
Hi Jae, On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: > This commit adds documents of generic PECI bus, adapter and client drivers. > > Signed-off-by: Jae Hyun Yoo > Reviewed-by: Haiyue Wang > Reviewed-by: James Feist > Reviewed-by: Vernon Mauery > Cc: Alan Cox > Cc: Andrew Jeffery > Cc: Andrew Lunn > Cc: Andy Shevchenko > Cc: Arnd Bergmann > Cc: Benjamin Herrenschmidt > Cc: Fengguang Wu > Cc: Greg KH > Cc: Guenter Roeck > Cc: Jason M Biils > Cc: Jean Delvare > Cc: Joel Stanley > Cc: Julia Cartwright > Cc: Miguel Ojeda > Cc: Milton Miller II > Cc: Pavel Machek > Cc: Randy Dunlap > Cc: Stef van Os > Cc: Sumeet R Pawnikar That's a hefty cc list. I can't see Rob Herring though, and he's usually the person who you need to convince to get your bindings accepted. I recommend using ./scripts/get_maintainers.pl to build your CC list, and then add others you think are relevant. I'm not sure what the guidelines are for generic bindings, so I'll defer to Rob for this patch. Cheers, Joel > --- > .../devicetree/bindings/peci/peci-adapter.txt | 23 > .../devicetree/bindings/peci/peci-bus.txt | 15 + > .../devicetree/bindings/peci/peci-client.txt | 25 > ++ > 3 files changed, 63 insertions(+) > create mode 100644 Documentation/devicetree/bindings/peci/peci-adapter.txt > create mode 100644 Documentation/devicetree/bindings/peci/peci-bus.txt > create mode 100644 Documentation/devicetree/bindings/peci/peci-client.txt > > diff --git a/Documentation/devicetree/bindings/peci/peci-adapter.txt > b/Documentation/devicetree/bindings/peci/peci-adapter.txt > new file mode 100644 > index ..9221374f6b11 > --- /dev/null > +++ b/Documentation/devicetree/bindings/peci/peci-adapter.txt > @@ -0,0 +1,23 @@ > +Generic device tree configuration for PECI adapters. > + > +Required properties: > +- compatible : Should contain hardware specific definition strings that > can > + match an adapter driver implementation. > +- reg: Should contain PECI controller registers location and > length. > +- #address-cells : Should be <1>. > +- #size-cells: Should be <0>. > + > +Example: > + peci: peci@1000 { > + compatible = "simple-bus"; > + #address-cells = <1>; > + #size-cells = <1>; > + ranges = <0x0 0x1000 0x1000>; > + > + peci0: peci-bus@0 { > + compatible = "soc,soc-peci"; > + reg = <0x0 0x1000>; > + #address-cells = <1>; > + #size-cells = <0>; > + }; > + }; > diff --git a/Documentation/devicetree/bindings/peci/peci-bus.txt > b/Documentation/devicetree/bindings/peci/peci-bus.txt > new file mode 100644 > index ..90bcc791ccb0 > --- /dev/null > +++ b/Documentation/devicetree/bindings/peci/peci-bus.txt > @@ -0,0 +1,15 @@ > +Generic device tree configuration for PECI buses. > + > +Required properties: > +- compatible : Should be "simple-bus". > +- #address-cells : Should be <1>. > +- #size-cells: Should be <1>. > +- ranges : Should contain PECI controller registers ranges. > + > +Example: > + peci: peci@1000 { > + compatible = "simple-bus"; > + #address-cells = <1>; > + #size-cells = <1>; > + ranges = <0x0 0x1000 0x1000>; > + }; > diff --git a/Documentation/devicetree/bindings/peci/peci-client.txt > b/Documentation/devicetree/bindings/peci/peci-client.txt > new file mode 100644 > index ..8e2bfd8532f6 > --- /dev/null > +++ b/Documentation/devicetree/bindings/peci/peci-client.txt > @@ -0,0 +1,25 @@ > +Generic device tree configuration for PECI clients. > + > +Required properties: > +- compatible : Should contain target device specific definition strings that > can > + match a client driver implementation. > +- reg: Should contain address of a client CPU. Address range of CPU > + clients is starting from 0x30 based on PECI specification. > + <0x30> .. <0x37> (depends on the PECI_OFFSET_MAX definition) > + > +Example: > + peci-bus@0 { > + #address-cells = <1>; > + #size-cells = <0>; > + < more properties > > + > + function@cpu0 { > + compatible = "device,function"; > + reg = <0x30>; > + }; > + > + function@cpu1 { > + compatible = "device,function"; > + reg = <0x31>; > + }; > + }; > -- > 2.16.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 04/10] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: > This commit adds a dt-bindings document of PECI adapter driver for Aspeed We try to capitalise ASPEED. > AST24xx/25xx SoCs. > --- > .../devicetree/bindings/peci/peci-aspeed.txt | 60 > ++ > 1 file changed, 60 insertions(+) > create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt > > diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt > b/Documentation/devicetree/bindings/peci/peci-aspeed.txt > new file mode 100644 > index ..4598bb8c20fa > --- /dev/null > +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt > @@ -0,0 +1,60 @@ > +Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs. > + > +Required properties: > +- compatible: Should be "aspeed,ast2400-peci" or > "aspeed,ast2500-peci" > + - aspeed,ast2400-peci: Aspeed AST2400 family PECI > +controller > + - aspeed,ast2500-peci: Aspeed AST2500 family PECI > +controller > +- reg : Should contain PECI controller registers location and > + length. > +- #address-cells: Should be <1>. > +- #size-cells : Should be <0>. > +- interrupts: Should contain PECI controller interrupt. > +- clocks: Should contain clock source for PECI controller. > + Should reference clkin. Are you sure that this is driven by clkin? Most peripherals on the Aspeed are attached to the apb, so should reference that clock. > +- clock_frequency : Should contain the operation frequency of PECI > controller > + in units of Hz. > + 187500 ~ 2400 Can you explain why you need both the parent clock and this frequency to be specified? > + > +Optional properties: > +- msg-timing-nego : Message timing negotiation period. This value will Perhaps msg-timing-period? Or just msg-timing? > + determine the period of message timing negotiation to be > + issued by PECI controller. The unit of the programmed > + value is four times of PECI clock period. > + 0 ~ 255 (default: 1) > +- addr-timing-nego : Address timing negotiation period. This value will > + determine the period of address timing negotiation to be > + issued by PECI controller. The unit of the programmed > + value is four times of PECI clock period. > + 0 ~ 255 (default: 1) > +- rd-sampling-point : Read sampling point selection. The whole period of a > bit > + time will be divided into 16 time frames. This value > will > + determine the time frame in which the controller will > + sample PECI signal for data read back. Usually in the > + middle of a bit time is the best. > + 0 ~ 15 (default: 8) > +- cmd_timeout_ms: Command timeout in units of ms. > + 1 ~ 6 (default: 1000) > + > +Example: > + peci: peci@1e78b000 { > + compatible = "simple-bus"; > + #address-cells = <1>; > + #size-cells = <1>; > + ranges = <0x0 0x1e78b000 0x60>; > + > + peci0: peci-bus@0 { > + compatible = "aspeed,ast2500-peci"; > + reg = <0x0 0x60>; > + #address-cells = <1>; > + #size-cells = <0>; > + interrupts = <15>; > + clocks = <&clk_clkin>; > + clock-frequency = <2400>; > + msg-timing-nego = <1>; > + addr-timing-nego = <1>; > + rd-sampling-point = <8>; > + cmd-timeout-ms = <1000>; > + }; > + }; > -- > 2.16.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 05/10] ARM: dts: aspeed: peci: Add PECI node
On 11 April 2018 at 04:02, Jae Hyun Yoo wrote: > This commit adds PECI bus/adapter node of AST24xx/AST25xx into > aspeed-g4 and aspeed-g5. > The patches to the device trees get merged by the ASPEED maintainer (me). Once you have the bindings reviewed you can send the patches to me and the linux-aspeed list (I've got a pending patch to maintainers that will ensure get_maintainers.pl does the right thing as far as email addresses go). I'd suggest dropping it from your series and re-sending once the bindings and driver are reviewed. Cheers, Joel -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
On Wed 11-04-18 12:32:07, Laurent Dufour wrote: [...] > Andrew, should I send a v4 or could you wipe the 2 __maybe_unsued when > applying > the patch ? A follow $patch-fix should be better rather than post this again and spam people with more emails. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
On 11/04/2018 11:09, Christophe LEROY wrote: > > > Le 11/04/2018 à 11:03, Laurent Dufour a écrit : >> >> >> On 11/04/2018 10:58, Christophe LEROY wrote: >>> >>> >>> Le 11/04/2018 à 10:03, Laurent Dufour a écrit : Remove the additional define HAVE_PTE_SPECIAL and rely directly on CONFIG_ARCH_HAS_PTE_SPECIAL. There is no functional change introduced by this patch Signed-off-by: Laurent Dufour --- mm/memory.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 96910c625daa..7f7dc7b2a341 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -817,17 +817,12 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, * PFNMAP mappings in order to support COWable mappings. * */ -#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL -# define HAVE_PTE_SPECIAL 1 -#else -# define HAVE_PTE_SPECIAL 0 -#endif struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, bool with_public_device) { unsigned long pfn = pte_pfn(pte); - if (HAVE_PTE_SPECIAL) { + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { if (likely(!pte_special(pte))) goto check_pfn; if (vma->vm_ops && vma->vm_ops->find_special_page) @@ -862,7 +857,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, return NULL; } - /* !HAVE_PTE_SPECIAL case follows: */ + /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */ if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, if (is_zero_pfn(pfn)) return NULL; -check_pfn: + +check_pfn: __maybe_unused >>> >>> See below >>> if (unlikely(pfn > highest_memmap_pfn)) { print_bad_pte(vma, addr, pte, NULL); return NULL; @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, * NOTE! We still have PageReserved() pages in the page tables. * eg. VDSO mappings can cause them to exist. */ -out: +out: __maybe_unused >>> >>> Why do you need that change ? >>> >>> There is no reason for the compiler to complain. It would complain if the >>> goto >>> was within a #ifdef, but all the purpose of using IS_ENABLED() is to allow >>> the >>> compiler to properly handle all possible cases. That's all the force of >>> IS_ENABLED() compared to ifdefs, and that the reason why they are >>> plebicited, >>> ref Linux Codying style for a detailed explanation. >> >> Fair enough. >> >> Should I submit a v4 just to remove these so ugly __maybe_unused ? >> > > Most likely, unless the mm maintainer agrees to remove them by himself when > applying your patch ? That was my point. Andrew, should I send a v4 or could you wipe the 2 __maybe_unsued when applying the patch ? Thanks, Laurent. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC bpf-next v2 8/8] bpf: add documentation for eBPF helpers (58-64)
On Tue, 10 Apr 2018 15:41:57 +0100 Quentin Monnet wrote: > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 7343af4196c8..db090ad03626 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -1250,6 +1250,51 @@ union bpf_attr { > * Return > * 0 on success, or a negative error in case of failure. > * > + * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags) > + * Description > + * Redirect the packet to the endpoint referenced by *map* at > + * index *key*. Depending on its type, his *map* can contain > + * references to net devices (for forwarding packets through other > + * ports), or to CPUs (for redirecting XDP frames to another CPU; > + * but this is not fully implemented as of this writing). Stating that CPUMAP redirect "is not fully implemented" is confusing. The issue is that CPUMAP only works for "native" XDP. What about saying: "[...] or to CPUs (for redirecting XDP frames to another CPU; but this is only implemented for native XDP as of this writing)" -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
Le 11/04/2018 à 11:03, Laurent Dufour a écrit : On 11/04/2018 10:58, Christophe LEROY wrote: Le 11/04/2018 à 10:03, Laurent Dufour a écrit : Remove the additional define HAVE_PTE_SPECIAL and rely directly on CONFIG_ARCH_HAS_PTE_SPECIAL. There is no functional change introduced by this patch Signed-off-by: Laurent Dufour --- mm/memory.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 96910c625daa..7f7dc7b2a341 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -817,17 +817,12 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, * PFNMAP mappings in order to support COWable mappings. * */ -#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL -# define HAVE_PTE_SPECIAL 1 -#else -# define HAVE_PTE_SPECIAL 0 -#endif struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, bool with_public_device) { unsigned long pfn = pte_pfn(pte); - if (HAVE_PTE_SPECIAL) { + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { if (likely(!pte_special(pte))) goto check_pfn; if (vma->vm_ops && vma->vm_ops->find_special_page) @@ -862,7 +857,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, return NULL; } - /* !HAVE_PTE_SPECIAL case follows: */ + /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */ if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, if (is_zero_pfn(pfn)) return NULL; -check_pfn: + +check_pfn: __maybe_unused See below if (unlikely(pfn > highest_memmap_pfn)) { print_bad_pte(vma, addr, pte, NULL); return NULL; @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, * NOTE! We still have PageReserved() pages in the page tables. * eg. VDSO mappings can cause them to exist. */ -out: +out: __maybe_unused Why do you need that change ? There is no reason for the compiler to complain. It would complain if the goto was within a #ifdef, but all the purpose of using IS_ENABLED() is to allow the compiler to properly handle all possible cases. That's all the force of IS_ENABLED() compared to ifdefs, and that the reason why they are plebicited, ref Linux Codying style for a detailed explanation. Fair enough. Should I submit a v4 just to remove these so ugly __maybe_unused ? Most likely, unless the mm maintainer agrees to remove them by himself when applying your patch ? Christophe -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
On 11/04/2018 10:58, Christophe LEROY wrote: > > > Le 11/04/2018 à 10:03, Laurent Dufour a écrit : >> Remove the additional define HAVE_PTE_SPECIAL and rely directly on >> CONFIG_ARCH_HAS_PTE_SPECIAL. >> >> There is no functional change introduced by this patch >> >> Signed-off-by: Laurent Dufour >> --- >> mm/memory.c | 19 --- >> 1 file changed, 8 insertions(+), 11 deletions(-) >> >> diff --git a/mm/memory.c b/mm/memory.c >> index 96910c625daa..7f7dc7b2a341 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -817,17 +817,12 @@ static void print_bad_pte(struct vm_area_struct *vma, >> unsigned long addr, >> * PFNMAP mappings in order to support COWable mappings. >> * >> */ >> -#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL >> -# define HAVE_PTE_SPECIAL 1 >> -#else >> -# define HAVE_PTE_SPECIAL 0 >> -#endif >> struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long >> addr, >> pte_t pte, bool with_public_device) >> { >> unsigned long pfn = pte_pfn(pte); >> - if (HAVE_PTE_SPECIAL) { >> + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { >> if (likely(!pte_special(pte))) >> goto check_pfn; >> if (vma->vm_ops && vma->vm_ops->find_special_page) >> @@ -862,7 +857,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, >> unsigned long addr, >> return NULL; >> } >> - /* !HAVE_PTE_SPECIAL case follows: */ >> + /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */ >> if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { >> if (vma->vm_flags & VM_MIXEDMAP) { >> @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, >> unsigned long addr, >> if (is_zero_pfn(pfn)) >> return NULL; >> -check_pfn: >> + >> +check_pfn: __maybe_unused > > See below > >> if (unlikely(pfn > highest_memmap_pfn)) { >> print_bad_pte(vma, addr, pte, NULL); >> return NULL; >> @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, >> unsigned long addr, >> * NOTE! We still have PageReserved() pages in the page tables. >> * eg. VDSO mappings can cause them to exist. >> */ >> -out: >> +out: __maybe_unused > > Why do you need that change ? > > There is no reason for the compiler to complain. It would complain if the goto > was within a #ifdef, but all the purpose of using IS_ENABLED() is to allow the > compiler to properly handle all possible cases. That's all the force of > IS_ENABLED() compared to ifdefs, and that the reason why they are plebicited, > ref Linux Codying style for a detailed explanation. Fair enough. Should I submit a v4 just to remove these so ugly __maybe_unused ? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
Le 11/04/2018 à 10:41, Laurent Dufour a écrit : On 11/04/2018 10:33, Michal Hocko wrote: On Wed 11-04-18 10:03:36, Laurent Dufour wrote: @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, if (is_zero_pfn(pfn)) return NULL; -check_pfn: + +check_pfn: __maybe_unused if (unlikely(pfn > highest_memmap_pfn)) { print_bad_pte(vma, addr, pte, NULL); return NULL; @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, * NOTE! We still have PageReserved() pages in the page tables. * eg. VDSO mappings can cause them to exist. */ -out: +out: __maybe_unused return pfn_to_page(pfn); Why do we need this ugliness all of the sudden? Indeed the compiler doesn't complaint but in theory it should since these labels are not used depending on CONFIG_ARCH_HAS_PTE_SPECIAL. Why should it complain ? Regards Christophe -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
Le 11/04/2018 à 10:03, Laurent Dufour a écrit : Remove the additional define HAVE_PTE_SPECIAL and rely directly on CONFIG_ARCH_HAS_PTE_SPECIAL. There is no functional change introduced by this patch Signed-off-by: Laurent Dufour --- mm/memory.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 96910c625daa..7f7dc7b2a341 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -817,17 +817,12 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, * PFNMAP mappings in order to support COWable mappings. * */ -#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL -# define HAVE_PTE_SPECIAL 1 -#else -# define HAVE_PTE_SPECIAL 0 -#endif struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, bool with_public_device) { unsigned long pfn = pte_pfn(pte); - if (HAVE_PTE_SPECIAL) { + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { if (likely(!pte_special(pte))) goto check_pfn; if (vma->vm_ops && vma->vm_ops->find_special_page) @@ -862,7 +857,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, return NULL; } - /* !HAVE_PTE_SPECIAL case follows: */ + /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */ if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, if (is_zero_pfn(pfn)) return NULL; -check_pfn: + +check_pfn: __maybe_unused See below if (unlikely(pfn > highest_memmap_pfn)) { print_bad_pte(vma, addr, pte, NULL); return NULL; @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, * NOTE! We still have PageReserved() pages in the page tables. * eg. VDSO mappings can cause them to exist. */ -out: +out: __maybe_unused Why do you need that change ? There is no reason for the compiler to complain. It would complain if the goto was within a #ifdef, but all the purpose of using IS_ENABLED() is to allow the compiler to properly handle all possible cases. That's all the force of IS_ENABLED() compared to ifdefs, and that the reason why they are plebicited, ref Linux Codying style for a detailed explanation. Christophe return pfn_to_page(pfn); } @@ -904,7 +900,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr, /* * There is no pmd_special() but there may be special pmds, e.g. * in a direct-access (dax) mapping, so let's just replicate the -* !HAVE_PTE_SPECIAL case from vm_normal_page() here. +* !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here. */ if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { @@ -1933,7 +1929,8 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, * than insert_pfn). If a zero_pfn were inserted into a VM_MIXEDMAP * without pte special, it would there be refcounted as a normal page. */ - if (!HAVE_PTE_SPECIAL && !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) { + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && + !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) { struct page *page; /* -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
On Wed 11-04-18 10:41:23, Laurent Dufour wrote: > On 11/04/2018 10:33, Michal Hocko wrote: > > On Wed 11-04-18 10:03:36, Laurent Dufour wrote: > >> @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct > >> *vma, unsigned long addr, > >> > >>if (is_zero_pfn(pfn)) > >>return NULL; > >> -check_pfn: > >> + > >> +check_pfn: __maybe_unused > >>if (unlikely(pfn > highest_memmap_pfn)) { > >>print_bad_pte(vma, addr, pte, NULL); > >>return NULL; > >> @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct > >> *vma, unsigned long addr, > >> * NOTE! We still have PageReserved() pages in the page tables. > >> * eg. VDSO mappings can cause them to exist. > >> */ > >> -out: > >> +out: __maybe_unused > >>return pfn_to_page(pfn); > > > > Why do we need this ugliness all of the sudden? > Indeed the compiler doesn't complaint but in theory it should since these > labels are not used depending on CONFIG_ARCH_HAS_PTE_SPECIAL. Well, such a warning would be quite pointless so I would rather not make the code ugly. The value of unused label is quite questionable to start with... -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
On 11/04/2018 10:33, Michal Hocko wrote: > On Wed 11-04-18 10:03:36, Laurent Dufour wrote: >> @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, >> unsigned long addr, >> >> if (is_zero_pfn(pfn)) >> return NULL; >> -check_pfn: >> + >> +check_pfn: __maybe_unused >> if (unlikely(pfn > highest_memmap_pfn)) { >> print_bad_pte(vma, addr, pte, NULL); >> return NULL; >> @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, >> unsigned long addr, >> * NOTE! We still have PageReserved() pages in the page tables. >> * eg. VDSO mappings can cause them to exist. >> */ >> -out: >> +out: __maybe_unused >> return pfn_to_page(pfn); > > Why do we need this ugliness all of the sudden? Indeed the compiler doesn't complaint but in theory it should since these labels are not used depending on CONFIG_ARCH_HAS_PTE_SPECIAL. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL
On Wed 11-04-18 10:03:35, Laurent Dufour wrote: > Currently the PTE special supports is turned on in per architecture header > files. Most of the time, it is defined in arch/*/include/asm/pgtable.h > depending or not on some other per architecture static definition. > > This patch introduce a new configuration variable to manage this directly > in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. > > Here notes for some architecture where the definition of > __HAVE_ARCH_PTE_SPECIAL is not obvious: > > arm > __HAVE_ARCH_PTE_SPECIAL which is currently defined in > arch/arm/include/asm/pgtable-3level.h which is included by > arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. > So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. > > powerpc > __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: > - arch/powerpc/include/asm/book3s/64/pgtable.h > - arch/powerpc/include/asm/pte-common.h > The first one is included if (PPC_BOOK3S & PPC64) while the second is > included in all the other cases. > So select ARCH_HAS_PTE_SPECIAL all the time. > > sparc: > __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && > defined(__arch64__) which are defined through the compiler in > sparc/Makefile if !SPARC32 which I assume to be if SPARC64. > So select ARCH_HAS_PTE_SPECIAL if SPARC64 > > There is no functional change introduced by this patch. > > Suggested-by: Jerome Glisse > Reviewed-by: Jerome Glisse > Acked-by: David Rientjes > Signed-off-by: Laurent Dufour Looks good to me. I have checked x86 and the generic code and it looks good to me. Anyway arch maintainers really have to double check this. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
On Wed 11-04-18 10:03:36, Laurent Dufour wrote: > @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, > unsigned long addr, > > if (is_zero_pfn(pfn)) > return NULL; > -check_pfn: > + > +check_pfn: __maybe_unused > if (unlikely(pfn > highest_memmap_pfn)) { > print_bad_pte(vma, addr, pte, NULL); > return NULL; > @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, > unsigned long addr, >* NOTE! We still have PageReserved() pages in the page tables. >* eg. VDSO mappings can cause them to exist. >*/ > -out: > +out: __maybe_unused > return pfn_to_page(pfn); Why do we need this ugliness all of the sudden? -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/2] move __HAVE_ARCH_PTE_SPECIAL in Kconfig
The per architecture __HAVE_ARCH_PTE_SPECIAL is defined statically in the per architecture header files. This doesn't allow to make other configuration dependent on it. The first patch of this series is replacing __HAVE_ARCH_PTE_SPECIAL by CONFIG_ARCH_HAS_PTE_SPECIAL defined into the Kconfig files, setting it automatically when architectures was already setting it in header file. The second patch is removing the odd define HAVE_PTE_SPECIAL which is a duplicate of CONFIG_ARCH_HAS_PTE_SPECIAL. There is no functional change introduced by this series. -- Changes since v2: * remove __HAVE_ARCH_PTE_SPECIAL in arch/riscv/include/asm/pgtable-bits.h * use IS_ENABLED() instead of #ifdef blocks in patch 2 Laurent Dufour (2): mm: introduce ARCH_HAS_PTE_SPECIAL mm: remove odd HAVE_PTE_SPECIAL .../features/vm/pte_special/arch-support.txt | 2 +- arch/arc/Kconfig | 1 + arch/arc/include/asm/pgtable.h| 2 -- arch/arm/Kconfig | 1 + arch/arm/include/asm/pgtable-3level.h | 1 - arch/arm64/Kconfig| 1 + arch/arm64/include/asm/pgtable.h | 2 -- arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/book3s/64/pgtable.h | 3 --- arch/powerpc/include/asm/pte-common.h | 3 --- arch/riscv/Kconfig| 1 + arch/riscv/include/asm/pgtable-bits.h | 3 --- arch/s390/Kconfig | 1 + arch/s390/include/asm/pgtable.h | 1 - arch/sh/Kconfig | 1 + arch/sh/include/asm/pgtable.h | 2 -- arch/sparc/Kconfig| 1 + arch/sparc/include/asm/pgtable_64.h | 3 --- arch/x86/Kconfig | 1 + arch/x86/include/asm/pgtable_types.h | 1 - include/linux/pfn_t.h | 4 ++-- mm/Kconfig| 3 +++ mm/gup.c | 4 ++-- mm/memory.c | 19 --- 24 files changed, 25 insertions(+), 37 deletions(-) -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL
Currently the PTE special supports is turned on in per architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending or not on some other per architecture static definition. This patch introduce a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here notes for some architecture where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm __HAVE_ARCH_PTE_SPECIAL which is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__) which are defined through the compiler in sparc/Makefile if !SPARC32 which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64 There is no functional change introduced by this patch. Suggested-by: Jerome Glisse Reviewed-by: Jerome Glisse Acked-by: David Rientjes Signed-off-by: Laurent Dufour --- Documentation/features/vm/pte_special/arch-support.txt | 2 +- arch/arc/Kconfig | 1 + arch/arc/include/asm/pgtable.h | 2 -- arch/arm/Kconfig | 1 + arch/arm/include/asm/pgtable-3level.h | 1 - arch/arm64/Kconfig | 1 + arch/arm64/include/asm/pgtable.h | 2 -- arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/book3s/64/pgtable.h | 3 --- arch/powerpc/include/asm/pte-common.h | 3 --- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/pgtable-bits.h | 3 --- arch/s390/Kconfig | 1 + arch/s390/include/asm/pgtable.h| 1 - arch/sh/Kconfig| 1 + arch/sh/include/asm/pgtable.h | 2 -- arch/sparc/Kconfig | 1 + arch/sparc/include/asm/pgtable_64.h| 3 --- arch/x86/Kconfig | 1 + arch/x86/include/asm/pgtable_types.h | 1 - include/linux/pfn_t.h | 4 ++-- mm/Kconfig | 3 +++ mm/gup.c | 4 ++-- mm/memory.c| 2 +- 24 files changed, 18 insertions(+), 27 deletions(-) diff --git a/Documentation/features/vm/pte_special/arch-support.txt b/Documentation/features/vm/pte_special/arch-support.txt index 055004f467d2..cd05924ea875 100644 --- a/Documentation/features/vm/pte_special/arch-support.txt +++ b/Documentation/features/vm/pte_special/arch-support.txt @@ -1,6 +1,6 @@ # # Feature name: pte_special -# Kconfig: __HAVE_ARCH_PTE_SPECIAL +# Kconfig: ARCH_HAS_PTE_SPECIAL # description: arch supports the pte_special()/pte_mkspecial() VM APIs # --- diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index d76bf4a83740..8516e2b0239a 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -44,6 +44,7 @@ config ARC select HAVE_GENERIC_DMA_COHERENT select HAVE_KERNEL_GZIP select HAVE_KERNEL_LZMA + select ARCH_HAS_PTE_SPECIAL config MIGHT_HAVE_PCI bool diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h index 08fe33830d4b..8ec5599a0957 100644 --- a/arch/arc/include/asm/pgtable.h +++ b/arch/arc/include/asm/pgtable.h @@ -320,8 +320,6 @@ PTE_BIT_FUNC(mkexec,|= (_PAGE_EXECUTE)); PTE_BIT_FUNC(mkspecial,|= (_PAGE_SPECIAL)); PTE_BIT_FUNC(mkhuge, |= (_PAGE_HW_SZ)); -#define __HAVE_ARCH_PTE_SPECIAL - static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot)); diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index a7f8e7f4b88f..c088c851b235 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -8,6 +8,7 @@ config ARM select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE + select ARCH_HAS_PTE_SPECIAL if ARM_LPAE select ARCH_HAS_SET_MEMORY select ARCH_HAS_PHYS_TO_DMA select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 2a4836087358..6d50a11d7793 100644 --- a/arch/arm/include/asm
[PATCH v3 2/2] mm: remove odd HAVE_PTE_SPECIAL
Remove the additional define HAVE_PTE_SPECIAL and rely directly on CONFIG_ARCH_HAS_PTE_SPECIAL. There is no functional change introduced by this patch Signed-off-by: Laurent Dufour --- mm/memory.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 96910c625daa..7f7dc7b2a341 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -817,17 +817,12 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, * PFNMAP mappings in order to support COWable mappings. * */ -#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL -# define HAVE_PTE_SPECIAL 1 -#else -# define HAVE_PTE_SPECIAL 0 -#endif struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, bool with_public_device) { unsigned long pfn = pte_pfn(pte); - if (HAVE_PTE_SPECIAL) { + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { if (likely(!pte_special(pte))) goto check_pfn; if (vma->vm_ops && vma->vm_ops->find_special_page) @@ -862,7 +857,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, return NULL; } - /* !HAVE_PTE_SPECIAL case follows: */ + /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */ if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { @@ -881,7 +876,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, if (is_zero_pfn(pfn)) return NULL; -check_pfn: + +check_pfn: __maybe_unused if (unlikely(pfn > highest_memmap_pfn)) { print_bad_pte(vma, addr, pte, NULL); return NULL; @@ -891,7 +887,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, * NOTE! We still have PageReserved() pages in the page tables. * eg. VDSO mappings can cause them to exist. */ -out: +out: __maybe_unused return pfn_to_page(pfn); } @@ -904,7 +900,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr, /* * There is no pmd_special() but there may be special pmds, e.g. * in a direct-access (dax) mapping, so let's just replicate the -* !HAVE_PTE_SPECIAL case from vm_normal_page() here. +* !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here. */ if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { @@ -1933,7 +1929,8 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, * than insert_pfn). If a zero_pfn were inserted into a VM_MIXEDMAP * without pte special, it would there be refcounted as a normal page. */ - if (!HAVE_PTE_SPECIAL && !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) { + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && + !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) { struct page *page; /* -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html