[tip:perf/urgent] uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
Commit-ID:  297e765e390a2ac996000b5f7228cbd84d995174
Gitweb:     http://git.kernel.org/tip/297e765e390a2ac996000b5f7228cbd84d995174
Author:     Marcin Nowakowski
AuthorDate: Tue, 13 Dec 2016 11:40:57 +0100
Committer:  Ingo Molnar
CommitDate: Sun, 18 Dec 2016 09:42:11 +0100

uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation

Commit 72e6ae285a1d ('ARM: 8043/1: uprobes need icache flush after xol write' ...)
introduced an arch-specific method to ensure all caches are flushed
appropriately after an instruction is written to an XOL page. However, when
the XOL area is created and the out-of-line breakpoint instruction is
copied, caches are not flushed at all and stale data may be found in the
icache.

Replace the simple copy_to_page() with arch_uprobe_copy_ixol() to allow the
arch to ensure all caches are updated accordingly.

This change fixes uprobes on MIPS InterAptiv (tested on Creator Ci40).

Signed-off-by: Marcin Nowakowski
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Victor Kamensky
Cc: linux-m...@linux-mips.org
Link: http://lkml.kernel.org/r/1481625657-22850-1-git-send-email-marcin.nowakow...@imgtec.com
Signed-off-by: Ingo Molnar
---
 kernel/events/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f9ec9ad..b5916b4 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1194,7 +1194,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
 	/* Reserve the 1st slot for get_trampoline_vaddr() */
 	set_bit(0, area->bitmap);
 	atomic_set(&area->slot_count, 1);
-	copy_to_page(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
+	arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
 
 	if (!xol_add_vma(mm, area))
 		return area;
Re: [PATCH v4 1/5] pinctrl: aspeed: dt: Fix compatibles for the System Control Unit
On Tue, Dec 20, 2016 at 6:05 PM, Andrew Jeffery wrote:
> Reference the SoC-specific compatible string in the examples as
> required.
>
> Signed-off-by: Andrew Jeffery

Acked-by: Joel Stanley

> ---
>  Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
Re: [GIT PULL] MFD for v4.10
On Mon, 19 Dec 2016, Linus Torvalds wrote:
> On Mon, Dec 19, 2016 at 3:47 AM, Lee Jones wrote:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git HEAD
>
> Nothing there.
>
> I'm assuming you meant the "for-linus-4.10" tag.

Yes.  Brain malfunction trying to get the PR out too quickly...

More haste, less speed!

--
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
[PATCH v4 3/5] pinctrl: aspeed-g4: Add mux configuration for all pins
The patch introducing the g4 pinctrl driver implemented a smattering of pins
to flesh out the implementation of the core and provide bare-bones support
for some OpenPOWER platforms. Now, update the bindings document to reflect
the complete functionality and implement the necessary pin configuration
tables in the driver.

Cc: Timothy Pearson
Signed-off-by: Andrew Jeffery
Acked-by: Joel Stanley
Acked-by: Rob Herring
---
 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |   19 +-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c         | 1097 ++++++++++++++++++++-
 2 files changed, 1096 insertions(+), 20 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index fb7694ec033d..a645d0be3347 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -44,13 +44,18 @@ supported:
 
 aspeed,ast2400-pinctrl, aspeed,g4-pinctrl:
 
-ACPI BMCINT DDCCLK DDCDAT FLACK FLBUSY FLWP GPID0 GPIE0 GPIE2 GPIE4 GPIE6 I2C10
-I2C11 I2C12 I2C13 I2C3 I2C4 I2C5 I2C6 I2C7 I2C8 I2C9 LPCPD LPCPME LPCSMI MDIO1
-MDIO2 NCTS1 NCTS3 NCTS4 NDCD1 NDCD3 NDCD4 NDSR1 NDSR3 NDTR1 NDTR3 NRI1 NRI3
-NRI4 NRTS1 NRTS3 PWM0 PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7 RGMII1 RMII1 ROM16
-ROM8 ROMCS1 ROMCS2 ROMCS3 ROMCS4 RXD1 RXD3 RXD4 SD1 SGPMI SIOPBI SIOPBO TIMER3
-TIMER5 TIMER6 TIMER7 TIMER8 TXD1 TXD3 TXD4 UART6 VGAHS VGAVS VPI18 VPI24 VPI30
-VPO12 VPO24
+ACPI ADC0 ADC1 ADC10 ADC11 ADC12 ADC13 ADC14 ADC15 ADC2 ADC3 ADC4 ADC5 ADC6
+ADC7 ADC8 ADC9 BMCINT DDCCLK DDCDAT EXTRST FLACK FLBUSY FLWP GPID GPID0 GPID2
+GPID4 GPID6 GPIE0 GPIE2 GPIE4 GPIE6 I2C10 I2C11 I2C12 I2C13 I2C14 I2C3 I2C4
+I2C5 I2C6 I2C7 I2C8 I2C9 LPCPD LPCPME LPCRST LPCSMI MAC1LINK MAC2LINK MDIO1
+MDIO2 NCTS1 NCTS2 NCTS3 NCTS4 NDCD1 NDCD2 NDCD3 NDCD4 NDSR1 NDSR2 NDSR3 NDSR4
+NDTR1 NDTR2 NDTR3 NDTR4 NDTS4 NRI1 NRI2 NRI3 NRI4 NRTS1 NRTS2 NRTS3 OSCCLK PWM0
+PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7 RGMII1 RGMII2 RMII1 RMII2 ROM16 ROM8 ROMCS1
+ROMCS2 ROMCS3 ROMCS4 RXD1 RXD2 RXD3 RXD4 SALT1 SALT2 SALT3 SALT4 SD1 SD2 SGPMCK
+SGPMI SGPMLD SGPMO SGPSCK SGPSI0 SGPSI1 SGPSLD SIOONCTRL SIOPBI SIOPBO SIOPWREQ
+SIOPWRGD SIOS3 SIOS5 SIOSCI SPI1 SPI1DEBUG SPI1PASSTHRU SPICS1 TIMER3 TIMER4
+TIMER5 TIMER6 TIMER7 TIMER8 TXD1 TXD2 TXD3 TXD4 UART6 USBCKI VGABIOS_ROM VGAHS
+VGAVS VPI18 VPI24 VPI30 VPO12 VPO24 WDTRST1 WDTRST2
 
 aspeed,ast2500-pinctrl, aspeed,g5-pinctrl:
 
diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c b/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c
index 558bd102416c..09b668415c56 100644
--- a/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c
+++ b/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c
@@ -43,9 +43,18 @@
  * Not all pins have their signals defined (yet).
  */
 
+#define D6 0
+SSSF_PIN_DECL(D6, GPIOA0, MAC1LINK, SIG_DESC_SET(SCU80, 0));
+
+#define B5 1
+SSSF_PIN_DECL(B5, GPIOA1, MAC2LINK, SIG_DESC_SET(SCU80, 1));
+
 #define A4 2
 SSSF_PIN_DECL(A4, GPIOA2, TIMER3, SIG_DESC_SET(SCU80, 2));
 
+#define E6 3
+SSSF_PIN_DECL(E6, GPIOA3, TIMER4, SIG_DESC_SET(SCU80, 3));
+
 #define I2C9_DESC	SIG_DESC_SET(SCU90, 22)
 
 #define C5 4
@@ -80,6 +89,26 @@ MS_PIN_DECL(D5, GPIOA7, MDIO2, TIMER8);
 FUNC_GROUP_DECL(TIMER8, D5);
 FUNC_GROUP_DECL(MDIO2, A3, D5);
 
+#define J21 8
+SSSF_PIN_DECL(J21, GPIOB0, SALT1, SIG_DESC_SET(SCU80, 8));
+
+#define J20 9
+SSSF_PIN_DECL(J20, GPIOB1, SALT2, SIG_DESC_SET(SCU80, 9));
+
+#define H18 10
+SSSF_PIN_DECL(H18, GPIOB2, SALT3, SIG_DESC_SET(SCU80, 10));
+
+#define F18 11
+SSSF_PIN_DECL(F18, GPIOB3, SALT4, SIG_DESC_SET(SCU80, 11));
+
+#define E19 12
+SIG_EXPR_DECL(LPCRST, LPCRST, SIG_DESC_SET(SCU80, 12));
+SIG_EXPR_DECL(LPCRST, LPCRSTS, SIG_DESC_SET(HW_STRAP1, 14));
+SIG_EXPR_LIST_DECL_DUAL(LPCRST, LPCRST, LPCRSTS);
+SS_PIN_DECL(E19, GPIOB4, LPCRST);
+
+FUNC_GROUP_DECL(LPCRST, E19);
+
 #define H19 13
 #define H19_DESC	SIG_DESC_SET(SCU80, 13)
 SIG_EXPR_LIST_DECL_SINGLE(LPCPD, LPCPD, H19_DESC);
@@ -92,6 +121,19 @@ FUNC_GROUP_DECL(LPCSMI, H19);
 #define H20 14
 SSSF_PIN_DECL(H20, GPIOB6, LPCPME, SIG_DESC_SET(SCU80, 14));
 
+#define E18 15
+SIG_EXPR_LIST_DECL_SINGLE(EXTRST, EXTRST,
+			  SIG_DESC_SET(SCU80, 15),
+			  SIG_DESC_BIT(SCU90, 31, 0),
+			  SIG_DESC_SET(SCU3C, 3));
+SIG_EXPR_LIST_DECL_SINGLE(SPICS1, SPICS1,
+			  SIG_DESC_SET(SCU80, 15),
+			  SIG_DESC_SET(SCU90, 31));
+MS_PIN_DECL(E18, GPIOB7, EXTRST, SPICS1);
+
+FUNC_GROUP_DECL(EXTRST, E18);
+FUNC_GROUP_DECL(SPICS1, E18);
+
 #define SD1_DESC	SIG_DESC_SET(SCU90, 0)
 #define I2C10_DESC	SIG_DESC_SET(SCU90, 23)
 
@@ -170,6 +212,62 @@ MS_PIN_DECL(D16, GPIOD1, SD2CMD, GPID0OUT);
 
 FUNC_GROUP_DECL(GPID0, A18, D16);
 
+#define GPID2_DESC	SIG_DESC_SET(SCU8C, 9)
+
+#define B17 26
+SIG_EXPR_LIST_DECL_SINGLE(SD2DAT0, SD2, SD2_DESC);
+SIG_EXPR_DECL(GPID2IN, GPID2, GPID2_DESC);
+SIG_EXPR_DECL(GPID2IN, GPID, GPID_DESC);
+SIG_EXPR_LIST_DECL_DUAL(GPID2IN, GPID2, GPID);
+MS_PIN_DECL(B17, GPIOD2,
[PATCH v4 2/5] pinctrl: aspeed: Read and write bits in LPC and GFX controllers
The System Control Unit IP block in the Aspeed SoCs is typically where the
pinmux configuration is found, but not always. A number of pins depend on
state in one of the LPC Host Control (LHC) or SoC Display Controller (GFX)
IP blocks, so the Aspeed pinmux drivers should have the means to adjust
these as necessary.

We use syscon to cast a regmap over the GFX and LPC blocks, which is used as
an arbitration layer between the relevant driver and the pinctrl subsystem.
The regmaps are then exposed to the SoC-specific pinctrl drivers by phandles
in the devicetree, and are selected during a mux request by querying a new
'ip' member in struct aspeed_sig_desc.

Signed-off-by: Andrew Jeffery
Reviewed-by: Joel Stanley
---
Joel: I kept your r-b tag here despite reworking the g5 example bindings, as
you've given your r-b for the lpc bindings, which are what I have added.

 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |  80 ++++++----
 drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c         |  18 +--
 drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c         |  48 ++++--
 drivers/pinctrl/aspeed/pinctrl-aspeed.c            | 161 +++++++++++++++----
 drivers/pinctrl/aspeed/pinctrl-aspeed.h            |  32 ++--
 5 files changed, 242 insertions(+), 97 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index b2efb73337c6..fb7694ec033d 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -1,15 +1,23 @@
+======================
 Aspeed Pin Controllers
-----------------------
+======================
 
 The Aspeed SoCs vary in functionality inside a generation but have a common mux
 device register layout.
 
-Required properties:
-- compatible : Should be any one of the following:
-		"aspeed,ast2400-pinctrl"
-		"aspeed,g4-pinctrl"
-		"aspeed,ast2500-pinctrl"
-		"aspeed,g5-pinctrl"
+Required properties for g4:
+- compatible : Should be one of the following:
+		"aspeed,ast2400-pinctrl"
+		"aspeed,g4-pinctrl"
+
+Required properties for g5:
+- compatible : Should be one of the following:
+		"aspeed,ast2500-pinctrl"
+		"aspeed,g5-pinctrl"
+
+- aspeed,external-nodes: A cell of phandles to external controller nodes:
+		0: compatible with "aspeed,ast2500-gfx", "syscon"
+		1: compatible with "aspeed,ast2500-lhc", "syscon"
 
 The pin controller node should be the child of a syscon node with the required
 property:
@@ -24,7 +32,7 @@ Refer to the the bindings described in
 Documentation/devicetree/bindings/mfd/syscon.txt
 
 Subnode Format
---------------
+==============
 
 The required properties of child nodes are (as defined in pinctrl-bindings):
 - function
@@ -51,8 +59,11 @@ I2C9 MAC1LINK MDIO1 MDIO2 OSCCLK PEWAKE PWM0 PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7
 RGMII1 RGMII2 RMII1 RMII2 SD1 SPI1 SPI1DEBUG SPI1PASSTHRU TIMER4 TIMER5 TIMER6
 TIMER7 TIMER8 VGABIOSROM
 
+Examples
+
-Examples:
+g4 Example
+----------
 
 syscon: scu@1e6e2000 {
 	compatible = "aspeed,ast2400-scu", "syscon", "simple-mfd";
@@ -68,5 +79,56 @@ syscon: scu@1e6e2000 {
 	};
 };
 
+g5 Example
+----------
+
+ahb {
+	apb {
+		syscon: scu@1e6e2000 {
+			compatible = "aspeed,ast2500-scu", "syscon", "simple-mfd";
+			reg = <0x1e6e2000 0x1a8>;
+
+			pinctrl: pinctrl {
+				compatible = "aspeed,g5-pinctrl";
+				aspeed,external-nodes = <&gfx &lhc>;
+
+				pinctrl_i2c3_default: i2c3_default {
+					function = "I2C3";
+					groups = "I2C3";
+				};
+			};
+		};
+
+		gfx: display@1e6e6000 {
+			compatible = "aspeed,ast2500-gfx", "syscon";
+			reg = <0x1e6e6000 0x1000>;
+		};
+	};
+
+	lpc: lpc@1e789000 {
+		compatible = "aspeed,ast2500-lpc", "simple-mfd";
+		reg = <0x1e789000 0x1000>;
+
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x0 0x1e789000 0x1000>;
+
+		lpc_host: lpc-host@80 {
+			compatible = "aspeed,ast2500-lpc-host", "simple-mfd", "syscon";
+			reg = <0x80 0x1e0>;
+			reg-io-width = <4>;
+
+			#address-cells = <1>;
+			#size-cells = <1>;
+			ranges = <0x0 0x80 0x1e0>;
+
+			lhc: lhc@20 {
[PATCH v4 4/5] pinctrl: aspeed-g5: Add mux configuration for all pins
The patch introducing the g5 pinctrl driver implemented a smattering of pins
to flesh out the implementation of the core and provide bare-bones support
for some OpenPOWER platforms and the AST2500 evaluation board. Now, update
the bindings document to reflect the complete functionality and implement
the necessary pin configuration tables in the driver.

Signed-off-by: Andrew Jeffery
Acked-by: Joel Stanley
Acked-by: Rob Herring
---
 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |   17 +-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c         | 1476 +++++++++++++++++++-
 drivers/pinctrl/aspeed/pinctrl-aspeed.h            |    1 +
 3 files changed, 1487 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index a645d0be3347..b98e6f030da8 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -59,10 +59,19 @@ VGAVS VPI18 VPI24 VPI30 VPO12 VPO24 WDTRST1 WDTRST2
 
 aspeed,ast2500-pinctrl, aspeed,g5-pinctrl:
 
-GPID0 GPID2 GPIE0 I2C10 I2C11 I2C12 I2C13 I2C14 I2C3 I2C4 I2C5 I2C6 I2C7 I2C8
-I2C9 MAC1LINK MDIO1 MDIO2 OSCCLK PEWAKE PWM0 PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7
-RGMII1 RGMII2 RMII1 RMII2 SD1 SPI1 SPI1DEBUG SPI1PASSTHRU TIMER4 TIMER5 TIMER6
-TIMER7 TIMER8 VGABIOSROM
+ACPI ADC0 ADC1 ADC10 ADC11 ADC12 ADC13 ADC14 ADC15 ADC2 ADC3 ADC4 ADC5 ADC6
+ADC7 ADC8 ADC9 BMCINT DDCCLK DDCDAT ESPI FWSPICS1 FWSPICS2 GPID0 GPID2 GPID4
+GPID6 GPIE0 GPIE2 GPIE4 GPIE6 I2C10 I2C11 I2C12 I2C13 I2C14 I2C3 I2C4 I2C5 I2C6
+I2C7 I2C8 I2C9 LAD0 LAD1 LAD2 LAD3 LCLK LFRAME LPCHC LPCPD LPCPLUS LPCPME
+LPCRST LPCSMI LSIRQ MAC1LINK MAC2LINK MDIO1 MDIO2 NCTS1 NCTS2 NCTS3 NCTS4 NDCD1
+NDCD2 NDCD3 NDCD4 NDSR1 NDSR2 NDSR3 NDSR4 NDTR1 NDTR2 NDTR3 NDTR4 NRI1 NRI2
+NRI3 NRI4 NRTS1 NRTS2 NRTS3 NRTS4 OSCCLK PEWAKE PNOR PWM0 PWM1 PWM2 PWM3 PWM4
+PWM5 PWM6 PWM7 RGMII1 RGMII2 RMII1 RMII2 RXD1 RXD2 RXD3 RXD4 SALT1 SALT10
+SALT11 SALT12 SALT13 SALT14 SALT2 SALT3 SALT4 SALT5 SALT6 SALT7 SALT8 SALT9
+SCL1 SCL2 SD1 SD2 SDA1 SDA2 SGPS1 SGPS2 SIOONCTRL SIOPBI SIOPBO SIOPWREQ
+SIOPWRGD SIOS3 SIOS5 SIOSCI SPI1 SPI1CS1 SPI1DEBUG SPI1PASSTHRU SPI2CK SPI2CS0
+SPI2CS1 SPI2MISO SPI2MOSI TIMER3 TIMER4 TIMER5 TIMER6 TIMER7 TIMER8 TXD1 TXD2
+TXD3 TXD4 UART6 USBCKI VGABIOSROM VGAHS VGAVS VPI24 VPO WDTRST1 WDTRST2
 
 Examples
 
diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c b/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c
index c5c9a1b6fa1c..43221a3c7e23 100644
--- a/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c
+++ b/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c
@@ -25,14 +25,28 @@
 #include "../pinctrl-utils.h"
 #include "pinctrl-aspeed.h"
 
-#define ASPEED_G5_NR_PINS 228
+#define ASPEED_G5_NR_PINS 232
 
 #define COND1		{ ASPEED_IP_SCU, SCU90, BIT(6), 0, 0 }
 #define COND2		{ ASPEED_IP_SCU, SCU94, GENMASK(1, 0), 0, 0 }
 
+/* LHCR0 is offset from the end of the H8S/2168-compatible registers */
+#define LHCR0 0x20
+#define GFX064 0x64
+
 #define B14 0
 SSSF_PIN_DECL(B14, GPIOA0, MAC1LINK, SIG_DESC_SET(SCU80, 0));
 
+#define D14 1
+SSSF_PIN_DECL(D14, GPIOA1, MAC2LINK, SIG_DESC_SET(SCU80, 1));
+
+#define D13 2
+SIG_EXPR_LIST_DECL_SINGLE(SPI1CS1, SPI1CS1, SIG_DESC_SET(SCU80, 15));
+SIG_EXPR_LIST_DECL_SINGLE(TIMER3, TIMER3, SIG_DESC_SET(SCU80, 2));
+MS_PIN_DECL(D13, GPIOA2, SPI1CS1, TIMER3);
+FUNC_GROUP_DECL(SPI1CS1, D13);
+FUNC_GROUP_DECL(TIMER3, D13);
+
 #define E13 3
 SSSF_PIN_DECL(E13, GPIOA3, TIMER4, SIG_DESC_SET(SCU80, 3));
 
@@ -72,6 +86,32 @@
 FUNC_GROUP_DECL(TIMER8, B13);
 FUNC_GROUP_DECL(MDIO2, C13, B13);
 
+#define K19 8
+GPIO_PIN_DECL(K19, GPIOB0);
+
+#define L19 9
+GPIO_PIN_DECL(L19, GPIOB1);
+
+#define L18 10
+GPIO_PIN_DECL(L18, GPIOB2);
+
+#define K18 11
+GPIO_PIN_DECL(K18, GPIOB3);
+
+#define J20 12
+SSSF_PIN_DECL(J20, GPIOB4, USBCKI, SIG_DESC_SET(HW_STRAP1, 23));
+
+#define H21 13
+#define H21_DESC	SIG_DESC_SET(SCU80, 13)
+SIG_EXPR_LIST_DECL_SINGLE(LPCPD, LPCPD, H21_DESC);
+SIG_EXPR_LIST_DECL_SINGLE(LPCSMI, LPCSMI, H21_DESC);
+MS_PIN_DECL(H21, GPIOB5, LPCPD, LPCSMI);
+FUNC_GROUP_DECL(LPCPD, H21);
+FUNC_GROUP_DECL(LPCSMI, H21);
+
+#define H22 14
+SSSF_PIN_DECL(H22, GPIOB6, LPCPME, SIG_DESC_SET(SCU80, 14));
+
 #define H20 15
 GPIO_PIN_DECL(H20, GPIOB7);
 
@@ -168,7 +208,44 @@ MS_PIN_DECL(D20, GPIOD3, SD2DAT1, GPID2OUT);
 
 FUNC_GROUP_DECL(GPID2, F20, D20);
 
-#define GPIE_DESC	SIG_DESC_SET(HW_STRAP1, 21)
+#define GPID4_DESC	SIG_DESC_SET(SCU8C, 10)
+
+#define D21 28
+SIG_EXPR_LIST_DECL_SINGLE(SD2DAT2, SD2, SD2_DESC);
+SIG_EXPR_DECL(GPID4IN, GPID4, GPID4_DESC);
+SIG_EXPR_DECL(GPID4IN, GPID, GPID_DESC);
+SIG_EXPR_LIST_DECL_DUAL(GPID4IN, GPID4, GPID);
+MS_PIN_DECL(D21, GPIOD4, SD2DAT2, GPID4IN);
+
+#define E20 29
+SIG_EXPR_LIST_DECL_SINGLE(SD2DAT3, SD2, SD2_DESC);
+SIG_EXPR_DECL(GPID4OUT, GPID4, GPID4_DESC);
+SIG_EXPR_DECL(GPID4OUT, GPID, GPID_DESC);
+SIG_EXPR_LIST_DECL_DUAL(GPID4OUT, GPID4, GPID);
+
[PATCH v4 5/5] pinctrl: aspeed: Fix kerneldoc return descriptions
Signed-off-by: Andrew Jeffery
Acked-by: Joel Stanley
---
 drivers/pinctrl/aspeed/pinctrl-aspeed.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed.c b/drivers/pinctrl/aspeed/pinctrl-aspeed.c
index 782c5c97f853..76f62bd45f02 100644
--- a/drivers/pinctrl/aspeed/pinctrl-aspeed.c
+++ b/drivers/pinctrl/aspeed/pinctrl-aspeed.c
@@ -97,7 +97,7 @@ static inline void aspeed_sig_desc_print_val(
  * @enabled: True to query the enabled state, false to query disabled state
  * @regmap: The IP block's regmap instance
  *
- * @return 1 if the descriptor's bitfield is configured to the state
+ * Return: 1 if the descriptor's bitfield is configured to the state
  * selected by @enabled, 0 if not, and less than zero if an unrecoverable
  * failure occurred
  *
@@ -134,7 +134,7 @@ static int aspeed_sig_desc_eval(const struct aspeed_sig_desc *desc,
  * @enabled: True to query the enabled state, false to query disabled state
  * @maps: The list of regmap instances
  *
- * @return 1 if the expression composed by @enabled evaluates true, 0 if not,
+ * Return: 1 if the expression composed by @enabled evaluates true, 0 if not,
  * and less than zero if an unrecoverable failure occurred.
  *
  * A mux function is enabled or disabled if the function's signal expression
@@ -175,7 +175,7 @@ static int aspeed_sig_expr_eval(const struct aspeed_sig_expr *expr,
  *           expression, false to disable the function's signal
  * @maps: The list of regmap instances for pinmux register access.
  *
- * @return 0 if the expression is configured as requested and a negative error
+ * Return: 0 if the expression is configured as requested and a negative error
  * code otherwise
  */
 static int aspeed_sig_expr_set(const struct aspeed_sig_expr *expr,
@@ -256,7 +256,7 @@ static int aspeed_sig_expr_disable(const struct aspeed_sig_expr *expr,
  * @exprs: The list of signal expressions (from a priority level on a pin)
  * @maps: The list of regmap instances for pinmux register access.
  *
- * @return 0 if all expressions are disabled, otherwise a negative error code
+ * Return: 0 if all expressions are disabled, otherwise a negative error code
  */
 static int aspeed_disable_sig(const struct aspeed_sig_expr **exprs,
 			      struct regmap * const *maps)
@@ -281,8 +281,8 @@ static int aspeed_disable_sig(const struct aspeed_sig_expr **exprs,
  * @exprs: List of signal expressions (haystack)
  * @name: The name of the requested function (needle)
  *
- * @return A pointer to the signal expression whose function tag matches the
- *         provided name, otherwise NULL.
+ * Return: A pointer to the signal expression whose function tag matches the
+ *         provided name, otherwise NULL.
  *
  */
 static const struct aspeed_sig_expr *aspeed_find_expr_by_name(
-- 
2.9.3
[PATCH v4 0/5] pinctrl: aspeed: Implement remaining pins
Hi Linus,

This is v4 of the series implementing the remainder of the pinmux tables for
the AST2400 and AST2500 SoCs. v3 of the series can be found here:

https://lkml.org/lkml/2016/12/5/847

Cheers,

Andrew

Changes since v3:

* Add a patch fixing the AST2400 SCU compatible strings in the Aspeed
  pinctrl bindings. They are mentioned to define the expectations on the
  pinctrl node's parent and also in the examples to illustrate the
  relationship.
* Rework the g5 example bindings in patch 2/5 to reflect the LPC/LHC
  bindings[1][2], and fix the SCU compatible string.

[1] https://lkml.org/lkml/2016/12/20/63
[2] https://lkml.org/lkml/2016/12/20/62

Significant changes since v2:

* The fix for touching bit SCU90[6] has been applied, so the patch has been
  dropped.
* The MFD devicetree bindings patches have been split out into their own
  series: https://lkml.org/lkml/2016/12/5/835
* Rework the "Read and write bits in LPC and GFX controllers" patch so that
  the changes are backwards compatible with existing devicetrees. This will
  lead to limited functionality, but no more limited than what systems with
  those devicetrees already experience.
* A fix for the kerneldoc return value descriptions

Significant changes since v1:

* Fixes from v1 have been applied, so have been dropped for v2
* A new fix has appeared, "pinctrl-aspeed-g5: Never set SCU90[6]", as noted
  above
* New bindings documents for the SoC Display and LPC Host Controllers,
  driven by the patch "pinctrl: aspeed: Read and write bits in LPCHC and GFX
  controllers"
* The v1 patch "pinctrl: aspeed: Enable capture of off-SCU pinmux state" has
  been significantly reworked and is now titled "pinctrl: aspeed: Read and
  write bits in LPCHC and GFX controllers"

Andrew Jeffery (5):
  pinctrl: aspeed: dt: Fix compatibles for the System Control Unit
  pinctrl: aspeed: Read and write bits in LPC and GFX controllers
  pinctrl: aspeed-g4: Add mux configuration for all pins
  pinctrl: aspeed-g5: Add mux configuration for all pins
  pinctrl: aspeed: Fix kerneldoc return descriptions

 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |  127 +-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c         | 1115 ++++++++++++-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c         | 1524 +++++++++++++++++++-
 drivers/pinctrl/aspeed/pinctrl-aspeed.c            |  165 ++-
 drivers/pinctrl/aspeed/pinctrl-aspeed.h            |   33 +-
 5 files changed, 2835 insertions(+), 129 deletions(-)

-- 
2.9.3
[PATCH v4 1/5] pinctrl: aspeed: dt: Fix compatibles for the System Control Unit
Reference the SoC-specific compatible string in the examples as required.

Signed-off-by: Andrew Jeffery
---
 Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index 2ad18c4ea55c..b2efb73337c6 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -11,9 +11,14 @@ Required properties:
 		"aspeed,ast2500-pinctrl"
 		"aspeed,g5-pinctrl"
 
-The pin controller node should be a child of a syscon node with the required
+The pin controller node should be the child of a syscon node with the required
 property:
-- compatible: "syscon", "simple-mfd"
+
+- compatible : Should be one of the following:
+		"aspeed,ast2400-scu", "syscon", "simple-mfd"
+		"aspeed,g4-scu", "syscon", "simple-mfd"
+		"aspeed,ast2500-scu", "syscon", "simple-mfd"
+		"aspeed,g5-scu", "syscon", "simple-mfd"
 
 Refer to the the bindings described in
 Documentation/devicetree/bindings/mfd/syscon.txt
@@ -50,7 +55,7 @@ TIMER7 TIMER8 VGABIOSROM
 Examples:
 
 syscon: scu@1e6e2000 {
-	compatible = "syscon", "simple-mfd";
+	compatible = "aspeed,ast2400-scu", "syscon", "simple-mfd";
 	reg = <0x1e6e2000 0x1a8>;
 
 	pinctrl: pinctrl {
-- 
2.9.3
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Hi,

On 20 December 2016 at 15:18, Lu Baolu wrote:
> Hi,
>
> On 12/20/2016 02:46 PM, Baolin Wang wrote:
>> On 20 December 2016 at 14:39, Lu Baolu wrote:
>>> Hi,
>>>
>>> On 12/20/2016 02:06 PM, Baolin Wang wrote:
>>>> Hi,
>>>>
>>>> On 20 December 2016 at 12:29, Lu Baolu wrote:
>>>>> Hi Mathias,
>>>>>
>>>>> On 12/19/2016 08:13 PM, Mathias Nyman wrote:
>>>>>> On 19.12.2016 13:34, Baolin Wang wrote:
>>>>>>> Hi Mathias,
>>>>>>>
>>>>>>> On 19 December 2016 at 18:33, Mathias Nyman wrote:
>>>>>>>> On 13.12.2016 05:21, Baolin Wang wrote:
>>>>>>>>> Hi Mathias,
>>>>>>>>>
>>>>>>>>> On 12 December 2016 at 23:52, Mathias Nyman wrote:
>>>>>>>>>> On 05.12.2016 09:51, Baolin Wang wrote:
>>>>>>>>>>> If a command event is found on the event ring during an interrupt,
>>>>>>>>>>> we need to stop the command timer with del_timer(). Since del_timer()
>>>>>>>>>>> can fail if the timer is running and waiting on the xHCI lock, then
>>>>>>>>>>> it maybe get the wrong timeout command in xhci_handle_command_timeout()
>>>>>>>>>>> if host fetched a new command and updated the xhci->current_cmd in
>>>>>>>>>>> handle_cmd_completion(). For this situation, we need a way to signal
>>>>>>>>>>> to the command timer that everything is fine and it should exit.
>>>>>>>>>>
>>>>>>>>>> Ah, right, this could actually happen.
>>>>>>>>>>
>>>>>>>>>>> We should introduce a counter (xhci->current_cmd_pending) for the number
>>>>>>>>>>> of pending commands. If we need to cancel the command timer and
>>>>>>>>>>> del_timer() succeeds, we decrement the number of pending commands. If
>>>>>>>>>>> del_timer() fails, we leave the number of pending commands alone.
>>>>>>>>>>>
>>>>>>>>>>> For handling timeout command, in xhci_handle_command_timeout() we will
>>>>>>>>>>> check the counter after decrementing it, if the counter
>>>>>>>>>>> (xhci->current_cmd_pending) is 0, which means xhci->current_cmd is the
>>>>>>>>>>> right timeout command. If the counter (xhci->current_cmd_pending) is
>>>>>>>>>>> greater than 0, which means current timeout command has been handled by
>>>>>>>>>>> host and host has fetched new command as xhci->current_cmd, then just
>>>>>>>>>>> return and wait for new current command.
>>>>>>>>>>
>>>>>>>>>> A counter like this could work.
>>>>>>>>>>
>>>>>>>>>> Writing the abort bit can generate either ABORT+STOP, or just STOP
>>>>>>>>>> event, this seems to cover both.
>>>>>>>>>>
>>>>>>>>>> quick check, case 1: timeout and cmd completion at the same time.
>>>>>>>>>>
>>>>>>>>>> cpu1                                  cpu2
>>>>>>>>>>
>>>>>>>>>> queue_command(first), p++ (=1)
>>>>>>>>>> queue_command(more),
>>>>>>>>>> --completion irq fires--              --timer times out at same time--
>>>>>>>>>> handle_cmd_completion()               handle_cmd_timeout()
>>>>>>>>>> lock(xhci_lock)                       spin_on(xhci_lock)
>>>>>>>>>> del_timer() fail, p (=1, nochange)
>>>>>>>>>> cur_cmd = list_next(), p++ (=2)
>>>>>>>>>> unlock(xhci_lock)
>>>>>>>>>>                                       lock(xhci_lock)
>>>>>>>>>>                                       p-- (=1)
>>>>>>>>>>                                       if (p > 0), exit
>>>>>>>>>> OK works
>>>>>>>>>>
>>>>>>>>>> case 2: normal timeout case with ABORT+STOP, no race.
>>>>>>>>>>
>>>>>>>>>> cpu1                                  cpu2
>>>>>>>>>>
>>>>>>>>>> queue_command(first), p++ (=1)
>>>>>>>>>> queue_command(more),
>>>>>>>>>>                                       handle_cmd_timeout()
>>>>>>>>>>                                       p-- (p=0), don't exit
>>>>>>>>>>                                       mod_timer(), p++ (p=1)
>>>>>>>>>>                                       write_abort_bit()
>>>>>>>>>> handle_cmd_completion(ABORT)
>>>>>>>>>> del_timer(), ok, p-- (p=0)
>>>>>>>>>> handle_cmd_completion(STOP)
>>>>>>>>>> del_timer(), fail, (p=0)
>>>>>>>>>> handle_stopped_cmd_ring()
>>>>>>>>>> cur_cmd = list_next(), p++ (=1)
>>>>>>>>>> mod_timer()
>>>>>>>>>>
>>>>>>>>>> OK, works, and same for just STOP case, with the only difference that
>>>>>>>>>> during handle_cmd_completion(STOP) p is decremented (p--)
>>>>>>>>>
>>>>>>>>> Yes, that's the cases what I want to handle, thanks for your explicit
>>>>>>>>> explanation.
>>>>>>>>
>>>>>>>> Gave this some more thought over the weekend, and this implementation
>>>>>>>> doesn't solve the case when the last command times out and races with the
>>>>>>>> completion handler:
>>>>>>>>
>>>>>>>> cpu1                                  cpu2
>>>>>>>>
>>>>>>>> queue_command(first), p++ (=1)
>>>>>>>> --completion irq fires--              --timer times out at same time--
>>>>>>>> handle_cmd_completion()               handle_cmd_timeout()
>>>>>>>> lock(xhci_lock)                       spin_on(xhci_lock)
>>>>>>>> del_timer() fail
Re: [PATCH 2/4] mmc: mmc: Add change to set controller to HS400 frequency in enhanced strobe
On 2016/12/20 13:53, Ritesh Harjani wrote:
> Currently mmc_select_hs400es is not setting the desired frequency before
> sending the switch command. This makes CMD13 time out on some controllers.
> Thus add a change to set the desired HS400 frequency in mmc_select_hs400es
> when the timing is switched to HS400.

What is the desired HS400 frequency? I guess it is 200MHz, right?

Well, per the spec, it just says the host *may* change the frequency to
<= 200MHz. And before setting the timing to MMC_TIMING_MMC_HS400 we have
already reached ext_csd.hs_max_dtr, so the current code should meet the
spec.

Given that some controllers see CMD13 time out here, I guess you wouldn't
want folks to add max-frequency to the DT: if the max frequency is set just
below your *desired HS400 freq*, you get the same failure even with this
patch, right?

I guess the root cause of the CMD13 failure on those controllers is that
they didn't configure the right delay phase, or something similar, to fit
the timing at the hs_max_dtr frequency.

> Signed-off-by: Ritesh Harjani
> ---
>  drivers/mmc/core/mmc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> index b61b52f9..eb69497 100644
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -1329,6 +1329,7 @@ static int mmc_select_hs400es(struct mmc_card *card)
>
>  	/* Set host controller to HS400 timing and frequency */
>  	mmc_set_timing(host, MMC_TIMING_MMC_HS400);
> +	mmc_set_bus_speed(card);
>
>  	/* Controller enable enhanced strobe function */
>  	host->ios.enhanced_strobe = true;

--
Best Regards
Shawn Lin
Re: [PATCH v3 2/5] lib: Add Sed-opal library
> +	u8 lr;
> +	size_t key_name_len;
> +	char key_name[36];

Who is going to use the key_name? I can't find another reference to it
anywhere else in the code. The reason why this tripped me up was the
hardcoded length, so I was going to check how access to it is bounds
checked.

> +/**
> + * struct opal_dev - The structure representing a OPAL enabled SED.
> + * @sed_ctx:The SED context, contains fn pointers to sec_send/recv.
> + * @opal_step:A series of opal methods that are necessary to complete a comannd.
> + * @func_data:An array of parameters for the opal methods above.
> + * @state:Describes the current opal_step we're working on.
> + * @dev_lock:Locks the entire opal_dev structure.
> + * @parsed:Parsed response from controller.
> + * @prev_data:Data returned from a method to the controller
> + * @error_cb:Error function that handles closing sessions after a failed method.
> + * @unlk_lst:A list of Locking ranges to unlock on this device during a resume.
> + */

Needs spaces after the colons.

> +	u16 comID;
> +	u32 HSN;
> +	u32 TSN;

Please use lower case variable names in Linux code, and separate words by
underscores if needed.

> +DEFINE_SPINLOCK(list_spinlock);

Should be marked static.

> +#define TPER_SYNC_SUPPORTED BIT(0)

This sounds like a protocol constant and should go into the header with the
rest of the protocol.

> +static bool check_tper(const void *data)
> +{
> +	const struct d0_tper_features *tper = data;
> +	u8 flags = tper->supported_features;
> +
> +	if (!(flags & TPER_SYNC_SUPPORTED)) {
> +		pr_err("TPer sync not supported. flags = %d\n",
> +		       tper->supported_features);

It would be great to use dev_err / dev_warn / etc in the opal code, although
for that you'll need to pass a struct device from the driver into the code.

> +static bool check_SUM(const void *data)

The comment on member naming above also applies to variables and function
names.

> +	bool foundComID = false, supported = true, single_user = false;
> +	const struct d0_header *hdr;
> +	const u8 *epos, *cpos;
> +	u16 comID = 0;
> +	int error = 0;
> +
> +	epos = dev->cmd.resp;
> +	cpos = dev->cmd.resp;
> +	hdr = (struct d0_header *)dev->cmd.resp;

You probably want to structure your command buffers to use void pointers and
avoid this kind of casting.

> +#define TINY_ATOM_DATA_MASK GENMASK(5, 0)
> +#define TINY_ATOM_SIGNED BIT(6)
> +
> +#define SHORT_ATOM_ID BIT(7)
> +#define SHORT_ATOM_BYTESTRING BIT(5)
> +#define SHORT_ATOM_SIGNED BIT(4)
> +#define SHORT_ATOM_LEN_MASK GENMASK(3, 0)

Protocol constants for the header again?

> +#define ADD_TOKEN_STRING(cmd, key, keylen)	\
> +	if (!err)				\
> +		err = test_and_add_string(cmd, key, keylen);
> +
> +#define ADD_TOKEN(type, cmd, tok)		\
> +	if (!err)				\
> +		err = test_and_add_token_##type(cmd, tok);

Please remove these macros and just open code the calls. If you want to
avoid writing the if err lines again and again, just pass err by reference
to the functions and move the err check into the add_token* helpers.

> +	if ((hdr->cp.length == 0)
> +	    || (hdr->pkt.length == 0)
> +	    || (hdr->subpkt.length == 0)) {

No need for the inner braces, also please place the operators at the end of
the previous line instead of the beginning of the new line.

> +	while (cpos < total) {
> +		if (!(pos[0] & 0x80)) /* tiny atom */
> +			token_length = response_parse_tiny(iter, pos);
> +		else if (!(pos[0] & 0x40)) /* short atom */
> +			token_length = response_parse_short(iter, pos);
> +		else if (!(pos[0] & 0x20)) /* medium atom */
> +			token_length = response_parse_medium(iter, pos);
> +		else if (!(pos[0] & 0x10)) /* long atom */
> +			token_length = response_parse_long(iter, pos);

Please add symbolic names for these constants to the protocol header.

> +	if (num_entries == 0) {
> +		pr_err("Couldn't parse response.\n");
> +		return -EINVAL;
> +	resp->num = num_entries;

Where is the closing brace for that if?

> +	if (!((resp->toks[n].width == OPAL_WIDTH_TINY) ||
> +	      (resp->toks[n].width == OPAL_WIDTH_SHORT))) {

No need for the inner braces.

> +		pr_err("Atom is not short or tiny: %d\n",
> +		       resp->toks[n].width);
> +		return 0;
> +	}
> +
> +	return resp->toks[n].stored.u;
> +}
> +
> +static u8 response_status(const struct parsed_resp *resp)
> +{
> +	if ((token_type(resp, 0) == OPAL_DTA_TOKENID_TOKEN)
> +	    && (response_get_token(resp, 0) == OPAL_ENDOFSESSION)) {

Same here, also please fix the operator placement.

> +	if ((token_type(resp, resp->num - 1) != OPAL_DTA_TOKENID_
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Hi, On 12/20/2016 02:46 PM, Baolin Wang wrote: > On 20 December 2016 at 14:39, Lu Baolu wrote: >> Hi, >> >> On 12/20/2016 02:06 PM, Baolin Wang wrote: >>> Hi, >>> >>> On 20 December 2016 at 12:29, Lu Baolu wrote: Hi Mathias, On 12/19/2016 08:13 PM, Mathias Nyman wrote: > On 19.12.2016 13:34, Baolin Wang wrote: >> Hi Mathias, >> >> On 19 December 2016 at 18:33, Mathias Nyman >> wrote: >>> On 13.12.2016 05:21, Baolin Wang wrote: Hi Mathias, On 12 December 2016 at 23:52, Mathias Nyman wrote: > On 05.12.2016 09:51, Baolin Wang wrote: >> If a command event is found on the event ring during an interrupt, >> we need to stop the command timer with del_timer(). Since del_timer() >> can fail if the timer is running and waiting on the xHCI lock, then >> it maybe get the wrong timeout command in >> xhci_handle_command_timeout() >> if host fetched a new command and updated the xhci->current_cmd in >> handle_cmd_completion(). For this situation, we need a way to signal >> to the command timer that everything is fine and it should exit. > > Ah, right, this could actually happen. > >> We should introduce a counter (xhci->current_cmd_pending) for the >> number >> of pending commands. If we need to cancel the command timer and >> del_timer() >> succeeds, we decrement the number of pending commands. If del_timer() >> fails, >> we leave the number of pending commands alone. >> >> For handling timeout command, in xhci_handle_command_timeout() we >> will >> check >> the counter after decrementing it, if the counter >> (xhci->current_cmd_pending) >> is 0, which means xhci->current_cmd is the right timeout command. If >> the >> counter (xhci->current_cmd_pending) is greater than 0, which means >> current >> timeout command has been handled by host and host has fetched new >> command >> as >> xhci->current_cmd, then just return and wait for new current command. > > A counter like this could work. 
> > Writing the abort bit can generate either ABORT+STOP, or just STOP
> > event, this seems to cover both.
> >
> > quick check, case 1: timeout and cmd completion at the same time.
> >
> > cpu1                                    cpu2
> >
> > queue_command(first), p++ (=1)
> > queue_command(more),
> > --completion irq fires--                --timer times out at same time--
> > handle_cmd_completion()                 handle_cmd_timeout()
> > lock(xhci_lock)                         spin_on(xhci_lock)
> > del_timer() fail, p (=1, nochange)
> > cur_cmd = list_next(), p++ (=2)
> > unlock(xhci_lock)
> >                                         lock(xhci_lock)
> >                                         p-- (=1)
> >                                         if (p > 0), exit
> > OK works
> >
> > case 2: normal timeout case with ABORT+STOP, no race.
> >
> > cpu1                                    cpu2
> >
> > queue_command(first), p++ (=1)
> > queue_command(more),
> >                                         handle_cmd_timeout()
> >                                         p-- (P=0), don't exit
> >                                         mod_timer(), p++ (P=1)
> >                                         write_abort_bit()
> > handle_cmd_completion(ABORT)
> > del_timer(), ok, p-- (p = 0)
> > handle_cmd_completion(STOP)
> > del_timer(), fail, (P=0)
> > handle_stopped_cmd_ring()
> > cur_cmd = list_next(), p++ (=1)
> > mod_timer()
> >
> > OK, works, and same for just STOP case, with the only difference that
> > during handle_cmd_completion(STOP) p is decremented (p--)

Yes, those are the cases I want to handle, thanks for your explicit explanation.

>>> Gave this some more thought over the weekend, and this implementation
>>> doesn't solve the case when the last command times out and races with the
>>> completion handler:
>>>
>>> cpu1                                    cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> --completion irq fires--                --timer times out at same time--
>>> handle_cmd_completion()                 handle_cmd_timeout()
>>> lock(xhci_lock)                         spin_on(xhci_lock)
>>> del_timer() fail, p (=1, nochange)
>>> no more commands, P (=1, nochange)
>>> unlock(xhci_lock)
>>>                                         lock(xhci_lock)
>>>                                         p-- (=0
[PATCH v4 5/5] mfd: dt: Add bindings for the Aspeed SoC Display Controller (GFX)
The Aspeed SoC Display Controller is presented as a syscon device to arbitrate access by display and pinmux drivers. Video pinmux configuration on fifth generation SoCs depends on bits in both the System Control Unit and the Display Controller. Signed-off-by: Andrew Jeffery Acked-by: Rob Herring --- Documentation/devicetree/bindings/mfd/aspeed-gfx.txt | 17 + 1 file changed, 17 insertions(+) create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-gfx.txt diff --git a/Documentation/devicetree/bindings/mfd/aspeed-gfx.txt b/Documentation/devicetree/bindings/mfd/aspeed-gfx.txt new file mode 100644 index ..aea5370efd97 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/aspeed-gfx.txt @@ -0,0 +1,17 @@ +* Device tree bindings for Aspeed SoC Display Controller (GFX) + +The Aspeed SoC Display Controller primarily does as its name suggests, but also +participates in pinmux requests on the g5 SoCs. It is therefore considered a +syscon device. + +Required properties: +- compatible: "aspeed,ast2500-gfx", "syscon" +- reg: contains offset/length value of the GFX memory + region. + +Example: + +gfx: display@1e6e6000 { + compatible = "aspeed,ast2500-gfx", "syscon"; + reg = <0x1e6e6000 0x1000>; +}; -- 2.9.3
[PATCH v4 4/5] mfd: dt: Add bindings for the Aspeed LPC Host Controller (LHC)
The LPC bus pinmux configuration on fifth generation Aspeed SoCs depends on bits in both the System Control Unit and the LPC Host Controller. The Aspeed LPC Host Controller is described as a child node of the LPC host-range syscon device for arbitration of access by the host controller and pinmux drivers. Signed-off-by: Andrew Jeffery Reviewed-by: Linus Walleij --- Linus: I've retained your r-b tag as I don't think the addition of the ast2400 compatible string will fuss you. Please let me know if you feel this is inappropriate. .../devicetree/bindings/mfd/aspeed-lpc.txt | 26 ++ 1 file changed, 26 insertions(+) diff --git a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt index a97131aba446..514d82ced95b 100644 --- a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt +++ b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt @@ -109,3 +109,29 @@ lpc: lpc@1e789000 { }; }; +Host Node Children +== + +LPC Host Controller +--- + +The Aspeed LPC Host Controller configures the Low Pin Count (LPC) bus behaviour +between the host and the baseboard management controller. The registers exist +in the "host" portion of the Aspeed LPC controller, which must be the parent of +the LPC host controller node. + +Required properties: + +- compatible: One of: + "aspeed,ast2400-lhc"; + "aspeed,ast2500-lhc"; + +- reg: contains offset/length values of the LHC memory regions. In the + AST2400 and AST2500 there are two regions. + +Example: + +lhc: lhc@20 { + compatible = "aspeed,ast2500-lhc"; + reg = <0x20 0x24 0x48 0x8>; +}; -- 2.9.3
[PATCH v4 1/5] mfd: dt: Fix "indicates" typo in mfd bindings document
Signed-off-by: Andrew Jeffery Acked-by: Linus Walleij Acked-by: Rob Herring --- Documentation/devicetree/bindings/mfd/mfd.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/mfd/mfd.txt b/Documentation/devicetree/bindings/mfd/mfd.txt index af9d6931a1a2..f1fceeda12f1 100644 --- a/Documentation/devicetree/bindings/mfd/mfd.txt +++ b/Documentation/devicetree/bindings/mfd/mfd.txt @@ -19,7 +19,7 @@ Optional properties: - compatible : "simple-mfd" - this signifies that the operating system should consider all subnodes of the MFD device as separate devices akin to how - "simple-bus" inidicates when to see subnodes as children for a simple + "simple-bus" indicates when to see subnodes as children for a simple memory-mapped bus. For more complex devices, when the nexus driver has to probe registers to figure out what child devices exist etc, this should not be used. In the latter case the child devices will be determined by the -- 2.9.3
[PATCH v4 2/5] mfd: dt: ranges, #address-cells and #size-cells as optional properties
Whilst describing a device and not a bus, simple-mfd is modelled on simple-bus where child nodes are iterated and registered as platform devices. Some complex devices, e.g. the Aspeed LPC controller, can benefit from address space mapping such that child nodes can use the regs property to describe their resource offsets within the multi-function device. Signed-off-by: Andrew Jeffery Acked-by: Rob Herring --- Documentation/devicetree/bindings/mfd/mfd.txt | 10 ++ 1 file changed, 10 insertions(+) diff --git a/Documentation/devicetree/bindings/mfd/mfd.txt b/Documentation/devicetree/bindings/mfd/mfd.txt index f1fceeda12f1..bcb6abb9d413 100644 --- a/Documentation/devicetree/bindings/mfd/mfd.txt +++ b/Documentation/devicetree/bindings/mfd/mfd.txt @@ -25,6 +25,16 @@ Optional properties: be used. In the latter case the child devices will be determined by the operating system. +- ranges: Describes the address mapping relationship to the parent. Should set + the child's base address to 0, the physical address within parent's address + space, and the length of the address map. + +- #address-cells: Specifies the number of cells used to represent physical base + addresses. Must be present if ranges is used. + +- #size-cells: Specifies the number of cells used to represent the size of an + address. Must be present if ranges is used. + Example: foo@1000 { -- 2.9.3
[PATCH v4 3/5] mfd: dt: Add Aspeed Low Pin Count Controller bindings
Signed-off-by: Andrew Jeffery Reviewed-by: Linus Walleij Reviewed-by: Joel Stanley Acked-by: Rob Herring --- .../devicetree/bindings/mfd/aspeed-lpc.txt | 111 + 1 file changed, 111 insertions(+) create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-lpc.txt diff --git a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt new file mode 100644 index ..a97131aba446 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt @@ -0,0 +1,111 @@ +== +Device tree bindings for the Aspeed Low Pin Count (LPC) Bus Controller +== + +The LPC bus is a means to bridge a host CPU to a number of low-bandwidth +peripheral devices, replacing the use of the ISA bus in the age of PCI[0]. The +primary use case of the Aspeed LPC controller is as a slave on the bus +(typically in a Baseboard Management Controller SoC), but under certain +conditions it can also take the role of bus master. + +The LPC controller is represented as a multi-function device to account for the +mix of functionality it provides. The principal split is between the register +layout at the start of the I/O space which is, to quote the Aspeed datasheet, +"basically compatible with the [LPC registers from the] popular BMC controller +H8S/2168[1]", and everything else, where everything else is an eclectic +collection of functions with an esoteric register layout. "Everything else", +here labeled the "host" portion of the controller, includes, but is not limited +to: + +* An IPMI Block Transfer[2] Controller + +* An LPC Host Controller: Manages LPC functions such as host vs slave mode, the + physical properties of some LPC pins, configuration of serial IRQs, and + APB-to-LPC bridging amongst other functions. + +* An LPC Host Interface Controller: Manages functions exposed to the host such + as LPC firmware hub cycles, configuration of the LPC-to-AHB mapping, UART + management and bus snoop configuration.
+ +* A set of SuperIO[3] scratch registers: Enables implementation of e.g. custom + hardware management protocols for handover between the host and baseboard + management controller. + +Additionally the state of the LPC controller influences the pinmux +configuration, therefore the host portion of the controller is exposed as a +syscon as a means to arbitrate access. + +[0] http://www.intel.com/design/chipsets/industry/25128901.pdf +[1] https://www.renesas.com/en-sg/doc/products/mpumcu/001/rej09b0078_h8s2168.pdf?key=7c88837454702128622bee53acbda8f4 +[2] http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ipmi-second-gen-interface-spec-v2-rev1-1.pdf +[3] https://en.wikipedia.org/wiki/Super_I/O + +Required properties +=== + +- compatible: One of: + "aspeed,ast2400-lpc", "simple-mfd" + "aspeed,ast2500-lpc", "simple-mfd" + +- reg: contains the physical address and length values of the Aspeed +LPC memory region. + +- #address-cells: <1> +- #size-cells: <1> +- ranges: Maps 0 to the physical address and length of the LPC memory +region + +Required LPC Child nodes + + +BMC Node + + +- compatible: One of: + "aspeed,ast2400-lpc-bmc" + "aspeed,ast2500-lpc-bmc" + +- reg: contains the physical address and length values of the +H8S/2168-compatible LPC controller memory region + +Host Node +- + +- compatible: One of: + "aspeed,ast2400-lpc-host", "simple-mfd", "syscon" + "aspeed,ast2500-lpc-host", "simple-mfd", "syscon" + +- reg: contains the address and length values of the host-related +register space for the Aspeed LPC controller + +- #address-cells: <1> +- #size-cells: <1> +- ranges: Maps 0 to the address and length of the host-related LPC memory +region + +Example: + +lpc: lpc@1e789000 { + compatible = "aspeed,ast2500-lpc", "simple-mfd"; + reg = <0x1e789000 0x1000>; + + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x1e789000 0x1000>; + + lpc_bmc: lpc-bmc@0 { + compatible = "aspeed,ast2500-lpc-bmc"; + reg = <0x0 0x80>; + }; + + lpc_host: 
lpc-host@80 { + compatible = "aspeed,ast2500-lpc-host", "simple-mfd", "syscon"; + reg = <0x80 0x1e0>; + reg-io-width = <4>; + + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x80 0x1e0>; + }; +}; + -- 2.9.3
[PATCH v4 0/5] mfd: dt: Add bindings for the Aspeed MFDs
Hi Lee, Here's v4 of the Aspeed LPC MFD devicetree bindings series. v3 can be found at: https://lkml.org/lkml/2016/12/5/835 Changes since v3: * Based on Arnd's argument[1], drop the addition of the mfd/syscon bindings directory as well as the last patch in v3, which moved a number of existing bindings. Eventually the Aspeed display controller will have a device-specific driver so it doesn't belong there either. * Add a compatible string for the AST2400 in the LPC Host Controller bindings as requested by Joel and slightly tweak the reg description for Rob. [1] https://lkml.org/lkml/2016/12/13/202 Andrew Jeffery (5): mfd: dt: Fix "indicates" typo in mfd bindings document mfd: dt: ranges, #address-cells and #size-cells as optional properties mfd: dt: Add Aspeed Low Pin Count Controller bindings mfd: dt: Add bindings for the Aspeed LPC Host Controller (LHC) mfd: dt: Add bindings for the Aspeed SoC Display Controller (GFX) .../devicetree/bindings/mfd/aspeed-gfx.txt | 17 +++ .../devicetree/bindings/mfd/aspeed-lpc.txt | 137 + Documentation/devicetree/bindings/mfd/mfd.txt | 12 +- 3 files changed, 165 insertions(+), 1 deletion(-) create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-gfx.txt create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-lpc.txt -- 2.9.3
Re: [PATCH] staging: lustre: ldlm: use designated initializers
On Mon, Dec 19, 2016 at 08:47:50AM -0800, Bruce Korb wrote: > On Mon, Dec 19, 2016 at 8:22 AM, James Simmons > >> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c > >> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c > >> @@ -143,7 +143,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock > >> *req, __u64 *flags, > >> int added = (mode == LCK_NL); > >> int overlaps = 0; > >> int splitted = 0; > >> - const struct ldlm_callback_suite null_cbs = { NULL }; > >> + const struct ldlm_callback_suite null_cbs = { }; > >> > >> CDEBUG(D_DLMTRACE, > >> "flags %#llx owner %llu pid %u mode %u start %llu end %llu\n", > > > > Nak. Filling null_cbs with random data is a bad idea. If you look at > > ldlm_lock_create() where this is used you have > > > > if (cbs) { > > lock->l_blocking_ast = cbs->lcs_blocking; > > lock->l_completion_ast = cbs->lcs_completion; > > lock->l_glimpse_ast = cbs->lcs_glimpse; > > } > > > > Having lock->l_* point to random addresses is a bad idea. > > What really needs to be done is proper initialization of that > > structure. A bunch of patches will be coming to address this. > > I'm not understanding the effect of the original difference. If you > specify any initializer, then all fields not specified are filled with > zero bits. Any pointers are, perforce, NULL. That should make both "{ > NULL }" and "{}" equivalent. They are equivalent, yes, but people want to use a GCC plugin that randomizes struct layouts for internal structures and the plugin doesn't work when you use struct ordering to initialize the struct. The plugin requires that you use designated initializers. regards, dan carpenter
Re: [tip:x86/urgent] x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
On 12/19/16 02:56, tip-bot for Andy Lutomirski wrote: > Commit-ID: 3df8d9208569ef0b2313e516566222d745f3b94b > Gitweb: http://git.kernel.org/tip/3df8d9208569ef0b2313e516566222d745f3b94b > Author: Andy Lutomirski > AuthorDate: Thu, 15 Dec 2016 10:14:42 -0800 > Committer: Thomas Gleixner > CommitDate: Mon, 19 Dec 2016 11:50:24 +0100 > > x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6 > > A typo (or mis-merge?) resulted in leaf 6 only being probed if > cpuid_level >= 7. > > Fixes: 2ccd71f1b278 ("x86/cpufeature: Move some of the scattered feature bits > to x86_capability") > Signed-off-by: Andy Lutomirski > Acked-by: Borislav Petkov > Cc: Brian Gerst > Link: > http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.l...@kernel.org > Signed-off-by: Thomas Gleixner > > --- > arch/x86/kernel/cpu/common.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index 1f6b50a..dc1697c 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -667,13 +667,14 @@ void get_cpu_cap(struct cpuinfo_x86 *c) > c->x86_capability[CPUID_1_EDX] = edx; > } > > + /* Thermal and Power Management Leaf: level 0x0006 (eax) */ > + if (c->cpuid_level >= 0x0006) > + c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x0006); > + > /* Additional Intel-defined flags: level 0x0007 */ > if (c->cpuid_level >= 0x0007) { > cpuid_count(0x0007, 0, &eax, &ebx, &ecx, &edx); > - > c->x86_capability[CPUID_7_0_EBX] = ebx; > - > - c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x0006); > c->x86_capability[CPUID_7_ECX] = ecx; > }

Perhaps we should have something like:

void cpuid_cond(u32 leaf, u32 subleaf, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
{
	const struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
	u32 maxleaf;
	int cpuid_level = c->cpuid_level;

	switch (leaf >> 16) {
	case 0x0000:
		maxleaf = c->cpuid_level;
		break;
	case 0x8000:
		maxleaf = c->extended_cpuid_level;
		break;
	case 0x4000 ... 0x40ff:	/* Not ideal, can we do better? */
		maxleaf = cpu_has(X86_FEATURE_HYPERVISOR) ? 0x40ffffff : 0;
		break;
	/* Add Transmeta 0x8086 and VIA/Cyrix 0xc000? */
	default:
		/* Can we do better? */
		if (c->x86_vendor_id == X86_VENDOR_INTEL)
			maxleaf = 0;
		else
			maxleaf = leaf | 0xffff;
		break;
	}

	if (cpuid_level >= 0 && leaf <= maxleaf) {
		cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);
	} else {
		*eax = *ebx = *ecx = *edx = 0;
	}
}

-hpa

P.S. I would love to (a) move the CPUID bits into a structure instead of passing all those crazy pointers around, and (b) stop passing struct cpuinfo * around when we only use it for the current processor anyway (a lot of these functions are in fact completely invalid if we don't); we could define this_cpu_info as (*this_cpu_ptr(&cpu_info)) -- basically what I have above -- or directly use percpu functions to access these variables.
Re: [PATCH v18 0/4] Introduce usb charger framework to deal with the usb gadget power negotation
Hi Neil, On 3 November 2016 at 09:25, NeilBrown wrote: > On Tue, Nov 01 2016, Baolin Wang wrote: > > >>> So I won't be responding on this topic any further until I see a genuine >>> attempt to understand and resolve the inconsistencies with >>> usb_register_notifier(). >> >> Any better solution? > > I'm not sure exactly what you are asking, so I'll assume you are asking > the question I want to answer :-) > > 1/ Liaise with the extcon developers to resolve the inconsistencies > with USB connector types. > e.g. currently there is both "EXTCON_USB" and "EXTCON_CHG_USB_SDP" > which both seem to suggest a standard downstream port. There is no > documentation describing how these relate, and no consistent practice > to copy. > I suspect the intention is that > EXTCON_USB and EXTCON_USB_HOST indicate the data capabilities of > the cable, while EXTCON_CHG_USB* indicate the power capabilities of > the cable. > So EXTCON_CHG_USB_SDP should always appear together with EXTCON_USB > while EXTCON_CHG_USB_DCP would not, and EXTCON_CHG_USB_ACA > would normally appear with EXTCON_USB_HOST (I think). > Some drivers follow this model, particularly extcon-max14577.c > but it is not consistent. > > This policy should be well documented and possibly existing drivers > should be updated to follow it. > > At the same time it would make sense to resolve EXTCON_CHG_USB_SLOW > and EXTCON_CHG_USB_FAST. These names don't mean much. > They were recently removed from drivers/power/axp288_charger.c > which is good, but are still used in drivers/extcon/extcon-max* > Possibly they should be changed to names from the standard, or > possibly they should be renamed to identify the current they are > expected to provide. e.g. EXTCON_CHG_USB_500MA and EXTCON_CHG_USB_1A Now I am creating the new patchset, fixing and converting existing drivers. I did some investigation of the EXTCON subsystem. From your suggestions: 1. EXTCON_CHG_USB_SDP should always appear together with EXTCON_USB.
After checking, all extcon drivers are now following this rule. 2. EXTCON_CHG_USB_ACA would normally appear with EXTCON_USB_HOST. Currently no extcon drivers use EXTCON_CHG_USB_ACA, so there is no need to change anything. 3. Change EXTCON_CHG_USB_SLOW/FAST to EXTCON_CHG_USB_500MA/1A. There is no model that says the slow charger should be 500mA and the fast charger 1A (in the extcon-max77693.c file, the fast charger can draw 2A), so changing to EXTCON_CHG_USB_500MA/1A is not useful, I think. From the above investigation, I did not find anything that needs to be changed now. What do you think? Thanks. > > 2/ Change all usb phys to register an extcon and to send appropriate >notifications. Many already do, but I don't think it is universal. >It is probable that the extcon should be registered using common code >instead of each phy driver having its own >extcon_get_edev_by_phandle() >or whatever. >If the usb phy driver needs to look at battery charger registers to >know what sort of cable was connected (which I believe is the case >for the chips you are interested in), then it should do that. > > 3/ Currently some USB controllers discover that a cable was connected by >listening on an extcon, and some by registering for a usb_notifier >(described below) ... though there seem to only be 2 left which do that. >Now that all USB phys send connection information via extcon (see 2), >the USB controllers should be changed to all find out about the cable >using extcon. > > 4/ struct usb_phy contains: > /* for notification of usb_phy_events */ > struct atomic_notifier_head notifier; > > This is used inconsistently. Sometimes the argument passed > is NULL, sometimes it is a pointer to 'vbus_draw' - the current > limit negotiated via USB, sometimes it is a pointer to the gadget > though as far as I can tell, that last one is never used. > > This should be changed to be consistent.
This notifier is no longer > needed to tell the USB controller that a cable was connected (extcon > now does that, see 3) so it is only used to communicate the > 'vbus_draw' information. > So it should be changed to *only* send a notification when vbus_draw > is known, and it should carry that information. > This should probably be done in common code, and removed > from individual drivers. > > 5/ Now that all cable connection notifications are sent over extcon and >all vbus_draw notifications are sent over the usb_phy notifier, write >some support code that a power supply client can use to be told what >power is available. >e.g. a battery charger driver would call: >register_power_client(.) >or similar, providing a phandle (or similar) for the usb phy and a >function to call back when the available current changes (or maybe a >work_struct containing the function pointer) > >register_power_client() would the
Re: [PATCH] drm/mediatek: Support UYVY and YUYV format for overlay
Hi Bibby, On Wed, 2016-12-14 at 13:14 +0800, Bibby Hsieh wrote: > MT8173 overlay can support UYVY and YUYV format, > we add the format in DRM driver. > > Signed-off-by: Bibby Hsieh > --- > drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 6 ++ > drivers/gpu/drm/mediatek/mtk_drm_plane.c | 2 ++ > 2 files changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > index 019b7ca..0a340f3 100644 > --- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > +++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > @@ -44,6 +44,8 @@ > #define OVL_CON_CLRFMT_RGB888(1 << 12) > #define OVL_CON_CLRFMT_RGBA (2 << 12) > #define OVL_CON_CLRFMT_ARGB (3 << 12) > +#define OVL_CON_CLRFMT_UYVY (4 << 12) > +#define OVL_CON_CLRFMT_YUYV (5 << 12) > #define OVL_CON_AEN BIT(8) > #define OVL_CON_ALPHA 0xff > > @@ -161,6 +163,10 @@ static unsigned int ovl_fmt_convert(unsigned int fmt) > case DRM_FORMAT_XBGR: > case DRM_FORMAT_ABGR: > return OVL_CON_CLRFMT_RGBA | OVL_CON_BYTE_SWAP; > + case DRM_FORMAT_YUYV: > + return OVL_CON_CLRFMT_YUYV; > + case DRM_FORMAT_UYVY: > + return OVL_CON_CLRFMT_UYVY; > } > } Your patch looks good, but I am not sure about some details. AFAIK, there is a color matrix here to describe how to transform from BT.601 / BT.709 / other color space to RGB color space. You can use the default value if you don't care how colors are represented. Or you can specify the matrix in this patch. Regards, yt.shen > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_plane.c > b/drivers/gpu/drm/mediatek/mtk_drm_plane.c > index c461a23..b94c6ee 100644 > --- a/drivers/gpu/drm/mediatek/mtk_drm_plane.c > +++ b/drivers/gpu/drm/mediatek/mtk_drm_plane.c > @@ -28,6 +28,8 @@ > DRM_FORMAT_XRGB, > DRM_FORMAT_ARGB, > DRM_FORMAT_RGB565, > + DRM_FORMAT_YUYV, > + DRM_FORMAT_UYVY, > }; > > static void mtk_plane_reset(struct drm_plane *plane)
Re: [PATCH v3 2/5] lib: Add Sed-opal library
On Mon, Dec 19, 2016 at 12:35:46PM -0700, Scott Bauer wrote: > +int fdev_sed_ioctl(struct file *filep, unsigned int cmd, > +unsigned long arg) > +{ > + struct sed_key key; > + struct sed_context *sed_ctx; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EACCES; > + > + if (!filep->f_sedctx || !filep->f_sedctx->ops || !filep->f_sedctx->dev) > + return -ENODEV; > + > + sed_ctx = filep->f_sedctx; First of all, that's a bisect hazard. What's more, looking through the rest of patchset, WTF does it * need to be called that early in ioctl(2) handling, instead of having ->ioctl() instance for that sucker calling it? * _not_ get your ->f_sedctx as an explicit argument, passed by the caller in ->ioctl(), seeing that it's possible to calculate by file->private_data? * store that thing in struct file itself, bloating it for everything all for the sake of few drivers that might want to use that helper?
[PATCH] arm64/dts/ls2080a-rdb: remove disable status of pca9547
The pca9547 won't be probed since its status property is disabled. Since there are devices connected to it, we need to remove the status property so that the ds3232 and adt7461 are probed correctly. Signed-off-by: Meng Yi --- arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts index 265e0a8..2ff46ca 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts @@ -102,7 +102,6 @@ reg = <0x75>; #address-cells = <1>; #size-cells = <0>; - status = "disabled"; i2c@1 { #address-cells = <1>; #size-cells = <0>; -- 2.1.0.27.g96db324
Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.
> +void nvme_unlock_from_suspend(struct nvme_ctrl *ctrl) > +{ > + if (opal_unlock_from_suspend(&ctrl->sed_ctx)) > + pr_warn("Failed to unlock one or more locking ranges!\n"); > +} > +EXPORT_SYMBOL_GPL(nvme_unlock_from_suspend); I don't think we even need this wrapper. Also for the warning please use dev_warn so that the user knows which controller failed. > @@ -1765,7 +1766,7 @@ static void nvme_reset_work(struct work_struct *work) > { > struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work); > int result = -ENODEV; > - > + bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL); > if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)) Please don't remove the empty line after the variable declarations. Also I would place your new line before the result line, as that makes it a tad easier to read.
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
On 20 December 2016 at 14:39, Lu Baolu wrote: > Hi, > > On 12/20/2016 02:06 PM, Baolin Wang wrote: >> Hi, >> >> On 20 December 2016 at 12:29, Lu Baolu wrote: >>> Hi Mathias, >>> >>> On 12/19/2016 08:13 PM, Mathias Nyman wrote: On 19.12.2016 13:34, Baolin Wang wrote: > Hi Mathias, > > On 19 December 2016 at 18:33, Mathias Nyman > wrote: >> On 13.12.2016 05:21, Baolin Wang wrote: >>> Hi Mathias, >>> >>> On 12 December 2016 at 23:52, Mathias Nyman >>> wrote: On 05.12.2016 09:51, Baolin Wang wrote: > > If a command event is found on the event ring during an interrupt, > we need to stop the command timer with del_timer(). Since del_timer() > can fail if the timer is running and waiting on the xHCI lock, then > it maybe get the wrong timeout command in > xhci_handle_command_timeout() > if host fetched a new command and updated the xhci->current_cmd in > handle_cmd_completion(). For this situation, we need a way to signal > to the command timer that everything is fine and it should exit. Ah, right, this could actually happen. > > We should introduce a counter (xhci->current_cmd_pending) for the > number > of pending commands. If we need to cancel the command timer and > del_timer() > succeeds, we decrement the number of pending commands. If del_timer() > fails, > we leave the number of pending commands alone. > > For handling timeout command, in xhci_handle_command_timeout() we will > check > the counter after decrementing it, if the counter > (xhci->current_cmd_pending) > is 0, which means xhci->current_cmd is the right timeout command. If > the > counter (xhci->current_cmd_pending) is greater than 0, which means > current > timeout command has been handled by host and host has fetched new > command > as > xhci->current_cmd, then just return and wait for new current command. A counter like this could work. Writing the abort bit can generate either ABORT+STOP, or just STOP event, this seems to cover both. quick check, case 1: timeout and cmd completion at the same time. 
cpu1cpu2 queue_command(first), p++ (=1) queue_command(more), --completion irq fires---- timer times out at same time-- handle_cmd_completion() handle_cmd_timeout(),) lock(xhci_lock ) spin_on(xhci_lock) del_timer() fail, p (=1, nochange) cur_cmd = list_next(), p++ (=2) unlock(xhci_lock) lock(xhci_lock) p-- (=1) if (p > 0), exit OK works case 2: normal timeout case with ABORT+STOP, no race. cpu1cpu2 queue_command(first), p++ (=1) queue_command(more), handle_cmd_timeout() p-- (P=0), don't exit mod_timer(), p++ (P=1) write_abort_bit() handle_cmd_comletion(ABORT) del_timer(), ok, p-- (p = 0) handle_cmd_completion(STOP) del_timer(), fail, (P=0) handle_stopped_cmd_ring() cur_cmd = list_next(), p++ (=1) mod_timer() OK, works, and same for just STOP case, with the only difference that during handle_cmd_completion(STOP) p is decremented (p--) >>> >>> Yes, that's the cases what I want to handle, thanks for your explicit >>> explanation. >>> >> Gave this some more thought over the weekend, and this implementation >> doesn't solve the case when the last command times out and races with the >> completion handler: >> >> cpu1cpu2 >> >> queue_command(first), p++ (=1) >> --completion irq fires---- timer times out at same time-- >> handle_cmd_completion() handle_cmd_timeout(),) >> lock(xhci_lock )spin_on(xhci_lock) >> del_timer() fail, p (=1, nochange) >> no more commands, P (=1, nochange) >> unlock(xhci_lock) >> lock(xhci_lock) >> p-- (=0) >> p == 0, continue, even if we >> should >> not. >>
Re: [PATCH v3 1/5] include: Add definitions for sed
On Mon, Dec 19, 2016 at 12:35:45PM -0700, Scott Bauer wrote:
> This patch adds the definitions and structures for the SED
> Opal code.

This seems to contain a few things: userspace ABIs, on-the-wire definitions, and prototypes for the code added in patch 2. The userspace ABIs should be a header on its own. For new code we also usually try to split the protocol definitions from the in-kernel APIs, although that's not even reflected in the headers here. If you think it's reasonable to split those out, those should be a separate new patch, and the actual in-kernel data structures and prototypes should just go with the actual code in your current patch 2.

> +/**
> + * struct sed_context - SED Security context for a device
> + * @ops:The Trusted Send/Recv functions.
> + * @sec_data:Opaque pointer that will be passed to the send/recv fn.
> + *Drivers can use this to pass necessary data required for
> + *Their implementation of send/recv.
> + * @dev:Currently an Opal Dev structure. In the future can be other types
> + *Of security structures.
> + */

This is missing a couple spaces for the common kerneldoc format.

> +struct sed_context {
> +	const struct sec_ops *ops;
> +	void *sec_data;
> +	void *dev;
> +};

What is the difference between sec_data and dev? And do you really need either one when using the container_of trick I pointed out in response to the other patch.
> +struct sed_key {
> +	__u32 sed_type;
> +	union {
> +		struct opal_key opal;
> +		struct opal_new_pw opal_pw;
> +		struct opal_session_info opal_session;
> +		struct opal_user_lr_setup opal_lrs;
> +		struct opal_lock_unlock opal_lk_unlk;
> +		struct opal_mbr_data opal_mbr;
> +		/* additional command set key types */
> +	};
> +};
> +
> +#define IOC_SED_SAVE			_IOW('p', 220, struct sed_key)
> +#define IOC_SED_LOCK_UNLOCK		_IOW('p', 221, struct sed_key)
> +#define IOC_SED_TAKE_OWNERSHIP		_IOW('p', 222, struct sed_key)
> +#define IOC_SED_ACTIVATE_LSP		_IOW('p', 223, struct sed_key)
> +#define IOC_SED_SET_PW			_IOW('p', 224, struct sed_key)
> +#define IOC_SED_ACTIVATE_USR		_IOW('p', 225, struct sed_key)
> +#define IOC_SED_REVERT_TPR		_IOW('p', 226, struct sed_key)
> +#define IOC_SED_LR_SETUP		_IOW('p', 227, struct sed_key)
> +#define IOC_SED_ADD_USR_TO_LR		_IOW('p', 228, struct sed_key)
> +#define IOC_SED_ENABLE_DISABLE_MBR	_IOW('p', 229, struct sed_key)
> +#define IOC_SED_ERASE_LR		_IOW('p', 230, struct sed_key)
> +#define IOC_SED_SECURE_ERASE_LR		_IOW('p', 231, struct sed_key)

I'll need to look at the details again, but it seems like each ioctl uses exactly one of the structures in the union above, so there is no real need for that union.
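The review point about the ioctl numbering can be checked mechanically in user space. The sketch below is a hedged illustration, not the proposed ABI: `struct sed_key_stub` is a hypothetical stand-in (only its size participates in the number), and the decode helpers are just the standard Linux `_IOC_*` macros.

```c
#include <assert.h>
#include <sys/ioctl.h>	/* _IOW and the _IOC_* decode helpers on Linux */

/* Hypothetical stand-in for the proposed struct sed_key: only its size
 * participates in the ioctl number, so the union members are elided. */
struct sed_key_stub {
	unsigned int sed_type;
	unsigned char payload[256];	/* placeholder for the union */
};

/* Mirrors the numbering scheme from the patch: magic 'p', command nr 221. */
#define IOC_SED_LOCK_UNLOCK_STUB	_IOW('p', 221, struct sed_key_stub)

/* Every field of the command number is mechanically recoverable. */
static unsigned int sed_ioc_magic(unsigned long cmd) { return _IOC_TYPE(cmd); }
static unsigned int sed_ioc_nr(unsigned long cmd)    { return _IOC_NR(cmd); }
static unsigned int sed_ioc_size(unsigned long cmd)  { return _IOC_SIZE(cmd); }
```

Because `_IOW` bakes `sizeof(arg_type)` into the command number, every one of the proposed ioctls would encode the full `struct sed_key` size even though each uses a single union member.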
Re: [PATCH v2] fscrypt: Factor out bio specific functions
Hi, > On 19.12.2016, at 12:25, Richard Weinberger wrote: > > That way we can get rid of the direct dependency on CONFIG_BLOCK. > > Reported-by: Arnd Bergmann > Reported-by: Randy Dunlap > Suggested-by: Christoph Hellwig > Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto") > Signed-off-by: Richard Weinberger > --- > Changes since v1: > - Moved fscrypt_zeroout_range() also to bio.c Looks good to me. Reviewed-by: David Gstir - David
[PATCH v3 4/6] powerpc/perf: Add event attribute and group to IMC pmus
Device tree IMC driver code parses the IMC units and their events. It passes the information to IMC pmu code which is placed in powerpc/perf as "imc-pmu.c". This patch creates only event attributes and attribute groups for the IMC pmus. Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Michael Neuling Cc: Stewart Smith Cc: Daniel Axtens Cc: Stephane Eranian Signed-off-by: Hemant Kumar --- arch/powerpc/perf/Makefile| 6 +- arch/powerpc/perf/imc-pmu.c | 96 +++ arch/powerpc/platforms/powernv/opal-imc.c | 12 +++- 3 files changed, 111 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/perf/imc-pmu.c diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile index f102d53..6f1d0ac 100644 --- a/arch/powerpc/perf/Makefile +++ b/arch/powerpc/perf/Makefile @@ -2,10 +2,14 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror obj-$(CONFIG_PERF_EVENTS) += callchain.o perf_regs.o +imc-$(CONFIG_PPC_POWERNV) += imc-pmu.o + obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \ power5+-pmu.o power6-pmu.o power7-pmu.o \ - isa207-common.o power8-pmu.o power9-pmu.o + isa207-common.o power8-pmu.o power9-pmu.o \ + $(imc-y) + obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c new file mode 100644 index 000..7b6ce50 --- /dev/null +++ b/arch/powerpc/perf/imc-pmu.c @@ -0,0 +1,96 @@ +/* + * Nest Performance Monitor counter support. + * + * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation. + * (C) 2016 Hemant K Shaw, IBM Corporation. + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. 
+ */ +#include +#include +#include +#include +#include +#include + +struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS]; +struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS]; + +/* dev_str_attr : Populate event "name" and string "str" in attribute */ +static struct attribute *dev_str_attr(const char *name, const char *str) +{ + struct perf_pmu_events_attr *attr; + + attr = kzalloc(sizeof(*attr), GFP_KERNEL); + + sysfs_attr_init(&attr->attr.attr); + + attr->event_str = str; + attr->attr.attr.name = name; + attr->attr.attr.mode = 0444; + attr->attr.show = perf_event_sysfs_show; + + return &attr->attr.attr; +} + +/* + * update_events_in_group: Update the "events" information in an attr_group + * and assign the attr_group to the pmu "pmu". + */ +static int update_events_in_group(struct imc_events *events, + int idx, struct imc_pmu *pmu) +{ + struct attribute_group *attr_group; + struct attribute **attrs; + int i; + + /* Allocate memory for attribute group */ + attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL); + if (!attr_group) + return -ENOMEM; + + /* Allocate memory for attributes */ + attrs = kzalloc((sizeof(struct attribute *) * (idx + 1)), GFP_KERNEL); + if (!attrs) { + kfree(attr_group); + return -ENOMEM; + } + + attr_group->name = "events"; + attr_group->attrs = attrs; + for (i = 0; i < idx; i++, events++) { + attrs[i] = dev_str_attr((char *)events->ev_name, + (char *)events->ev_value); + } + + pmu->attr_groups[0] = attr_group; + return 0; +} + +/* + * init_imc_pmu : Setup the IMC pmu device in "pmu_ptr" and its events + *"events". + * Setup the cpu mask information for these pmus and setup the state machine + * hotplug notifiers as well. 
+ */ +int init_imc_pmu(struct imc_events *events, int idx, +struct imc_pmu *pmu_ptr) +{ + int ret = -ENODEV; + + ret = update_events_in_group(events, idx, pmu_ptr); + if (ret) + goto err_free; + + return 0; + +err_free: + /* Only free the attr_groups which are dynamically allocated */ + if (pmu_ptr->attr_groups[0]) { + kfree(pmu_ptr->attr_groups[0]->attrs); + kfree(pmu_ptr->attr_groups[0]); + } + + return ret; +} diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c index 5ee93402..7870401 100644 --- a/arch/powerpc/platforms/powernv/opal-imc.c +++ b/arch/powerpc/platforms/powernv/opal-imc.c @@ -31,8 +31,11 @@ #include #include -struct perchip_nest_info nest_perchip_i
[PATCH v3 3/6] powerpc/powernv: Detect supported IMC units and its events
Parse device tree to detect IMC units. Traverse through each IMC unit node to find supported events and corresponding unit/scale files (if any). Right now, only nest IMC units are supported. The nest IMC unit event node from device tree will contain the offset in the reserved memory region to get the counter data for a given event. The offsets for the nest events are contained in the "reg" property of the event "node". Kernel code uses this offset as event configuration value. Device tree parser code also looks for scale/unit property in the event node and passes on the value as an event attr for perf interface to use in the post processing by the perf tool. Some PMUs may have common scale and unit properties which implies that all events supported by this PMU inherit the scale and unit properties of the PMU itself. For those events, we need to set the common unit and scale values. For failure to initialize any unit or any event, disable that unit and continue setting up the rest of them. 
Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Michael Neuling Cc: Stewart Smith Cc: Daniel Axtens Cc: Stephane Eranian Signed-off-by: Hemant Kumar --- arch/powerpc/platforms/powernv/opal-imc.c | 332 ++ 1 file changed, 332 insertions(+) diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c index ee2ae45..5ee93402 100644 --- a/arch/powerpc/platforms/powernv/opal-imc.c +++ b/arch/powerpc/platforms/powernv/opal-imc.c @@ -32,6 +32,337 @@ #include struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS]; +struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS]; + +static int imc_event_info(char *name, struct imc_events *events) +{ + char *buf; + + /* memory for content */ + buf = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + events->ev_name = name; + events->ev_value = buf; + return 0; +} + +static int imc_event_info_str(struct property *pp, char *name, + struct imc_events *events) +{ + int ret; + + ret = imc_event_info(name, events); + if (ret) + return ret; + + if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) || + (pp->length > IMC_MAX_PMU_NAME_LEN)) + return -EINVAL; + strncpy(events->ev_value, (const char *)pp->value, pp->length); + + return 0; +} + +static int imc_event_info_val(char *name, u32 val, + struct imc_events *events) +{ + int ret; + + ret = imc_event_info(name, events); + if (ret) + return ret; + sprintf(events->ev_value, "event=0x%x", val); + + return 0; +} + +static int set_event_property(struct property *pp, char *event_prop, + struct imc_events *events, char *ev_name) +{ + char *buf; + int ret; + + buf = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + sprintf(buf, "%s.%s", ev_name, event_prop); + ret = imc_event_info_str(pp, buf, events); + if (ret) { + kfree(events->ev_name); + kfree(events->ev_value); + } + + return ret; +} + +/* + * 
imc_events_node_parser: Parse the event node "dev" and assign the parsed + * information to event "events". + * + * Parses the "reg" property of this event. "reg" gives us the event offset. + * Also, parse the "scale" and "unit" properties, if any. + */ +static int imc_events_node_parser(struct device_node *dev, + struct imc_events *events, + struct property *event_scale, + struct property *event_unit) +{ + struct property *name, *pp; + char *ev_name; + u32 val; + int idx = 0, ret; + + if (!dev) + return -EINVAL; + + /* +* Loop through each property of an event node +*/ + name = of_find_property(dev, "event-name", NULL); + if (!name) + return -ENODEV; + + if (!name->value || + (strnlen(name->value, name->length) == name->length) || + (name->length > IMC_MAX_PMU_NAME_LEN)) + return -EINVAL; + + ev_name = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL); + if (!ev_name) + return -ENOMEM; + + strncpy(ev_name, name->value, name->length); + + /* +* Parse each property of this event node "dev". Property "reg" has +* the offset which is assigned to the event name. Other properties +* like "scale" and "unit" are assigned to event.scale and event.unit +* accordingly. +*/ + for_each_property_of_node(dev, pp) { + /* +* If there is an issu
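The string forms this parser produces — an "event=0x&lt;offset&gt;" perf config string (imc_event_info_val) and "&lt;event&gt;.&lt;property&gt;" attribute names for scale/unit files (set_event_property) — can be sketched stand-alone. All names below are illustrative toys, not the patch's functions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-alone sketch of the strings the device-tree parser builds.
 * Illustrative names only; the real code lives in opal-imc.c. */

#define TOY_MAX_NAME_LEN 256	/* mirrors IMC_MAX_PMU_NAME_LEN */

/* Format the perf config string for an event offset, as
 * imc_event_info_val() does; snprintf() bounds the write here. */
static void toy_event_value(char *buf, unsigned int offset)
{
	snprintf(buf, TOY_MAX_NAME_LEN, "event=0x%x", offset);
}

/* Build the "<event>.<property>" name used for scale/unit attributes. */
static void toy_event_prop_name(char *buf, const char *ev, const char *prop)
{
	snprintf(buf, TOY_MAX_NAME_LEN, "%s.%s", ev, prop);
}
```

For the example event in the cover letter, the offset 0x118 from its "reg" property would yield the config string "event=0x118".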
[PATCH v3 6/6] powerpc/perf: IMC pmu cpumask and cpu hotplug support
Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any online CPU) from each chip for nest PMUs is designated to read counters. On CPU hotplug, dying CPU is checked to see whether it is one of the designated cpus, if yes, next online cpu from the same chip (for nest units) is designated as new cpu to read counters. Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Michael Neuling Cc: Stewart Smith Cc: Daniel Axtens Cc: Stephane Eranian Signed-off-by: Hemant Kumar --- arch/powerpc/include/asm/opal-api.h| 3 +- arch/powerpc/include/asm/opal.h| 2 + arch/powerpc/perf/imc-pmu.c| 167 - arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + 4 files changed, 171 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 0e2e57b..48e1d3e 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -167,7 +167,8 @@ #define OPAL_INT_EOI 124 #define OPAL_INT_SET_MFRR 125 #define OPAL_PCI_TCE_KILL 126 -#define OPAL_LAST 126 +#define OPAL_NEST_IMC_COUNTERS_CONTROL 128 +#define OPAL_LAST 128 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index e958b70..fe72b57 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -229,6 +229,8 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type, int64_t opal_rm_pci_tce_kill(uint64_t phb_id, uint32_t kill_type, uint32_t pe_num, uint32_t tce_size, uint64_t dma_addr, uint32_t npages); +int64_t opal_nest_imc_counters_control(uint64_t mode, uint64_t value1, + uint64_t value2, uint64_t value3); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index f12ece8..49f6486 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ 
-16,6 +16,7 @@ struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS]; struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS]; +static cpumask_t nest_imc_cpumask; /* Needed for sanity check */ extern u64 nest_max_offset; @@ -31,6 +32,164 @@ static struct attribute_group imc_format_group = { .attrs = imc_format_attrs, }; +/* Get the cpumask printed to a buffer "buf" */ +static ssize_t imc_pmu_cpumask_get_attr(struct device *dev, + struct device_attribute *attr, char *buf) +{ + cpumask_t *active_mask; + + active_mask = &nest_imc_cpumask; + return cpumap_print_to_pagebuf(true, buf, active_mask); +} + +static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL); + +static struct attribute *imc_pmu_cpumask_attrs[] = { + &dev_attr_cpumask.attr, + NULL, +}; + +static struct attribute_group imc_pmu_cpumask_attr_group = { + .attrs = imc_pmu_cpumask_attrs, +}; + +/* + * nest_init : Initializes the nest imc engine for the current chip. + */ +static void nest_init(int *loc) +{ + int rc; + + rc = opal_nest_imc_counters_control(NEST_IMC_PRODUCTION_MODE, + NEST_IMC_ENGINE_START, 0, 0); + if (rc) + loc[smp_processor_id()] = 1; +} + +static void nest_change_cpu_context(int old_cpu, int new_cpu) +{ + int i; + + for (i = 0; +(per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++) + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu, + old_cpu, new_cpu); +} + +static int ppc_nest_imc_cpu_online(unsigned int cpu) +{ + int nid, fcpu, ncpu; + struct cpumask *l_cpumask, tmp_mask; + + /* Fint the cpumask of this node */ + nid = cpu_to_node(cpu); + l_cpumask = cpumask_of_node(nid); + + /* +* If any of the cpu from this node is already present in the mask, +* just return, if not, then set this cpu in the mask. 
+*/ + if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) { + cpumask_set_cpu(cpu, &nest_imc_cpumask); + return 0; + } + + fcpu = cpumask_first(l_cpumask); + ncpu = cpumask_next(cpu, l_cpumask); + if (cpu == fcpu) { + if (cpumask_test_and_clear_cpu(ncpu, &nest_imc_cpumask)) { + cpumask_set_cpu(cpu, &nest_imc_cpumask); + nest_change_cpu_context(ncpu, cpu); + } + } + + return 0; +} + +static int ppc_nest_imc_cpu_offline(unsigned int cpu) +{ + int nid, target = -1; + st
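The designated-reader handover in the hotplug callbacks above can be modelled with plain bitmasks. This is a toy sketch under stated assumptions — `designated` stands in for nest_imc_cpumask, chips are 64-bit masks, and the perf-context migration is only noted in a comment:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the nest IMC "designated reader" scheme: one CPU per chip
 * owns the counters; on hotplug the designation migrates within the chip.
 * Illustrative names, not the kernel cpumask API. */

static uint64_t designated;	/* one bit per cpu, like nest_imc_cpumask */

static int first_set(uint64_t m)
{
	for (int i = 0; i < 64; i++)
		if (m >> i & 1)
			return i;
	return -1;
}

/* online: if no cpu of this chip is designated yet, designate this one */
static void toy_cpu_online(int cpu, uint64_t chip_mask)
{
	if (!(designated & chip_mask))
		designated |= 1ull << cpu;
}

/* offline: if the dying cpu was designated, hand over to another online
 * cpu of the same chip (the real code also calls
 * perf_pmu_migrate_context() for each registered nest pmu) */
static void toy_cpu_offline(int cpu, uint64_t chip_online)
{
	if (!(designated >> cpu & 1))
		return;
	designated &= ~(1ull << cpu);
	int target = first_set(chip_online & ~(1ull << cpu));
	if (target >= 0)
		designated |= 1ull << target;
}
```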
[PATCH v3 1/6] powerpc/powernv: Data structure and macros definitions
Create new header file "imc-pmu.h" to add the data structures and macros needed for IMC pmu support.

Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Anton Blanchard
Cc: Sukadev Bhattiprolu
Cc: Michael Neuling
Cc: Stewart Smith
Cc: Daniel Axtens
Cc: Stephane Eranian
Signed-off-by: Hemant Kumar
---
 arch/powerpc/include/asm/imc-pmu.h | 73 ++
 1 file changed, 73 insertions(+)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index 000..911d837
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,73 @@
+#ifndef PPC_POWERNV_IMC_PMU_DEF_H
+#define PPC_POWERNV_IMC_PMU_DEF_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ *           (C) 2016 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#define IMC_MAX_CHIPS			32
+#define IMC_MAX_PMUS			32
+#define IMC_MAX_PMU_NAME_LEN		256
+
+#define NEST_IMC_ENGINE_START		1
+#define NEST_IMC_ENGINE_STOP		0
+#define NEST_MAX_PAGES			16
+
+#define NEST_IMC_PRODUCTION_MODE	1
+
+#define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
+#define IMC_DTB_NEST_COMPAT		"ibm,imc-counters-chip"
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+	u32 chip_id;
+	u64 pbase;
+	u64 vbase[NEST_MAX_PAGES];
+	u64 size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+	char *ev_name;
+	char *ev_value;
+};
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will
+ * hold the pmu functions and attrs for each imc pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct imc_pmu {
+	struct pmu pmu;
+	int domain;
+	const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST		1
+
+#define UNKNOWN_DOMAIN		-1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
--
2.7.4
[PATCH v3 5/6] powerpc/perf: Generic imc pmu event functions
Since, the IMC counters' data are periodically fed to a memory location, the functions to read/update, start/stop, add/del can be generic and can be used by all IMC PMU units. This patch adds a set of generic imc pmu related event functions to be used by each imc pmu unit. Add code to setup format attribute and to register imc pmus. Add a event_init function for nest_imc events. Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Michael Neuling Cc: Stewart Smith Cc: Daniel Axtens Cc: Stephane Eranian Signed-off-by: Hemant Kumar --- arch/powerpc/include/asm/imc-pmu.h| 1 + arch/powerpc/perf/imc-pmu.c | 122 ++ arch/powerpc/platforms/powernv/opal-imc.c | 29 ++- 3 files changed, 148 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h index 911d837..ceb6b1f 100644 --- a/arch/powerpc/include/asm/imc-pmu.h +++ b/arch/powerpc/include/asm/imc-pmu.h @@ -70,4 +70,5 @@ struct imc_pmu { #define UNKNOWN_DOMAIN -1 +int imc_get_domain(struct device_node *pmu_dev); #endif /* PPC_POWERNV_IMC_PMU_DEF_H */ diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 7b6ce50..f12ece8 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -17,6 +17,117 @@ struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS]; struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS]; +/* Needed for sanity check */ +extern u64 nest_max_offset; + +PMU_FORMAT_ATTR(event, "config:0-20"); +static struct attribute *imc_format_attrs[] = { + &format_attr_event.attr, + NULL, +}; + +static struct attribute_group imc_format_group = { + .name = "format", + .attrs = imc_format_attrs, +}; + +static int nest_imc_event_init(struct perf_event *event) +{ + int chip_id; + u32 config = event->attr.config; + struct perchip_nest_info *pcni; + + if (event->attr.type != event->pmu->type) + return -ENOENT; + + /* Sampling not supported */ + if 
(event->hw.sample_period) + return -EINVAL; + + /* unsupported modes and filters */ + if (event->attr.exclude_user || + event->attr.exclude_kernel || + event->attr.exclude_hv || + event->attr.exclude_idle || + event->attr.exclude_host || + event->attr.exclude_guest) + return -EINVAL; + + if (event->cpu < 0) + return -EINVAL; + + /* Sanity check for config (event offset) */ + if (config > nest_max_offset) + return -EINVAL; + + chip_id = topology_physical_package_id(event->cpu); + pcni = &nest_perchip_info[chip_id]; + event->hw.event_base = pcni->vbase[config/PAGE_SIZE] + + (config & ~PAGE_MASK); + + return 0; +} + +static void imc_read_counter(struct perf_event *event) +{ + u64 *addr, data; + + addr = (u64 *)event->hw.event_base; + data = __be64_to_cpu(*addr); + local64_set(&event->hw.prev_count, data); +} + +static void imc_perf_event_update(struct perf_event *event) +{ + u64 counter_prev, counter_new, final_count, *addr; + + addr = (u64 *)event->hw.event_base; + counter_prev = local64_read(&event->hw.prev_count); + counter_new = __be64_to_cpu(*addr); + final_count = counter_new - counter_prev; + + local64_set(&event->hw.prev_count, counter_new); + local64_add(final_count, &event->count); +} + +static void imc_event_start(struct perf_event *event, int flags) +{ + imc_read_counter(event); +} + +static void imc_event_stop(struct perf_event *event, int flags) +{ + if (flags & PERF_EF_UPDATE) + imc_perf_event_update(event); +} + +static int imc_event_add(struct perf_event *event, int flags) +{ + if (flags & PERF_EF_START) + imc_event_start(event, flags); + + return 0; +} + +/* update_pmu_ops : Populate the appropriate operations for "pmu" */ +static int update_pmu_ops(struct imc_pmu *pmu) +{ + if (!pmu) + return -EINVAL; + + pmu->pmu.task_ctx_nr = perf_invalid_context; + pmu->pmu.event_init = nest_imc_event_init; + pmu->pmu.add = imc_event_add; + pmu->pmu.del = imc_event_stop; + pmu->pmu.start = imc_event_start; + pmu->pmu.stop = imc_event_stop; + pmu->pmu.read = 
imc_perf_event_update; + pmu->attr_groups[1] = &imc_format_group; + pmu->pmu.attr_groups = pmu->attr_groups; + + return 0; +} + /* dev_str_attr : Populate event "name" and string "str" in attribute */ static struct attribute *dev_str_attr(const char *name, const char *str) { @@ -83,6 +194,17 @@ int init_imc_pmu(struct imc_events *events, int idx, if (ret) goto err_free; + ret = upda
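The read/update pattern in imc_read_counter() and imc_perf_event_update() — snapshot prev_count on start, accumulate the delta on each read — can be sketched stand-alone. This is a hedged toy model: names are illustrative, and the real code additionally uses local64_t atomics and __be64_to_cpu() on the memory-mapped counter.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the imc counter read/update pattern: hardware accumulates
 * counts in memory; the driver snapshots and accumulates deltas. */

struct toy_event {
	uint64_t prev_count;	/* last snapshot of the memory counter */
	uint64_t count;		/* accumulated count reported to perf */
};

/* like imc_event_start()/imc_read_counter(): snapshot the counter */
static void toy_event_start(struct toy_event *ev, uint64_t hw)
{
	ev->prev_count = hw;
}

/* like imc_perf_event_update(): accumulate the delta since the snapshot */
static void toy_event_update(struct toy_event *ev, uint64_t hw)
{
	uint64_t delta = hw - ev->prev_count;	/* u64 arithmetic handles wrap */
	ev->prev_count = hw;
	ev->count += delta;
}
```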
[PATCH v3 2/6] powerpc/powernv: Autoload IMC device driver module
This patch does three things : - Enables "opal.c" to create a platform device for the IMC interface according to the appropriate compatibility string. - Find the reserved-memory region details from the system device tree and get the base address of HOMER region address for each chip. - We also get the Nest PMU counter data offsets (in the HOMER region) and their sizes. The offsets for the counters' data are fixed and won't change from chip to chip. The device tree parsing logic is separated from the PMU creation functions (which is done in subsequent patches). Right now, only Nest units are taken care of. Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Michael Neuling Cc: Stewart Smith Cc: Daniel Axtens Cc: Stephane Eranian Signed-off-by: Hemant Kumar --- arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/opal-imc.c | 117 ++ arch/powerpc/platforms/powernv/opal.c | 13 3 files changed, 131 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index b5d98cb..44909fe 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o -obj-y += opal-kmsg.o +obj-y += opal-kmsg.o opal-imc.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c new file mode 100644 index 000..ee2ae45 --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-imc.c @@ -0,0 +1,117 @@ +/* + * OPAL IMC interface detection 
driver + * Supported on POWERNV platform + * + * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation. + *(C) 2016 Hemant K Shaw, IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS]; + +static int opal_imc_counters_probe(struct platform_device *pdev) +{ + struct device_node *child, *imc_dev, *rm_node = NULL; + struct perchip_nest_info *pcni; + u32 reg[4], pages, nest_offset, nest_size, idx; + int i = 0; + const char *node_name; + + if (!pdev || !pdev->dev.of_node) + return -ENODEV; + + imc_dev = pdev->dev.of_node; + + /* +* nest_offset : where the nest-counters' data start. +* size : size of the entire nest-counters region +*/ + if (of_property_read_u32(imc_dev, "imc-nest-offset", &nest_offset)) + goto err; + if (of_property_read_u32(imc_dev, "imc-nest-size", &nest_size)) + goto err; + + /* Find the "homer region" for each chip */ + rm_node = of_find_node_by_path("/reserved-memory"); + if (!rm_node) + goto err; + + for_each_child_of_node(rm_node, child) { + if (of_property_read_string_index(child, "name", 0, + &node_name)) + continue; + if (strncmp("ibm,homer-image", node_name, + strlen("ibm,homer-image"))) + continue; + + /* Get the chip id to which the above homer region belongs to */ + if (of_property_read_u32(child, "ibm,chip-id", &idx)) + goto err; + + /* reg property will have four u32 cells. 
*/ + if (of_property_read_u32_array(child, "reg", reg, 4)) + goto err; + + pcni = &nest_perchip_info[idx]; + + /* Fetch the homer region base address */ + pcni->pbase = reg[0]; + pcni->pbase = pcni->pbase << 32 | reg[1]; + /* Add the nest IMC Base offset */ + pcni->pbase = pcni->pbase + nest_offset; +
[PATCH v3 0/6] IMC Instrumentation Support
Power 9 has In-Memory-Collection (IMC) infrastructure which contains various Performance Monitoring Units (PMUs) at Nest level (these are on-chip but off-core). These Nest PMU counters are handled by a Nest IMC microcode. This microcode runs in the OCC (On-Chip Controller) complex and its purpose is to program the nest counters, collect the counter data and move the counter data to memory. The IMC infrastructure encapsulates nest (per-chip), core and thread level counters. While the nest IMC PMUs are handled by the nest IMC microcode, the core and thread level PMUs are handled by the Core-HPMC engine. This patchset enables the nest IMC PMUs and is based on the initial work done by Madhavan Srinivasan. "Nest Instrumentation Support" : https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html v1 for this patchset can be found here : https://lwn.net/Articles/705475/ Nest events: Per-chip nest instrumentation provides various per-chip metrics such as memory, powerbus, Xlink and Alink bandwidth. PMU Events' Information: OPAL obtains the Nest PMU and event information from the IMC Catalog and passes on to the kernel via the device tree. The events' information contains : - Event name - Event Offset - Event description and, maybe : - Event scale - Event unit Some PMUs may have a common scale and unit values for all their supported events. For those cases, the scale and unit properties for those events must be inherited from the PMU. The event offset in the memory is where the counter data gets accumulated. The OPAL-side patches are posted upstream : https://lists.ozlabs.org/pipermail/skiboot/2016-November/005552.html The kernel discovers the IMC counters information in the device tree at the "imc-counters" device node which has a compatible field "ibm,opal-in-memory-counters". Parsing of the Events' information: To parse the IMC PMUs and events information, the kernel has to discover the "imc-counters" node and walk through the pmu and event nodes. 
Here is an excerpt of the dt showing the imc-counters and mcs node:

/dts-v1/;
[...]
	imc-counters {
		imc-nest-offset = <0x32>;
		compatible = "ibm,opal-in-memory-counters";
		imc-nest-size = <0x3>;
		#address-cells = <0x1>;
		#size-cells = <0x1>;
		phandle = <0x1238>;
		version-id = [00];

		mcs0 {
			compatible = "ibm,imc-counters-chip";
			ranges;
			#address-cells = <0x1>;
			#size-cells = <0x1>;
			phandle = <0x1279>;
			scale = "1.2207e-4";
			unit = "MiB";

			event@528 {
				event-name = "PM_MCS_UP_128B_DATA_XFER_MC0";
				desc = "Total Read Bandwidth seen on both MCS of MC0";
				phandle = <0x128c>;
				reg = <0x118 0x8>;
			};
	[...]

From the device tree, the kernel parses the PMUs and their events' information. After parsing the nest IMC PMUs and their events, the PMUs and their attributes are registered in the kernel.

Example usage:

# perf list
[...]
nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/              [Kernel PMU event]
nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0_LAST_SAMPLE/  [Kernel PMU event]
[...]

# perf stat -e "nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/" -a --per-socket

TODOs:
- Add support for Core IMC.
- Add support for thread IMC.

Comments/feedback/suggestions are welcome.

Changelog:
v2 -> v3 :
- Changed all references for IMA (In-Memory Accumulation) to IMC (In-Memory Collection).
v1 -> v2 :
- Account for the cases where a PMU can have common scale and unit values for all its supported events (Patch 3/6).
- Fixed a build error (for maple_defconfig) by enabling imc_pmu.o only for CONFIG_PPC_POWERNV=y (Patch 4/6).
- Read from the "event-name" property instead of "name" for an event node (Patch 3/6).
Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Sukadev Bhattiprolu Cc: Michael Neuling Cc: Stewart Smith Cc: Daniel Axtens Cc: Stephane Eranian Signed-off-by: Hemant Kumar Hemant Kumar (6): powerpc/powernv: Data structure and macros definitions powerpc/powernv: Autoload IMC device driver module powerpc/powernv: Detect supported IMC units and its events powerpc/perf: Add event attribute and group to IMC pmus powerpc/perf: Generic imc pmu event functions powerpc/perf: IMC pmu cpumask and cpu hotplug support arch/powerpc/include/asm/imc-pmu.h | 74 arch/powerpc/include/asm/opal-api.h| 3 +- arch/powerpc/include/asm/opal.h
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Hi, On 12/20/2016 02:06 PM, Baolin Wang wrote: > Hi, > > On 20 December 2016 at 12:29, Lu Baolu wrote: >> Hi Mathias, >> >> On 12/19/2016 08:13 PM, Mathias Nyman wrote: >>> On 19.12.2016 13:34, Baolin Wang wrote: Hi Mathias, On 19 December 2016 at 18:33, Mathias Nyman wrote: > On 13.12.2016 05:21, Baolin Wang wrote: >> Hi Mathias, >> >> On 12 December 2016 at 23:52, Mathias Nyman >> wrote: >>> On 05.12.2016 09:51, Baolin Wang wrote: If a command event is found on the event ring during an interrupt, we need to stop the command timer with del_timer(). Since del_timer() can fail if the timer is running and waiting on the xHCI lock, then it maybe get the wrong timeout command in xhci_handle_command_timeout() if host fetched a new command and updated the xhci->current_cmd in handle_cmd_completion(). For this situation, we need a way to signal to the command timer that everything is fine and it should exit. >>> >>> >>> Ah, right, this could actually happen. >>> We should introduce a counter (xhci->current_cmd_pending) for the number of pending commands. If we need to cancel the command timer and del_timer() succeeds, we decrement the number of pending commands. If del_timer() fails, we leave the number of pending commands alone. For handling timeout command, in xhci_handle_command_timeout() we will check the counter after decrementing it, if the counter (xhci->current_cmd_pending) is 0, which means xhci->current_cmd is the right timeout command. If the counter (xhci->current_cmd_pending) is greater than 0, which means current timeout command has been handled by host and host has fetched new command as xhci->current_cmd, then just return and wait for new current command. >>> >>> >>> A counter like this could work. >>> >>> Writing the abort bit can generate either ABORT+STOP, or just STOP >>> event, this seems to cover both. >>> >>> quick check, case 1: timeout and cmd completion at the same time. 
>>> cpu1                                   cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> queue_command(more),
>>> --completion irq fires--               --timer times out at same time--
>>> handle_cmd_completion()                handle_cmd_timeout()
>>> lock(xhci_lock)                        spin_on(xhci_lock)
>>> del_timer() fail, p (=1, nochange)
>>> cur_cmd = list_next(), p++ (=2)
>>> unlock(xhci_lock)
>>>                                        lock(xhci_lock)
>>>                                        p-- (=1)
>>>                                        if (p > 0), exit
>>>
>>> OK works
>>>
>>> case 2: normal timeout case with ABORT+STOP, no race.
>>>
>>> cpu1                                   cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> queue_command(more),
>>>                                        handle_cmd_timeout()
>>>                                        p-- (P=0), don't exit
>>>                                        mod_timer(), p++ (P=1)
>>>                                        write_abort_bit()
>>> handle_cmd_completion(ABORT)
>>> del_timer(), ok, p-- (p = 0)
>>> handle_cmd_completion(STOP)
>>> del_timer(), fail, (P=0)
>>> handle_stopped_cmd_ring()
>>> cur_cmd = list_next(), p++ (=1)
>>> mod_timer()
>>>
>>> OK, works, and same for just STOP case, with the only difference that
>>> during handle_cmd_completion(STOP) p is decremented (p--)
>>
>> Yes, that's the cases what I want to handle, thanks for your explicit
>> explanation.
>>
> Gave this some more thought over the weekend, and this implementation
> doesn't solve the case when the last command times out and races with the
> completion handler:
>
> cpu1                                   cpu2
>
> queue_command(first), p++ (=1)
> --completion irq fires--               --timer times out at same time--
> handle_cmd_completion()                handle_cmd_timeout()
> lock(xhci_lock)                        spin_on(xhci_lock)
> del_timer() fail, p (=1, nochange)
> no more commands, P (=1, nochange)
> unlock(xhci_lock)
>                                        lock(xhci_lock)
>                                        p-- (=0)
>                                        p == 0, continue, even if we should not.
>
> For this we still need to rely on checking cur_cmd == NULL in the timeout
> function. (Baolus patch sets it to NULL if there are no more commands pending)
[PATCH v2] ipc/sem.c: fix incorrect sem_lock pairing
Based on the syzkaller test case from dvyukov:
https://gist.githubusercontent.com/dvyukov/d0e5efefe4d7d6daed829f5c3ca26a40/raw/08d0a261fe3c987bed04fbf267e08ba04bd533ea/gistfile1.txt

The slow (i.e.: failure to acquire) syscall exit from semtimedop() incorrectly assumed that the same lock is acquired as it was at the initial syscall entry. This is wrong:

- thread A: single semop semop(), sleeps
- thread B: multi semop semop(), sleeps
- thread A: woken up by signal/timeout

With this sequence, the initial sem_lock() call locks the per-semaphore spinlock, and it is unlocked with sem_unlock(). The call at the syscall return locks the global spinlock. Because locknum is not updated, the following sem_unlock() call unlocks the per-semaphore spinlock, which is actually not locked.

The fix is trivial: use the return value from sem_lock().

Reported-by: dvyu...@google.com
Signed-off-by: Manfred Spraul
Fixes: 370b262c896e ("ipc/sem: avoid idr tree lookup for interrupted semop")
Cc: d...@stgolabs.net
---
Patch V2:
- subject line updated
- documentation slightly updated

 ipc/sem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index e08b948..3ec5742 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1977,7 +1977,7 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
 	}

 	rcu_read_lock();
-	sem_lock(sma, sops, nsops);
+	locknum = sem_lock(sma, sops, nsops);

 	if (!ipc_valid_object(&sma->sem_perm))
 		goto out_unlock_free;
--
2.7.4
Re: [PATCH v3 3/5] fs: Wire up SED/Opal to ioctl
> @@ -853,6 +854,7 @@ struct file {
>  #ifdef CONFIG_SECURITY
>  	void	*f_security;
>  #endif
> +	struct sed_context *f_sedctx;

Adding a new field to the global struct file for a block driver feature is not acceptable. And I don't really see why it would be necessary anyway.
Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.
> +static int nvme_sec_send(void *ctrl_data, u16 spsp, u8 secp,
> +			 void *buf, size_t len)
> +{
> +	return nvme_sec_submit(ctrl_data, spsp, secp, buf, len,
> +			       nvme_admin_security_send);
> +}
> +
> +static int nvme_sec_recv(void *ctrl_data, u16 spsp, u8 secp,
> +			 void *buf, size_t len)
> +{
> +	return nvme_sec_submit(ctrl_data, spsp, secp, buf, len,
> +			       nvme_admin_security_recv);
> +}
> +
> +static const struct sec_ops nvme_sec_ops = {
> +	.sec_send = nvme_sec_send,
> +	.sec_recv = nvme_sec_recv,
> +};

Just make sec_submit the callback passed to the core and avoid this boiler-plate code.

> +int nvme_opal_initialize(struct nvme_ctrl *ctrl)
> +{
> +	/* Opal dev has already been allocated for this controller */
> +	if (ctrl->sed_ctx.dev)
> +		return 0;
> +
> +	ctrl->sed_ctx.dev = alloc_opal_dev(ctrl->admin_q);
> +	if (!ctrl->sed_ctx.dev)
> +		return -ENOMEM;
> +	ctrl->sed_ctx.ops = &nvme_sec_ops;
> +	ctrl->sed_ctx.sec_data = ctrl;

No need for sec_data callback, just pass the sed_ctx to the driver and use container_of to get at the containing structure.
Re: [tpmdd-devel] [PATCH] tpm, tpm_crb: Handle 64-bit resource in crb_check_resource()
On 12/19/16 07:56, Jarkko Sakkinen wrote: > On Sun, Dec 18, 2016 at 10:20:53PM -0600, Jiandi An wrote: >> crb_check_resource() in TPM CRB driver calls >> acpi_dev_resource_memory() which only handles 32-bit resources. >> Adding a call to acpi_dev_resource_address_space() in TPM CRB >> driver which handles 64-bit resources. >> >> Signed-off-by: Jiandi An > > 1. Is there a platform in existence where this change fixes a problem? > 2. What is difference between "memory" and "address space" conceptually? >Just wondering why 32-bit stuff is "memory" and 64-bit stuff is >"address space". Could there be a one function that would work both >for 32-bit and 64-bit cases? > > Yeah, I do not know this API too well. That's why I'm asking. > > /Jarkko > > > If this is the right thing it also needs to be done in tpm_tis. > > I will point out that this driver only works with memory, so using a > generic decoder without checking for IO maps may not be correct.. > > Jason > Hi Jarkko and Jason, Thanks for your comments. I am a developer at Qualcomm and we are trying to enable TPM CRB driver on ARM64 for Qualcomm Centriq 2400. Spec changes to ACPI table for TPM 2.0 have been submitted and approved by TCG and are currently in the 60 day waiting period for release. I have a series of patches that do this TPM CRB driver enablement for ARM64 that I'll be submitting. But first, for our platform the control area buffer address is greater than 32-bit. It is memory with no IO effects. I ran into this issue first when I use QWordMemory in ACPI ASL to describe the resource. MemoryRangeType is specified as AddressRangeMemory. The QWordMemory macro evaluates to a buffer which contains a 64-bit memory resource descriptor, which describes a range of memory addresses. It is a QWord Address Space Descriptor. acpi_dev_resource_address_space() handles the 64-bit memory described using QWordMemory macro in ACPI ASL. 
The Memory32Fixed macro evaluates to a buffer which contains a 32-bit memory descriptor, which describes a fixed range of memory addresses. acpi_dev_resource_memory() handles the 32-bit memory described using Memory32Fixed in ACPI ASL. I did not see a specific acpi_dev_resource_ service that handles 64-bit resource address and doesn't use the generic acpi_decode_space() decoder.

I do have a question about having to specify _CRS method in ACPI ASL. When I was doing early prototyping for this enablement work on ARM64 back during the time with 4.5 kernel, it was before the introduction of the following two patches:

commit 1bd047be37d95bf65a219f4931215f71878ac060 tpm_crb: Use devm_ioremap_resource
commit 51dd43dff74b0547ad844638f6910ca29c956819 tpm_tis: Use devm_ioremap_resource

The control area buffer is specified in the TPM2.0 static ACPI table. TPM CRB driver maps the control area address and reads out cmd and rsp buffer addresses and maps them. There is no requirement in the TCG TPM ACPI spec for specifying _CRS to describe the control area buffer. When I rebased the early prototype for CRB driver ARM64 enablement to the latest kernel, I hit this issue where I have to specify _CRS method. So in _CRS method I specify the same control area address that's in the TPM2.0 static ACPI table.

I see the _CRS requirement in the CRB driver is for resource synchronization between the TIS and CRB drivers to support force mode in TIS. Could I get some background on that so I could be more careful not breaking TIS while making changes to CRB driver for ARM64 enablement? Thanks in advance.
>> ---
>>  drivers/char/tpm/tpm_crb.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
>> index 717b6b4..86f355b 100644
>> --- a/drivers/char/tpm/tpm_crb.c
>> +++ b/drivers/char/tpm/tpm_crb.c
>> @@ -264,10 +264,12 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
>>  static int crb_check_resource(struct acpi_resource *ares, void *data)
>>  {
>>  	struct resource *io_res = data;
>> -	struct resource res;
>> +	struct resource_win win;
>> +	struct resource *res = &(win.res);
>>
>> -	if (acpi_dev_resource_memory(ares, &res)) {
>> -		*io_res = res;
>> +	if (acpi_dev_resource_memory(ares, res) ||
>> +	    acpi_dev_resource_address_space(ares, &win)) {
>> +		*io_res = *res;
>>  		io_res->name = NULL;
>>  	}
>>
>> --
>> Jiandi An
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.
On Mon, Dec 19, 2016 at 03:23:12PM -0700, Scott Bauer wrote: > I went back and reviewed the spec 1.2.1: > > http://www.nvmexpress.org/wp-content/uploads/NVM_Express_1_2_1_Gold_20160603.pdf > Section 5.18 (page 140->141) > > Describes the security send command type and it doesn't have any reference of > a namespace ID. Anecdotally, I just removed the ns->ns_id line from the code > and > everything still works as intended. Is there another portion of the spec or > errata > that requires ns_id? (I can't access 1.2.1 errta the link doesn't work). As far as I can tell Security Send / Receive has always been intended to apply to the whole controller, even if that's something I would not personally think is a good idea.
Re: [PATCH v3 2/5] lib: Add Sed-opal library
On Mon, Dec 19, 2016 at 04:34:15PM -0500, Keith Busch wrote: > This seems like an optional library that some environments may wish to > opt-out of building into the kernel. Any reason not to add an entry into > the Kconfig to turn this on/off? This needs to be a CONFIG_BLOCK_SED / CONFIG_BLOCK_SED_OPAL and should move to block/.
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Hi, On 20 December 2016 at 12:29, Lu Baolu wrote: > Hi Mathias, > > On 12/19/2016 08:13 PM, Mathias Nyman wrote: >> On 19.12.2016 13:34, Baolin Wang wrote: >>> Hi Mathias, >>> >>> On 19 December 2016 at 18:33, Mathias Nyman >>> wrote: On 13.12.2016 05:21, Baolin Wang wrote: > > Hi Mathias, > > On 12 December 2016 at 23:52, Mathias Nyman > wrote: >> >> On 05.12.2016 09:51, Baolin Wang wrote: >>> >>> >>> If a command event is found on the event ring during an interrupt, >>> we need to stop the command timer with del_timer(). Since del_timer() >>> can fail if the timer is running and waiting on the xHCI lock, then >>> it maybe get the wrong timeout command in xhci_handle_command_timeout() >>> if host fetched a new command and updated the xhci->current_cmd in >>> handle_cmd_completion(). For this situation, we need a way to signal >>> to the command timer that everything is fine and it should exit. >> >> >> >> Ah, right, this could actually happen. >> >>> >>> >>> We should introduce a counter (xhci->current_cmd_pending) for the number >>> of pending commands. If we need to cancel the command timer and >>> del_timer() >>> succeeds, we decrement the number of pending commands. If del_timer() >>> fails, >>> we leave the number of pending commands alone. >>> >>> For handling timeout command, in xhci_handle_command_timeout() we will >>> check >>> the counter after decrementing it, if the counter >>> (xhci->current_cmd_pending) >>> is 0, which means xhci->current_cmd is the right timeout command. If the >>> counter (xhci->current_cmd_pending) is greater than 0, which means >>> current >>> timeout command has been handled by host and host has fetched new >>> command >>> as >>> xhci->current_cmd, then just return and wait for new current command. >> >> >> >> A counter like this could work. >> >> Writing the abort bit can generate either ABORT+STOP, or just STOP >> event, this seems to cover both. >> >> quick check, case 1: timeout and cmd completion at the same time. 
>> cpu1                                   cpu2
>>
>> queue_command(first), p++ (=1)
>> queue_command(more),
>> --completion irq fires--               --timer times out at same time--
>> handle_cmd_completion()                handle_cmd_timeout()
>> lock(xhci_lock)                        spin_on(xhci_lock)
>> del_timer() fail, p (=1, nochange)
>> cur_cmd = list_next(), p++ (=2)
>> unlock(xhci_lock)
>>                                        lock(xhci_lock)
>>                                        p-- (=1)
>>                                        if (p > 0), exit
>>
>> OK works
>>
>> case 2: normal timeout case with ABORT+STOP, no race.
>>
>> cpu1                                   cpu2
>>
>> queue_command(first), p++ (=1)
>> queue_command(more),
>>                                        handle_cmd_timeout()
>>                                        p-- (P=0), don't exit
>>                                        mod_timer(), p++ (P=1)
>>                                        write_abort_bit()
>> handle_cmd_completion(ABORT)
>> del_timer(), ok, p-- (p = 0)
>> handle_cmd_completion(STOP)
>> del_timer(), fail, (P=0)
>> handle_stopped_cmd_ring()
>> cur_cmd = list_next(), p++ (=1)
>> mod_timer()
>>
>> OK, works, and same for just STOP case, with the only difference that
>> during handle_cmd_completion(STOP) p is decremented (p--)
>
> Yes, that's the cases what I want to handle, thanks for your explicit
> explanation.
>
> Gave this some more thought over the weekend, and this implementation
> doesn't solve the case when the last command times out and races with the
> completion handler:
>
> cpu1                                   cpu2
>
> queue_command(first), p++ (=1)
> --completion irq fires--               --timer times out at same time--
> handle_cmd_completion()                handle_cmd_timeout()
> lock(xhci_lock)                        spin_on(xhci_lock)
> del_timer() fail, p (=1, nochange)
> no more commands, P (=1, nochange)
> unlock(xhci_lock)
>                                        lock(xhci_lock)
>                                        p-- (=0)
>                                        p == 0, continue, even if we should not.
>
> For this we still need to rely on checking cur_cmd == NULL in the timeout
> function. (Baolus patch sets it to NULL if there are no more commands pending)

>>> As I pointed out in patch 1 of this patchset, this patchset is based
>>> on Lu Baolu's new fix patch:
>>> usb: xhci: fix possible wild
BQ27xxx registers
Hi,

I'm testing out the 4.9 kernel on a AM3359 board fitted with a BQ27510-G3 fuel gauge. The board previously worked on the 4.1 kernel, however on the 4.9 kernel the bq27xxx_battery.c driver is spitting out this error continuously:

power_supply bq27510-0: driver failed to report `charge_full_design' property: -121

I have narrowed down the problem to commit 099867a16 where the BQ27XXX_REG_DCAP value was changed from 0x2e to 0x3c. I can generate a patch to fix this issue, however the bigger problem exists as to which revision fuel gauge the bq27xxx_battery.c driver is intended to support for each family.

Attached is a table I put together of all the register addresses for each supported device under the BQ27500 definition and what the current driver utilizes (I didn't paste it here as the word wrapping messes with the formatting).

There isn't really an ideal solution I can see where we keep the single BQ27500 definition and support all the functionality of each revision. We can try and just support the latest revisions (BQ27510-G3 and BQ27520-G4) but I think it could hurt backwards compatibility for existing hardware. I'm happy to work on a fix but just wanted to get some thoughts before proceeding.
Cheers,
Chris

Register              BQ27500/1  BQ27510-G1  BQ27510-G2  BQ27510-G3  BQ27520-G1  BQ27520-G2  BQ27520-G3  BQ27520-G4  bq27xxx_battery.c
BQ27XXX_REG_CTRL      0x00       0x00        0x00        0x00        0x00        0x00        0x00        0x00        0x00
BQ27XXX_REG_TEMP      0x06       0x06        0x06        0x06        0x06        0x06        0x06        0x06        0x06
BQ27XXX_REG_INT_TEMP  -          -           -           0x28        -           0x36        0x36        0x28        0x28
BQ27XXX_REG_VOLT      0x08       0x08        0x08        0x08        0x08        0x08        0x08        0x08        0x08
BQ27XXX_REG_AI        0x14       0x14        0x14        0x14        0x14        0x14        0x14        0x14        0x14
BQ27XXX_REG_FLAGS     0x0a       0x0a        0x0a        0x0a        0x0a        0x0a        0x0a        0x0a        0x0a
BQ27XXX_REG_TTE       0x16       0x16        0x16        0x16        0x16        0x16        0x16        0x16        0x16
BQ27XXX_REG_TTF       0x18       0x18        0x18        -           0x18        0x18        -           -           -
BQ27XXX_REG_TTES      0x1c       0x1c        0x1c        0x1a        0x1c        0x1c        0x1c        0x1a        0x1a
BQ27XXX_REG_TTECP     0x26       0x26        0x26        -           0x26        0x26        0x26        -           -
BQ27XXX_REG_NAC       0x0c       0x0c        0x0c        0x0c        0x0c        0x0c        0x0c        0x0c        0x0c
BQ27XXX_REG_FCC       0x12       0x12        0x12        0x12        0x12        0x12        0x12        0x12        0x12
BQ27XXX_REG_CYCT      0x2a       0x2a        0x2a        0x1e        -           0x2a        0x2a        0x1e        0x2a
BQ27XXX_REG_AE        0x22       0x22        0x22        -           0x22        0x22        0x22        -           -
BQ27XXX_REG_SOC       0x2c       0x2c        0x2c        0x20        0x2c        0x2c        0x2c        0x20        0x2c
BQ27XXX_REG_DCAP      0x3c       0x3c        0x3c        0x2e        0x3c        0x3c        0x3c        -           0x3c
BQ27XXX_REG_AP        0x24       0x24        0x24        -           0x24        0x24        0x24        -           -
[PATCH v2 2/2] usb: host: xhci: Handle the right timeout command
If a command event is found on the event ring during an interrupt, we need to stop the command timer with del_timer(). Since del_timer() can fail if the timer is running and waiting on the xHCI lock, it may get the wrong timeout command in xhci_handle_command_timeout() if the host fetched a new command and updated xhci->current_cmd in handle_cmd_completion(). For this situation, we need a way to signal to the command timer that everything is fine and it should exit.

We should check if the command timer is pending in the xhci_handle_command_timeout() function. If the command timer is pending, the current timeout command has already been handled by the host, which has fetched a new command and re-added the command timer; just return and wait for the new current command. If not, the current command really did time out and needs to be handled.

Signed-off-by: Baolin Wang
---
Changes since v1:
- Remove the counter and just check if the command timer is pending.
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9965a4c..3947344 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1269,7 +1269,7 @@ void xhci_handle_command_timeout(unsigned long data)
 	xhci = (struct xhci_hcd *) data;
 	spin_lock_irqsave(&xhci->lock, flags);
-	if (!xhci->current_cmd) {
+	if (!xhci->current_cmd || timer_pending(&xhci->cmd_timer)) {
 		spin_unlock_irqrestore(&xhci->lock, flags);
 		return;
 	}
--
1.7.9.5
Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote: > On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote: > > On Thu, Dec 15, 2016 at 02:51:36PM +, Colin Ian King wrote: > > > On 15/12/16 14:42, Boqun Feng wrote: > > > > On Thu, Dec 15, 2016 at 12:04:59PM +, Mark Rutland wrote: > > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote: > > > >>> ->qsmask of an RCU leaf node is usually more sparse than the > > > >>> corresponding cpu_possible_mask. So replace the > > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with > > > >>> for_each_leaf_node_cpu() to save several checks. > > > >>> > > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate > > > >>> the > > > >>> corresponding mask for a bit because @mask is unsigned long, this was > > > >>> spotted by Colin Ian King and CoverityScan > > > >>> in > > > >>> a previous version of this patch.] > > > >> > > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;) > > > >> > > > > > > > > I kinda keep this here for honoring the effort of finding out this bug > > > > from Colin, but yes, it's no longer needed here for the current code. > > > > > > Yep, remove it. > > > > > > > Paul, here is a modified version of this patch, what I only did is > > removing this note. > > > > Besides I rebased the whole series on the current rcu/dev branch of -rcu > > tree, on this very commit: > > > > 8e9b2521b18a ("doc: Quick-Quiz answers are now inline") > > > > And I put the latest version at > > > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node > > > > If you thought it's better, I could send a v3 ;-) > > I would feel better about this patchset if it reduced the number of lines > of code rather than increasing them. That said, part of the increase > is a commment. Still, I am not convinced that the extra level of macro > is carrying its weight. 
> > dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()") > > The commit log needs a bit of wordsmithing. > > The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange. > What is its purpose, really? What does its triggering tell you? > What other checks did you consider as an alternative? > The check is an over-case one, it's introduced because I'm worried about some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on an "impossible" CPU. I will explanation later in more details. > And if you are going to add checks of this type, should you > also check for this being a leaf rcu_node structure? > I don't think I want to check that, and I don't think check cpu_possible(cpu) in the macro is similar to that. > 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking") > > This does look a bit nicer, but why the added blank lines? > Are they really helping? > > The commit log seems a bit misplaced. This code is almost never > executed (once per 21 seconds at the most), so performance really > isn't a consideration. The simpler-looking code might be. > > fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration") > > Ditto on blank lines. > > Again, this code is executed per expedited grace period, so > performance really isn't a big deal. More of a big deal than > the stall-warning code, but we still are way off of any fastpath. > > 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()") > > Ditto again on blank lines. > > And on the commit log. This code is executed about once > per several jiffies, and on larger machines, per 20 jiffies > or so. Performance really isn't a consideration. > > 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration") > > And another ditto on blank lines. > > This code executes once per CPU-hotplug operation, so again isn't > at all performance critical. > > In short, if you are trying to sell this to me as a significant performance > boost, I am not buying. 
Yep, it won't help the performance a lot, but it 1) helps the performance in theory, because it iterates less CPUs 2) makes code cleaner. By "cleaner", I mean we can a) afford more blank lines to make loops separated from other code and b) decrease the indent levels for those loops. But, yes, I should add those points in the commit log, because those are more visible effects.

> The added WARN_ON_ONCE() looks quite dubious, though perhaps I am
> misunderstanding its purpose. My assumption is that you want to detect
> missing UL suffixes on bitmask constants, in which case I bet there is
> a better way.

The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask constants, and we don't need to check that, because we use leaf_node_cpu_id() now. As I said, this is an over-case check, and we can drop it if we guarantee that CPUs masked in ->
Re: [PATCH v2] fscrypt: Factor out bio specific functions
Looks fine, Reviewed-by: Christoph Hellwig
[PATCH 4/4] mmc: sdhci-msm: provide enhanced_strobe mode feature support
This provide support for enhanced_strobe feature to sdhci-msm. Signed-off-by: Ritesh Harjani --- drivers/mmc/host/sdhci-msm.c | 32 +++- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c index 32879b8..d092a16 100644 --- a/drivers/mmc/host/sdhci-msm.c +++ b/drivers/mmc/host/sdhci-msm.c @@ -102,6 +102,7 @@ #define CORE_DDR_200_CFG 0x184 #define CORE_CDC_T4_DLY_SELBIT(0) +#define CORE_CMDIN_RCLK_EN BIT(1) #define CORE_START_CDC_TRAFFIC BIT(6) #define CORE_VENDOR_SPEC3 0x1b0 #define CORE_PWRSAVE_DLL BIT(3) @@ -579,6 +580,7 @@ static int sdhci_msm_cdclp533_calibration(struct sdhci_host *host) static int sdhci_msm_cm_dll_sdc4_calibration(struct sdhci_host *host) { + struct mmc_host *mmc = host->mmc; u32 dll_status, config; int ret; @@ -593,6 +595,12 @@ static int sdhci_msm_cm_dll_sdc4_calibration(struct sdhci_host *host) */ writel_relaxed(DDR_CONFIG_POR_VAL, host->ioaddr + CORE_DDR_CONFIG); + if (mmc->ios.enhanced_strobe) { + config = readl_relaxed(host->ioaddr + CORE_DDR_200_CFG); + config |= CORE_CMDIN_RCLK_EN; + writel_relaxed(config, host->ioaddr + CORE_DDR_200_CFG); + } + config = readl_relaxed(host->ioaddr + CORE_DLL_CONFIG_2); config |= CORE_DDR_CAL_EN; writel_relaxed(config, host->ioaddr + CORE_DLL_CONFIG_2); @@ -627,6 +635,7 @@ static int sdhci_msm_hs400_dll_calibration(struct sdhci_host *host) { struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); struct sdhci_msm_host *msm_host = sdhci_pltfm_priv(pltfm_host); + struct mmc_host *mmc = host->mmc; int ret; u32 config; @@ -640,14 +649,17 @@ static int sdhci_msm_hs400_dll_calibration(struct sdhci_host *host) if (ret) goto out; - /* Set the selected phase in delay line hw block */ - ret = msm_config_cm_dll_phase(host, msm_host->saved_tuning_phase); - if (ret) - goto out; + if (!mmc->ios.enhanced_strobe) { + /* Set the selected phase in delay line hw block */ + ret = msm_config_cm_dll_phase(host, + msm_host->saved_tuning_phase); + if (ret) + 
goto out; + config = readl_relaxed(host->ioaddr + CORE_DLL_CONFIG); + config |= CORE_CMD_DAT_TRACK_SEL; + writel_relaxed(config, host->ioaddr + CORE_DLL_CONFIG); + } - config = readl_relaxed(host->ioaddr + CORE_DLL_CONFIG); - config |= CORE_CMD_DAT_TRACK_SEL; - writel_relaxed(config, host->ioaddr + CORE_DLL_CONFIG); if (msm_host->use_cdclp533) ret = sdhci_msm_cdclp533_calibration(host); else @@ -802,7 +814,8 @@ static void sdhci_msm_set_uhs_signaling(struct sdhci_host *host, spin_unlock_irq(&host->lock); /* CDCLP533 HW calibration is only required for HS400 mode*/ if (host->clock > CORE_FREQ_100MHZ && - msm_host->tuning_done && !msm_host->calibration_done && + (msm_host->tuning_done || mmc->ios.enhanced_strobe) && + !msm_host->calibration_done && mmc->ios.timing == MMC_TIMING_MMC_HS400) if (!sdhci_msm_hs400_dll_calibration(host)) msm_host->calibration_done = true; @@ -941,7 +954,8 @@ static void sdhci_msm_set_clock(struct sdhci_host *host, unsigned int clock) * Select HS400 mode using the HC_SELECT_IN from VENDOR SPEC * register */ - if (msm_host->tuning_done && !msm_host->calibration_done) { + if ((msm_host->tuning_done || curr_ios.enhanced_strobe) && + !msm_host->calibration_done) { /* * Write 0x6 to HC_SELECT_IN and 1 to HC_SELECT_IN_EN * field in VENDOR_SPEC_FUNC -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH 2/4] mmc: mmc: Add change to set controller to HS400 frequency in enhanced strobe
Currently mmc_select_hs400es() does not set the desired frequency before sending the switch command, which makes CMD13 time out on some controllers. Set the desired HS400 frequency in mmc_select_hs400es() when the timing is switched to HS400.

Signed-off-by: Ritesh Harjani
---
 drivers/mmc/core/mmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index b61b52f9..eb69497 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1329,6 +1329,7 @@ static int mmc_select_hs400es(struct mmc_card *card)
 	/* Set host controller to HS400 timing and frequency */
 	mmc_set_timing(host, MMC_TIMING_MMC_HS400);
+	mmc_set_bus_speed(card);

 	/* Controller enable enhanced strobe function */
 	host->ios.enhanced_strobe = true;
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH 3/4] mmc: mmc: enable ios.enhanced_strobe before mmc_set_timing
Some controllers may need to configure a few registers based on the enhanced strobe mode while configuring HS400 timing, so set ios.enhanced_strobe to true before calling mmc_set_timing() in mmc_select_hs400es().

Signed-off-by: Ritesh Harjani
---
 drivers/mmc/core/mmc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index eb69497..052368e 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1327,12 +1327,18 @@ static int mmc_select_hs400es(struct mmc_card *card)
 		goto out_err;
 	}

+	/*
+	 * Enable enhanced_strobe in ios, as some controllers
+	 * may need to configure few registers based on enhanced
+	 * strobe while changing HS400 timing.
+	 */
+	host->ios.enhanced_strobe = true;
+
 	/* Set host controller to HS400 timing and frequency */
 	mmc_set_timing(host, MMC_TIMING_MMC_HS400);
 	mmc_set_bus_speed(card);

 	/* Controller enable enhanced strobe function */
-	host->ios.enhanced_strobe = true;
 	if (host->ops->hs400_enhanced_strobe)
 		host->ops->hs400_enhanced_strobe(host, &host->ios);
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH 1/4] mmc: core: Return from mmc_set_clock if hz is already set to ios.clock
Signed-off-by: Ritesh Harjani
---
 drivers/mmc/core/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 543eadd..125f8a9 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1139,6 +1139,9 @@ void mmc_set_clock(struct mmc_host *host, unsigned int hz)
 	if (hz > host->f_max)
 		hz = host->f_max;

+	if (hz == host->ios.clock)
+		return;
+
 	host->ios.clock = hz;
 	mmc_set_ios(host);
 }
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH 0/4] mmc: Provide enhanced strobe support for sdhci-msm
Hi,

This patch series enables the enhanced strobe feature for the sdhci-msm driver, with some supporting changes in the core layer. It was tested on an msm8996-based internal target with an eMMC 5.1 card.

Ritesh Harjani (4):
  mmc: core: Return from mmc_set_clock if hz is already set to ios.clock
  mmc: mmc: Add change to set controller to HS400 frequency in enhanced strobe
  mmc: mmc: enable ios.enhanced_strobe before mmc_set_timing
  mmc: sdhci-msm: provide enhanced_strobe mode feature support

 drivers/mmc/core/core.c      |  3 +++
 drivers/mmc/core/mmc.c       |  9 ++++-
 drivers/mmc/host/sdhci-msm.c | 32 +++++++++++++-------
 3 files changed, 34 insertions(+), 10 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCHv2 11/11] powerpc/kvm: Advertise availablity of HPT resizing on KVM HV
This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the presence of in-kernel HPT resizing on KVM HV.

Signed-off-by: David Gibson
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index efd1183..1f32235 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SPAPR_MULTITCE:
 		r = 1;
 		break;
+	case KVM_CAP_SPAPR_RESIZE_HPT:
+		r = !!hv_enabled;
+		break;
 #endif
 	case KVM_CAP_PPC_HTM:
 		r = cpu_has_feature(CPU_FTR_TM_COMP) &&
--
2.9.3
[PATCHv2 06/11] powerpc/kvm: Split HPT allocation from activation
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT) and sets it up as the active page table for a VM. For the upcoming HPT resize implementation we're going to want to allocate HPTs separately from activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform the activation with a new kvmppc_set_hpt() function. Likewise we split kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt() which unsets it as an active HPT, then frees it.

We also move the logic to fall back to smaller HPT sizes if the first try fails into the single caller which used that behaviour, kvmppc_hv_setup_htab_rma(). This introduces a slight semantic change, in that previously if the initial attempt at CMA allocation failed, we would fall back to attempting smaller sizes with the page allocator. Now, we try first CMA, then the page allocator at each size. As far as I can tell this change should be harmless.

To match, we make kvmppc_free_hpt() just free the actual HPT itself. The call to kvmppc_free_lpid() that was there, we move to the single caller.
Signed-off-by: David Gibson --- arch/powerpc/include/asm/kvm_book3s_64.h | 4 ++ arch/powerpc/include/asm/kvm_ppc.h | 5 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 96 +++- arch/powerpc/kvm/book3s_hv.c | 18 +- 4 files changed, 68 insertions(+), 55 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 8810327..bcb3e0a 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -22,6 +22,10 @@ #include +/* Power architecture requires HPT is at least 256kiB, at most 64TiB */ +#define PPC_MIN_HPT_ORDER 18 +#define PPC_MAX_HPT_ORDER 46 + #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu) { diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 3db6be9..41575b8 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu); extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu); extern void kvmppc_map_magic(struct kvm_vcpu *vcpu); -extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp); +extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order); +extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info); extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp); -extern void kvmppc_free_hpt(struct kvm *kvm); +extern void kvmppc_free_hpt(struct kvm_hpt_info *info); extern long kvmppc_prepare_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem); extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu, diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index fe88132..4683d87 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -40,74 +40,66 @@ #include "trace_hv.h" -/* Power architecture requires HPT is at least 256kB */ -#define PPC_MIN_HPT_ORDER 18 - 
static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags, long pte_index, unsigned long pteh, unsigned long ptel, unsigned long *pte_idx_ret); static void kvmppc_rmap_reset(struct kvm *kvm); -long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) +int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order) { unsigned long hpt = 0; - struct revmap_entry *rev; + int cma = 0; struct page *page = NULL; - long order = KVM_DEFAULT_HPT_ORDER; + struct revmap_entry *rev; + unsigned long npte; - if (htab_orderp) { - order = *htab_orderp; - if (order < PPC_MIN_HPT_ORDER) - order = PPC_MIN_HPT_ORDER; - } + if ((order < PPC_MIN_HPT_ORDER) || (order > PPC_MAX_HPT_ORDER)) + return -EINVAL; - kvm->arch.hpt.cma = 0; page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT)); if (page) { hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); memset((void *)hpt, 0, (1ul << order)); - kvm->arch.hpt.cma = 1; + cma = 1; } - /* Lastly try successively smaller sizes from the page allocator */ - /* Only do this if userspace didn't specify a size via ioctl */ - while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) { - hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT| - __GFP_NOWARN, order - PAGE_SHIFT); - if (!hpt) - --order; - } + if (!hpt) + hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT
[PATCHv2 02/11] powerpc/kvm: HPT resizing documentation and reserved numbers
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to advertise whether KVM is capable of handling the PAPR extensions for resizing the hashed page table during guest runtime. It also adds definitions for two new VM ioctl()s to implement this extension, and documentation of the same. Note that HPT resizing is already possible with KVM PR without kernel modification, since the HPT is managed within userspace (qemu). The capability defined here will only be set where an in-kernel implementation of resizing is necessary, i.e. for KVM HV. To determine if the userspace resize implementation can be used, it's necessary to check KVM_CAP_PPC_ALLOC_HTAB. Unfortunately older kernels incorrectly set KVM_CAP_PPC_ALLOC_HTAB even with KVM PR. If userspace wants to support resizing with KVM PR on such kernels, it will need a workaround. Signed-off-by: David Gibson --- Documentation/virtual/kvm/api.txt | 95 +++ include/uapi/linux/kvm.h | 11 + 2 files changed, 106 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 42be7b1..f452c72 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3201,6 +3201,101 @@ struct kvm_reinject_control { pit_reinject = 0 (!reinject mode) is recommended, unless running an old operating system that uses the PIT for timing (e.g. Linux 2.4.x).
+4.100 KVM_PPC_RESIZE_HPT_PREPARE + +Capability: KVM_CAP_SPAPR_RESIZE_HPT +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_ppc_resize_hpt (in) +Returns: 0 on successful completion, +>0 if a new HPT is being prepared, the value is an estimated + number of milliseconds until preparation is complete + -EFAULT if struct kvm_reinject_control cannot be read, +-EINVAL if the supplied shift or flags are invalid +-ENOMEM if unable to allocate the new HPT +-ENOSPC if there was a hash collision when moving existing + HPT entries to the new HPT +-EIO on other error conditions + +Used to implement the PAPR extension for runtime resizing of a guest's +Hashed Page Table (HPT). Specifically this starts, stops or monitors +the preparation of a new potential HPT for the guest, essentially +implementing the H_RESIZE_HPT_PREPARE hypercall. + +If called with shift > 0 when there is no pending HPT for the guest, +this begins preparation of a new pending HPT of size 2^(shift) bytes. +It then returns a positive integer with the estimated number of +milliseconds until preparation is complete. + +If called when there is a pending HPT whose size does not match that +requested in the parameters, discards the existing pending HPT and +creates a new one as above. + +If called when there is a pending HPT of the size requested, will: + * If preparation of the pending HPT is already complete, return 0 + * If preparation of the pending HPT has failed, return an error +code, then discard the pending HPT. + * If preparation of the pending HPT is still in progress, return an +estimated number of milliseconds until preparation is complete. + +If called with shift == 0, discards any currently pending HPT and +returns 0 (i.e. cancels any in-progress preparation). + +flags is reserved for future expansion, currently setting any bits in +flags will result in an -EINVAL. + +Normally this will be called repeatedly with the same parameters until +it returns <= 0. 
The first call will initiate preparation, subsequent +ones will monitor preparation until it completes or fails. + +struct kvm_ppc_resize_hpt { + __u64 flags; + __u32 shift; + __u32 pad; +}; + +4.100 KVM_PPC_RESIZE_HPT_COMMIT + +Capability: KVM_CAP_SPAPR_RESIZE_HPT +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_ppc_resize_hpt (in) +Returns: 0 on successful completion, + -EFAULT if struct kvm_reinject_control cannot be read, +-EINVAL if the supplied shift or flags are invalid +-ENXIO is there is no pending HPT, or the pending HPT doesn't + have the requested size +-EBUSY if the pending HPT is not fully prepared +-ENOSPC if there was a hash collision when moving existing + HPT entries to the new HPT +-EIO on other error conditions + +Used to implement the PAPR extension for runtime resizing of a guest's +Hashed Page Table (HPT). Specifically this requests that the guest be +transferred to working with the new HPT, essentially implementing the +H_RESIZE_HPT_COMMIT hypercall. + +This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has +returned 0 with the same parameters. In other cases +KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or +-EBUSY, though others may be possible if the preparation was started, +but failed). + +This will have undefined effects on the guest if it has not already +placed itself in a quiescent state where no
[PATCHv2 10/11] powerpc/kvm: KVM-HV HPT resizing implementation
This adds the "guts" of the implementation for the HPT resizing PAPR extension. It has the code to allocate and clear a new HPT, rehash an existing HPT's entries into it, and accomplish the switchover for a KVM guest from the old HPT to the new one. Signed-off-by: David Gibson --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 188 +++- 1 file changed, 187 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 86a1bcb..e22776e 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -66,6 +66,10 @@ struct kvm_resize_hpt { /* These fields protected by kvm->lock */ int error; bool prepare_done; + + /* Private to the work thread, until prepare_done is true, +* then protected by kvm->resize_hpt_sem */ + struct kvm_hpt_info hpt; }; static void kvmppc_rmap_reset(struct kvm *kvm); @@ -1187,21 +1191,203 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa, */ static int resize_hpt_allocate(struct kvm_resize_hpt *resize) { + int rc; + + rc = kvmppc_allocate_hpt(&resize->hpt, resize->order); + if (rc < 0) + return rc; + + resize_hpt_debug(resize, "resize_hpt_allocate(): HPT @ 0x%lx\n", +resize->hpt.virt); + return 0; } +static unsigned long resize_hpt_rehash_hpte(struct kvm_resize_hpt *resize, + unsigned long idx) +{ + struct kvm *kvm = resize->kvm; + struct kvm_hpt_info *old = &kvm->arch.hpt; + struct kvm_hpt_info *new = &resize->hpt; + unsigned long old_hash_mask = (1ULL << (old->order - 7)) - 1; + unsigned long new_hash_mask = (1ULL << (new->order - 7)) - 1; + __be64 *hptep, *new_hptep; + unsigned long vpte, rpte, guest_rpte; + int ret; + struct revmap_entry *rev; + unsigned long apsize, psize, avpn, pteg, hash; + unsigned long new_idx, new_pteg, replace_vpte; + + hptep = (__be64 *)(old->virt + (idx << 4)); + + /* Guest is stopped, so new HPTEs can't be added or faulted +* in, only unmapped or altered by host actions. 
So, it's +* safe to check this before we take the HPTE lock */ + vpte = be64_to_cpu(hptep[0]); + if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT)) + return 0; /* nothing to do */ + + while (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) + cpu_relax(); + + vpte = be64_to_cpu(hptep[0]); + + ret = 0; + if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT)) + /* Nothing to do */ + goto out; + + /* Unmap */ + rev = &old->rev[idx]; + guest_rpte = rev->guest_rpte; + + ret = -EIO; + apsize = hpte_page_size(vpte, guest_rpte); + if (!apsize) + goto out; + + if (vpte & HPTE_V_VALID) { + unsigned long gfn = hpte_rpn(guest_rpte, apsize); + int srcu_idx = srcu_read_lock(&kvm->srcu); + struct kvm_memory_slot *memslot = + __gfn_to_memslot(kvm_memslots(kvm), gfn); + + if (memslot) { + unsigned long *rmapp; + rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn]; + + lock_rmap(rmapp); + kvmppc_unmap_hpte(kvm, idx, rmapp, gfn); + unlock_rmap(rmapp); + } + + srcu_read_unlock(&kvm->srcu, srcu_idx); + } + + /* Reload PTE after unmap */ + vpte = be64_to_cpu(hptep[0]); + + BUG_ON(vpte & HPTE_V_VALID); + BUG_ON(!(vpte & HPTE_V_ABSENT)); + + ret = 0; + if (!(vpte & HPTE_V_BOLTED)) + goto out; + + rpte = be64_to_cpu(hptep[1]); + psize = hpte_base_page_size(vpte, rpte); + avpn = HPTE_V_AVPN_VAL(vpte) & ~((psize - 1) >> 23); + pteg = idx / HPTES_PER_GROUP; + if (vpte & HPTE_V_SECONDARY) + pteg = ~pteg; + + if (!(vpte & HPTE_V_1TB_SEG)) { + unsigned long offset, vsid; + + /* We only have 28 - 23 bits of offset in avpn */ + offset = (avpn & 0x1f) << 23; + vsid = avpn >> 5; + /* We can find more bits from the pteg value */ + if (psize < (1ULL << 23)) + offset |= ((vsid ^ pteg) & old_hash_mask) * psize; + + hash = vsid ^ (offset / psize); + } else { + unsigned long offset, vsid; + + /* We only have 40 - 23 bits of seg_off in avpn */ + offset = (avpn & 0x1) << 23; + vsid = avpn >> 17; + if (psize < (1ULL << 23)) + offset |= ((vsid ^ (vsid << 25) ^ pteg) & old_hash_mask) * psize; + + hash = vsid ^ (vsid 
<< 25) ^ (offset / psize); +
[PATCHv2 05/11] powerpc/kvm: Don't store values derivable from HPT order
Currently the kvm_hpt_info structure stores the hashed page table's order, and also the number of HPTEs it contains and a mask for its size. The last two can be easily derived from the order, so remove them and just calculate them as necessary with a couple of helper inlines. Signed-off-by: David Gibson Reviewed-by: Thomas Huth --- arch/powerpc/include/asm/kvm_book3s_64.h | 12 arch/powerpc/include/asm/kvm_host.h | 2 -- arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 +--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 +- 4 files changed, 34 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 8482921..8810327 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -350,6 +350,18 @@ extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); extern void kvmhv_rm_send_ipi(int cpu); +static inline unsigned long kvmppc_hpt_npte(struct kvm_hpt_info *hpt) +{ + /* HPTEs are 2**4 bytes long */ + return 1UL << (hpt->order - 4); +} + +static inline unsigned long kvmppc_hpt_mask(struct kvm_hpt_info *hpt) +{ + /* 128 (2**7) bytes in each HPTEG */ + return (1UL << (hpt->order - 7)) - 1; +} + #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 074b17f..7576e0e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -246,8 +246,6 @@ struct kvm_hpt_info { unsigned long virt; /* Array of reverse mapping entries for each guest HPTE */ struct revmap_entry *rev; - unsigned long npte; - unsigned long mask; /* Guest HPT size is 2**(order) bytes */ u32 order; /* 1 if HPT allocated with CMA, 0 otherwise */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index b5799d1..fe88132 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -83,15 +83,11 @@ 
long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) kvm->arch.hpt.virt = hpt; kvm->arch.hpt.order = order; - /* HPTEs are 2**4 bytes long */ - kvm->arch.hpt.npte = 1ul << (order - 4); - /* 128 (2**7) bytes in each HPTEG */ - kvm->arch.hpt.mask = (1ul << (order - 7)) - 1; atomic64_set(&kvm->arch.mmio_update, 0); /* Allocate reverse map array */ - rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte); + rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt)); if (!rev) { pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n"); goto out_freehpt; @@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, if (npages > 1ul << (40 - porder)) npages = 1ul << (40 - porder); /* Can't use more than 1 HPTE per HPTEG */ - if (npages > kvm->arch.hpt.mask + 1) - npages = kvm->arch.hpt.mask + 1; + if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1) + npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1; hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) | HPTE_V_BOLTED | hpte0_pgsize_encoding(psize); @@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, for (i = 0; i < npages; ++i) { addr = i << porder; /* can't use hpt_hash since va > 64 bits */ - hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt.mask; + hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) + & kvmppc_hpt_mask(&kvm->arch.hpt); /* * We assume that the hash table is empty and no * vcpus are using it at this stage. Since we create @@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf, /* Skip uninteresting entries, i.e. 
clean on not-first pass */ if (!first_pass) { - while (i < kvm->arch.hpt.npte && + while (i < kvmppc_hpt_npte(&kvm->arch.hpt) && !hpte_dirty(revp, hptp)) { ++i; hptp += 2; @@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf, hdr.index = i; /* Grab a series of valid entries */ - while (i < kvm->arch.hpt.npte && + while (i < kvmppc_hpt_npte(&kvm->arch.hpt) && hdr.n_valid < 0x && nb + HPTE_SIZE < count && record_hpte(flags, hptp, hpte, revp, 1, first_pass)) { @@ -1332,7 +1329,7 @@ st
[PATCHv2 03/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at all obvious from the name. In practice kvmppc_alloc_hpt() allocates an HPT by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate it with CMA only. To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma(). Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma(). Signed-off-by: David Gibson Reviewed-by: Thomas Huth --- arch/powerpc/include/asm/kvm_ppc.h | 4 ++-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 arch/powerpc/kvm/book3s_hv_builtin.c | 8 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 2da67bf..3db6be9 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -186,8 +186,8 @@ extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu, unsigned long tce_value, unsigned long npages); extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba); -extern struct page *kvm_alloc_hpt(unsigned long nr_pages); -extern void kvm_release_hpt(struct page *page, unsigned long nr_pages); +extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages); +extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages); extern int kvmppc_core_init_vm(struct kvm *kvm); extern void kvmppc_core_destroy_vm(struct kvm *kvm); extern void kvmppc_core_free_memslot(struct kvm *kvm, diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index b795dd1..ae17cdd 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) } kvm->arch.hpt_cma_alloc = 0; - page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT)); + page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT)); if (page) { hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); memset((void *)hpt, 0, (1ul << order)); @@ -108,7 +108,7 @@ long 
kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) out_freehpt: if (kvm->arch.hpt_cma_alloc) - kvm_release_hpt(page, 1 << (order - PAGE_SHIFT)); + kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT)); else free_pages(hpt, order - PAGE_SHIFT); return -ENOMEM; @@ -155,8 +155,8 @@ void kvmppc_free_hpt(struct kvm *kvm) kvmppc_free_lpid(kvm->arch.lpid); vfree(kvm->arch.revmap); if (kvm->arch.hpt_cma_alloc) - kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt), - 1 << (kvm->arch.hpt_order - PAGE_SHIFT)); + kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt), +1 << (kvm->arch.hpt_order - PAGE_SHIFT)); else free_pages(kvm->arch.hpt_virt, kvm->arch.hpt_order - PAGE_SHIFT); diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 5bb24be..4c4aa47 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -52,19 +52,19 @@ static int __init early_parse_kvm_cma_resv(char *p) } early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv); -struct page *kvm_alloc_hpt(unsigned long nr_pages) +struct page *kvm_alloc_hpt_cma(unsigned long nr_pages) { VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES)); } -EXPORT_SYMBOL_GPL(kvm_alloc_hpt); +EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma); -void kvm_release_hpt(struct page *page, unsigned long nr_pages) +void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages) { cma_release(kvm_cma, page, nr_pages); } -EXPORT_SYMBOL_GPL(kvm_release_hpt); +EXPORT_SYMBOL_GPL(kvm_free_hpt_cma); /** * kvm_cma_reserve() - reserve area for kvm hash pagetable -- 2.9.3
[PATCHv2 04/11] powerpc/kvm: Gather HPT related variables into sub-structure
Currently, the powerpc kvm_arch structure contains a number of variables tracking the state of the guest's hashed page table (HPT) in KVM HV. This patch gathers them all together into a single kvm_hpt_info substructure. This makes life more convenient for the upcoming HPT resizing implementation. Signed-off-by: David Gibson --- arch/powerpc/include/asm/kvm_host.h | 20 ++--- arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 ++--- arch/powerpc/kvm/book3s_hv.c| 2 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 - 4 files changed, 91 insertions(+), 83 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index e59b172..074b17f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -241,12 +241,24 @@ struct kvm_arch_memory_slot { #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ }; +struct kvm_hpt_info { + /* Host virtual (linear mapping) address of guest HPT */ + unsigned long virt; + /* Array of reverse mapping entries for each guest HPTE */ + struct revmap_entry *rev; + unsigned long npte; + unsigned long mask; + /* Guest HPT size is 2**(order) bytes */ + u32 order; + /* 1 if HPT allocated with CMA, 0 otherwise */ + int cma; +}; + struct kvm_arch { unsigned int lpid; #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE unsigned int tlb_sets; - unsigned long hpt_virt; - struct revmap_entry *revmap; + struct kvm_hpt_info hpt; atomic64_t mmio_update; unsigned int host_lpid; unsigned long host_lpcr; @@ -256,14 +268,10 @@ struct kvm_arch { unsigned long lpcr; unsigned long vrma_slb_v; int hpte_setup_done; - u32 hpt_order; atomic_t vcpus_running; u32 online_vcores; - unsigned long hpt_npte; - unsigned long hpt_mask; atomic_t hpte_mod_interest; cpumask_t need_tlb_flush; - int hpt_cma_alloc; struct dentry *debugfs_dir; struct dentry *htab_dentry; #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index ae17cdd..b5799d1 100644 --- 
a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -61,12 +61,12 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) order = PPC_MIN_HPT_ORDER; } - kvm->arch.hpt_cma_alloc = 0; + kvm->arch.hpt.cma = 0; page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT)); if (page) { hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); memset((void *)hpt, 0, (1ul << order)); - kvm->arch.hpt_cma_alloc = 1; + kvm->arch.hpt.cma = 1; } /* Lastly try successively smaller sizes from the page allocator */ @@ -81,22 +81,22 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) if (!hpt) return -ENOMEM; - kvm->arch.hpt_virt = hpt; - kvm->arch.hpt_order = order; + kvm->arch.hpt.virt = hpt; + kvm->arch.hpt.order = order; /* HPTEs are 2**4 bytes long */ - kvm->arch.hpt_npte = 1ul << (order - 4); + kvm->arch.hpt.npte = 1ul << (order - 4); /* 128 (2**7) bytes in each HPTEG */ - kvm->arch.hpt_mask = (1ul << (order - 7)) - 1; + kvm->arch.hpt.mask = (1ul << (order - 7)) - 1; atomic64_set(&kvm->arch.mmio_update, 0); /* Allocate reverse map array */ - rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte); + rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte); if (!rev) { pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n"); goto out_freehpt; } - kvm->arch.revmap = rev; + kvm->arch.hpt.rev = rev; kvm->arch.sdr1 = __pa(hpt) | (order - 18); pr_info("KVM guest htab at %lx (order %ld), LPID %x\n", @@ -107,7 +107,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) return 0; out_freehpt: - if (kvm->arch.hpt_cma_alloc) + if (kvm->arch.hpt.cma) kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT)); else free_pages(hpt, order - PAGE_SHIFT); @@ -129,10 +129,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp) goto out; } } - if (kvm->arch.hpt_virt) { - order = kvm->arch.hpt_order; + if (kvm->arch.hpt.virt) { + order = kvm->arch.hpt.order; /* Set the entire HPT to 0, i.e. 
invalid HPTEs */ - memset((void *)kvm->arch.hpt_virt, 0, 1ul << order); + memset((void *)kvm->arch.hpt.virt, 0, 1ul << order); /* * Reset all the reverse-mapping chains for all memslots
[PATCHv2 08/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper()
The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used to force all guest mappings of a particular host page to be set ABSENT, and removed from the reverse mappings. For HPT resizing, we will have some cases where we want to set just a single guest HPTE ABSENT and remove its reverse mappings. To prepare for this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE, moving it to a new helper function. Signed-off-by: David Gibson --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 77 + 1 file changed, 44 insertions(+), 33 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 2765d04..74e909a 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -738,13 +738,53 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, return kvm_handle_hva_range(kvm, hva, hva + 1, handler); } +/* Must be called with both HPTE and rmap locked */ +static void kvmppc_unmap_hpte(struct kvm *kvm, unsigned long i, + unsigned long *rmapp, unsigned long gfn) +{ + __be64 *hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4)); + struct revmap_entry *rev = kvm->arch.hpt.rev; + unsigned long j, h; + unsigned long ptel, psize, rcbits; + + j = rev[i].forw; + if (j == i) { + /* chain is now empty */ + *rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX); + } else { + /* remove i from chain */ + h = rev[i].back; + rev[h].forw = j; + rev[j].back = h; + rev[i].forw = rev[i].back = i; + *rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j; + } + + /* Now check and modify the HPTE */ + ptel = rev[i].guest_rpte; + psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel); + if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) && + hpte_rpn(ptel, psize) == gfn) { + hptep[0] |= cpu_to_be64(HPTE_V_ABSENT); + kvmppc_invalidate_hpte(kvm, hptep, i); + hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO); + /* Harvest R and C */ + rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C); + *rmapp |= rcbits <<
KVMPPC_RMAP_RC_SHIFT; + if (rcbits & HPTE_R_C) + kvmppc_update_rmap_change(rmapp, psize); + if (rcbits & ~rev[i].guest_rpte) { + rev[i].guest_rpte = ptel | rcbits; + note_hpte_modification(kvm, &rev[i]); + } + } +} + static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, unsigned long gfn) { - struct revmap_entry *rev = kvm->arch.hpt.rev; - unsigned long h, i, j; + unsigned long i; __be64 *hptep; - unsigned long ptel, psize, rcbits; for (;;) { lock_rmap(rmapp); @@ -767,37 +807,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, cpu_relax(); continue; } - j = rev[i].forw; - if (j == i) { - /* chain is now empty */ - *rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX); - } else { - /* remove i from chain */ - h = rev[i].back; - rev[h].forw = j; - rev[j].back = h; - rev[i].forw = rev[i].back = i; - *rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j; - } - /* Now check and modify the HPTE */ - ptel = rev[i].guest_rpte; - psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel); - if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) && - hpte_rpn(ptel, psize) == gfn) { - hptep[0] |= cpu_to_be64(HPTE_V_ABSENT); - kvmppc_invalidate_hpte(kvm, hptep, i); - hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO); - /* Harvest R and C */ - rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C); - *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT; - if (rcbits & HPTE_R_C) - kvmppc_update_rmap_change(rmapp, psize); - if (rcbits & ~rev[i].guest_rpte) { - rev[i].guest_rpte = ptel | rcbits; - note_hpte_modification(kvm, &rev[i]); - } - } + kvmppc_unmap_hpte(kvm, i, rmapp, gfn); unlock_rmap(rmapp); __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } -- 2.9.3
[PATCHv2 09/11] powerpc/kvm: Outline of KVM-HV HPT resizing implementation
This adds a not yet working outline of the HPT resizing PAPR extension. Specifically it adds the necessary ioctl() functions, their basic steps, the work function which will handle preparation for the resize, and synchronization between these, the guest page fault path and guest HPT update path. The actual guts of the implementation isn't here yet, so for now the calls will always fail. Signed-off-by: David Gibson --- arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/include/asm/kvm_ppc.h | 4 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 191 arch/powerpc/kvm/book3s_hv.c| 25 + 4 files changed, 223 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 7576e0e..fb8e508 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -252,6 +252,8 @@ struct kvm_hpt_info { int cma; }; +struct kvm_resize_hpt; + struct kvm_arch { unsigned int lpid; #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE @@ -272,6 +274,7 @@ struct kvm_arch { cpumask_t need_tlb_flush; struct dentry *debugfs_dir; struct dentry *htab_dentry; + struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */ #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE struct mutex hpt_mutex; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 3b837bc..f8eaed0 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -215,6 +215,10 @@ extern void kvmppc_bookehv_exit(void); extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu); extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *); +extern long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm, + struct kvm_ppc_resize_hpt *rhpt); +extern long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm, + struct kvm_ppc_resize_hpt *rhpt); int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq); diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 74e909a..86a1bcb 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -40,9 +40,34 @@ #include "trace_hv.h" +//#define DEBUG_RESIZE_HPT 1 + +#ifdef DEBUG_RESIZE_HPT +#define resize_hpt_debug(resize, ...) \ + do {\ + printk(KERN_DEBUG "RESIZE HPT %p: ", resize); \ + printk(__VA_ARGS__);\ + } while (0) +#else +#define resize_hpt_debug(resize, ...) \ + do { } while (0) +#endif + static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags, long pte_index, unsigned long pteh, unsigned long ptel, unsigned long *pte_idx_ret); + +struct kvm_resize_hpt { + /* These fields read-only after init */ + struct kvm *kvm; + struct work_struct work; + u32 order; + + /* These fields protected by kvm->lock */ + int error; + bool prepare_done; +}; + static void kvmppc_rmap_reset(struct kvm *kvm); int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order) @@ -1158,6 +1183,172 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa, } /* + * HPT resizing + */ +static int resize_hpt_allocate(struct kvm_resize_hpt *resize) +{ + return 0; +} + +static int resize_hpt_rehash(struct kvm_resize_hpt *resize) +{ + return -EIO; +} + +static void resize_hpt_pivot(struct kvm_resize_hpt *resize) +{ +} + +static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt *resize) +{ + BUG_ON(kvm->arch.resize_hpt != resize); + kvm->arch.resize_hpt = NULL; + kfree(resize); +} + +static void resize_hpt_prepare_work(struct work_struct *work) +{ + struct kvm_resize_hpt *resize = container_of(work, +struct kvm_resize_hpt, +work); + struct kvm *kvm = resize->kvm; + int err; + + resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n", +resize->order); + + err = resize_hpt_allocate(resize); + + mutex_lock(&kvm->lock); + + resize->error = err; + resize->prepare_done = true; + + mutex_unlock(&kvm->lock); +} + +long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm, +struct 
kvm_ppc_resize_hpt *rhpt) +{ + unsigned long flags = rhpt->flags; + unsigned long shift = rhpt->shift; + struct kvm_resize_hpt *resize; + int ret; + + if (flags != 0) + return -EINVAL; + + if (shift &
[PATCHv2 01/11] Documentation: Correct duplicate section number in kvm/api.txt
Both KVM_CREATE_SPAPR_TCE_64 and KVM_REINJECT_CONTROL have section number 4.98 in Documentation/virtual/kvm/api.txt, presumably due to a naive merge. This corrects the duplication. Signed-off-by: David Gibson --- Documentation/virtual/kvm/api.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 03145b7..42be7b1 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3177,7 +3177,7 @@ of IOMMU pages. The rest of functionality is identical to KVM_CREATE_SPAPR_TCE. -4.98 KVM_REINJECT_CONTROL +4.99 KVM_REINJECT_CONTROL Capability: KVM_CAP_REINJECT_CONTROL Architectures: x86 -- 2.9.3
[PATCHv2 00/11] KVM implementation of PAPR HPT resizing extension
Here is the KVM implementation for the proposed PAPR extension which allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT). Using this requires a guest kernel with support for the extension. Patches for guest side support in Linux were posted earlier: https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html It also requires userspace (i.e. qemu) to intercept the HPT resizing hypercalls and invoke the KVM ioctl()s to implement them. This is done instead of having KVM directly intercept the hypercalls, so that userspace can, if useful, impose additional restrictions on resizes: for example it could refuse them entirely if policy for the VM precludes resizing, or it could limit the size of HPT the guest can request to meet resource limits. Patches to implement the userspace part of HPT resizing are proposed for qemu-2.9, and can be found at: https://github.com/dgibson/qemu/tree/hpt-resize I'm posting these now, in the hopes that both these and the corresponding guest side patches can be staged and merged for the 4.11 window. 
Changes in v2: * Use a more normal and straightforward encoding of the capability flags * Add information to Documentation/virtual/kvm/api.txt * Assorted minor cleanups based on review comments David Gibson (11): Documentation: Correct duplicate section number in kvm/api.txt powerpc/kvm: HPT resizing documentation and reserved numbers powerpc/kvm: Rename kvm_alloc_hpt() for clarity powerpc/kvm: Gather HPT related variables into sub-structure powerpc/kvm: Don't store values derivable from HPT order powerpc/kvm: Split HPT allocation from activation powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size powerpc/kvm: Create kvmppc_unmap_hpte_helper() powerpc/kvm: Outline of KVM-HV HPT resizing implementation powerpc/kvm: KVM-HV HPT resizing implementation powerpc/kvm: Advertise availability of HPT resizing on KVM HV Documentation/virtual/kvm/api.txt| 111 +- arch/powerpc/include/asm/kvm_book3s_64.h | 16 + arch/powerpc/include/asm/kvm_host.h | 21 +- arch/powerpc/include/asm/kvm_ppc.h | 15 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 625 +-- arch/powerpc/kvm/book3s_hv.c | 50 ++- arch/powerpc/kvm/book3s_hv_builtin.c | 8 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 +-- arch/powerpc/kvm/powerpc.c | 3 + include/uapi/linux/kvm.h | 11 + 10 files changed, 741 insertions(+), 181 deletions(-) -- 2.9.3
[PATCHv2 07/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page table (HPT) that userspace expects a guest VM to have, and is also used to clear that HPT when necessary (e.g. guest reboot). At present, once the ioctl() is called for the first time, the HPT size can never be changed thereafter - it will be cleared but always sized as from the first call. With upcoming HPT resize implementation, we're going to need to allow userspace to resize the HPT at reset (to change it back to the default size if the guest changed it). So, we need to allow this ioctl() to change the HPT size. This patch also updates Documentation/virtual/kvm/api.txt to reflect the new behaviour. In fact the documentation was already slightly incorrect since 572abd5 "KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl" Signed-off-by: David Gibson --- Documentation/virtual/kvm/api.txt | 14 -- arch/powerpc/include/asm/kvm_ppc.h | 2 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 29 - arch/powerpc/kvm/book3s_hv.c| 5 + 4 files changed, 26 insertions(+), 24 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index f452c72..c12da23 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2443,18 +2443,20 @@ are, it will do nothing and return an EBUSY error. The parameter is a pointer to a 32-bit unsigned integer variable containing the order (log base 2) of the desired size of the hash table, which must be between 18 and 46. On successful return from the -ioctl, it will have been updated with the order of the hash table that -was allocated. +ioctl, the value will not be changed by the kernel. If no hash table has been allocated when any vcpu is asked to run (with the KVM_RUN ioctl), the host kernel will allocate a default-sized hash table (16 MB). 
If this ioctl is called when a hash table has already been allocated, -the kernel will clear out the existing hash table (zero all HPTEs) and -return the hash table order in the parameter. (If the guest is using -the virtualized real-mode area (VRMA) facility, the kernel will -re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.) +with a different order from the existing hash table, the existing hash +table will be freed and a new one allocated. If this ioctl is +called when a hash table has already been allocated of the same order +as specified, the kernel will clear out the existing hash table (zero +all HPTEs). In either case, if the guest is using the virtualized +real-mode area (VRMA) facility, the kernel will re-create the VRMA +HPTEs on the next KVM_RUN of any vcpu. 4.77 KVM_S390_INTERRUPT diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 41575b8..3b837bc 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu); extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order); extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info); -extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp); +extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order); extern void kvmppc_free_hpt(struct kvm_hpt_info *info); extern long kvmppc_prepare_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem); diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 4683d87..2765d04 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -102,10 +102,10 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info) info->virt, (long)info->order, kvm->arch.lpid); } -long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp) +long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order) { long err = -EBUSY; - long
order; + struct kvm_hpt_info info; mutex_lock(&kvm->lock); if (kvm->arch.hpte_setup_done) { @@ -117,8 +117,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp) goto out; } } - if (kvm->arch.hpt.virt) { - order = kvm->arch.hpt.order; + if (kvm->arch.hpt.order == order) { + /* We already have a suitable HPT */ + /* Set the entire HPT to 0, i.e. invalid HPTEs */ memset((void *)kvm->arch.hpt.virt, 0, 1ul << order); /* @@ -127,17 +128,19 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp) kvmppc_rmap_reset(kvm); /* Ensure that each vcpu will flush its TLB on next entry. */ cpumask_setall(&kvm->arch.need_tlb_flush); - *htab_orderp = order; err = 0; - } else { - struct kvm_hpt_info info; - -
Re: [RFC] Question about freeing of resource in 'atlas7_pinmux_probe()', in file 'drivers/pinctrl/sirf/pinctrl-atlas7.c'
On 19/12/2016 at 14:54, Vladimir Zapolskiy wrote: Functionally it looks good, I have two comments though. 1) you don't need to revert 151b8c5ba1eb, the commit is a proper fix per se but incomplete, please add your change on top of it, 2) minimizing the lines of code by removing duplicates is always good, so here a better and complete fix will be like the following one: Hi, thanks for the feedback and comments. Patch submitted. CJ
Re: [PATCH v2 4/4] phy: qcom-qmp: new qmp phy driver for qcom-chipsets
Hi, On Tue, Nov 29, 2016 at 6:05 AM, Stephen Boyd wrote: Thanks for a thorough review. Please find my comments inline. > On 11/22, Vivek Gautam wrote: >> diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig >> index f1dcec1..8970d9e 100644 >> --- a/drivers/phy/Kconfig >> +++ b/drivers/phy/Kconfig >> @@ -430,6 +430,15 @@ config PHY_STIH407_USB >> Enable this support to enable the picoPHY device used by USB2 >> and USB3 controllers on STMicroelectronics STiH407 SoC families. >> >> +config PHY_QCOM_QMP >> + tristate "Qualcomm QMP PHY Driver" >> + depends on OF && (ARCH_QCOM || COMPILE_TEST) >> + select GENERIC_PHY >> + select RESET_CONTROLLER > > Again, probably should be dropped as we're not providing resets > here, just using them. Sure, will drop in v3. > >> diff --git a/drivers/phy/phy-qcom-qmp.c b/drivers/phy/phy-qcom-qmp.c >> new file mode 100644 >> index 000..f85289e >> --- /dev/null >> +++ b/drivers/phy/phy-qcom-qmp.c >> @@ -0,0 +1,1141 @@ >> +/* >> + * Copyright (c) 2016, The Linux Foundation. All rights reserved. >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License version 2 and >> + * only version 2 as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + * >> + */ >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +#include >> + > [...] 
>> + >> +/* set of registers with offsets different per-PHY */ >> +enum qphy_reg_layout { >> + /* Common block control registers */ >> + QPHY_COM_SW_RESET, >> + QPHY_COM_POWER_DOWN_CONTROL, >> + QPHY_COM_START_CONTROL, >> + QPHY_COM_PCS_READY_STATUS, >> + /* PCS registers */ >> + QPHY_PLL_LOCK_CHK_DLY_TIME, >> + QPHY_FLL_CNTRL1, >> + QPHY_FLL_CNTRL2, >> + QPHY_FLL_CNT_VAL_L, >> + QPHY_FLL_CNT_VAL_H_TOL, >> + QPHY_FLL_MAN_CODE, >> + QPHY_PCS_READY_STATUS, >> +}; >> + >> +unsigned int pciephy_regs_layout[] = { > > static? const? Will take care of these in v3. [...] >> + >> +/** >> + * struct qmp_phy_init_cfg:- per-PHY init config. > > The colon and dash can't be right. One is kernel-doc, but of > course there aren't any other member descriptions. Right, will convert this to a one-line comment. > >> + */ >> +struct qmp_phy_init_cfg { >> + /* phy-type - PCIE/UFS/USB */ >> + unsigned int type; >> + /* number of lanes provided by phy */ >> + int nlanes; >> + >> + /* Initialization sequence for PHY blocks - Serdes, tx, rx, pcs */ >> + struct qmp_phy_init_tbl *phy_init_serdes_tbl; >> + int phy_init_serdes_tbl_sz; >> + struct qmp_phy_init_tbl *phy_init_tx_tbl; >> + int phy_init_tx_tbl_sz; >> + struct qmp_phy_init_tbl *phy_init_rx_tbl; >> + int phy_init_rx_tbl_sz; >> + struct qmp_phy_init_tbl *phy_init_pcs_tbl; >> + int phy_init_pcs_tbl_sz; > > _sz makes it sound like bytes, perhaps _num instead. sure. > >> + >> + /* array of registers with different offsets */ >> + unsigned int *regs; > > const? taking care of these in v3. > >> + >> + unsigned int mask_start_ctrl; >> + unsigned int mask_pwr_dn_ctrl; >> + /* true, if PHY has a separate PHY_COM_CNTRL block */ >> + bool has_phy_com_ctrl; >> + /* true, if PHY has a reset for individual lanes */ >> + bool has_lane_rst; >> +}; >> + >> +/** >> + * struct qmp_phy_desc:- per-lane phy-descriptor. > > Also kernel-doc requests full stop is left out on one line > descriptions. 
sure, will remove it from here and from other places where a full-stop is not required. > >> + * >> + * @phy: pointer to generic phy >> + * @tx: pointer to iomapped memory space for PHY's tx >> + * @rx: pointer to iomapped memory space for PHY's rx >> + * @pcs: pointer to iomapped memory space for PHY's pcs >> + * @pipe_clk: pointer to pipe lock >> + * @index: lane index >> + * @qphy: pointer to QMP phy to which this lane belongs >> + * @lane_rst: pointer to lane's reset controller > > Not sure we care about "pointer to" as types can be figured out > other ways. will drop "pointer to" string. > >> + */ >> +struct qmp_phy_desc { >> + struct phy *phy; >> + void __iomem *tx; >> + void __iomem *rx; >> + void __iomem *pcs; >> + struct clk *pipe_clk; >> + unsigned int index; >> + struct qcom_qmp_phy *qphy; >> + struct reset_control *lane_rst; >> +}; >> + >> +/** >> + * struct qcom_qmp_phy:- structure holding QMP PHY attributes. >> + * >> + * @dev: pointer to device >> + * @serdes: pointer to iomapped memory space for phy's serdes >> + * >> + * @aux_clk: pointer
[PATCH 2/2] pinctrl: sirf: atlas7: Improve code layout
Add some tab in order to improve indentation. Signed-off-by: Christophe JAILLET --- drivers/pinctrl/sirf/pinctrl-atlas7.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/pinctrl/sirf/pinctrl-atlas7.c b/drivers/pinctrl/sirf/pinctrl-atlas7.c index f714f67c4b64..76df538fa814 100644 --- a/drivers/pinctrl/sirf/pinctrl-atlas7.c +++ b/drivers/pinctrl/sirf/pinctrl-atlas7.c @@ -5444,7 +5444,7 @@ static int atlas7_pinmux_probe(struct platform_device *pdev) pmx->regs[idx] = of_iomap(np, idx); if (!pmx->regs[idx]) { dev_err(&pdev->dev, - "can't map ioc bank#%d registers\n", idx); + "can't map ioc bank#%d registers\n", idx); ret = -ENOMEM; goto unmap_io; } @@ -6057,8 +6057,8 @@ static int atlas7_gpio_probe(struct platform_device *pdev) ret = gpiochip_add_data(chip, a7gc); if (ret) { dev_err(&pdev->dev, - "%s: error in probe function with status %d\n", - np->name, ret); + "%s: error in probe function with status %d\n", + np->name, ret); goto failed; } -- 2.9.3
[PATCH 1/2] pinctrl: sirf: atlas7: Add missing 'of_node_put()'
Reference to 'sys2pci_np' should be dropped in all cases here, not only in error handling path. Signed-off-by: Christophe JAILLET --- drivers/pinctrl/sirf/pinctrl-atlas7.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/pinctrl/sirf/pinctrl-atlas7.c b/drivers/pinctrl/sirf/pinctrl-atlas7.c index 7f3041697813..f714f67c4b64 100644 --- a/drivers/pinctrl/sirf/pinctrl-atlas7.c +++ b/drivers/pinctrl/sirf/pinctrl-atlas7.c @@ -5420,14 +5420,15 @@ static int atlas7_pinmux_probe(struct platform_device *pdev) sys2pci_np = of_find_node_by_name(NULL, "sys2pci"); if (!sys2pci_np) return -EINVAL; + ret = of_address_to_resource(sys2pci_np, 0, &res); + of_node_put(sys2pci_np); if (ret) return ret; + pmx->sys2pci_base = devm_ioremap_resource(&pdev->dev, &res); - if (IS_ERR(pmx->sys2pci_base)) { - of_node_put(sys2pci_np); + if (IS_ERR(pmx->sys2pci_base)) return -ENOMEM; - } pmx->dev = &pdev->dev; -- 2.9.3
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 09:27:18PM -0800, Andy Lutomirski wrote: > On Mon, Dec 19, 2016 at 8:44 PM, Alexei Starovoitov > wrote: > > On Mon, Dec 19, 2016 at 07:12:48PM -0800, Andy Lutomirski wrote: > >> > >> struct cgroup_bpf { > >> /* > >> * Store two sets of bpf_prog pointers, one for programs that are > >> * pinned directly to this cgroup, and one for those that are > >> effective > >> * when this cgroup is accessed. > >> */ > >> struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE]; > >> struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE]; > >> }; > >> > >> in struct cgroup, there's a 'struct cgroup_bpf bpf;'. > >> > >> This would change to something like: > >> > >> struct cgroup_filter_slot { > >> struct bpf_prog *effective; > >> struct cgroup_filter_slot *next; > >> struct bpf_prog *local; > >> } > >> > >> local is NULL unless *this* cgroup has a filter. effective points to > >> the bpf_prog that's active in this cgroup or the nearest ancestor that > >> has a filter. next is NULL if there are no filters higher in the > >> chain or points to the next slot that has a filter. struct cgroup > >> has: > >> > >> struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE]; > >> > >> To evaluate it, you do: > >> > >> struct cgroup_filter_slot *slot = &cgroup->slot[the index]; > >> > >> if (!slot->effective) > >> return; > >> > >> do { > >> evaluate(slot->effective); > >> slot = slot->next; > >> } while (unlikely(slot)); > > > > yes. something like this can work as a future extension > > to support multiple programs for security use case. > > Please propose a patch. > > Again, it's not needed today and there is no rush to implement it. > > > > If this happens after 4.10 and 4.10 is released as is, then this > change would be an ABI break. it won't break existing apps. please study how bpf syscall was extended in the past without breaking anything. Same thing here. The default is one program per hook per cgroup. Everything else is future extensions.
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 8:44 PM, Alexei Starovoitov wrote: > On Mon, Dec 19, 2016 at 07:12:48PM -0800, Andy Lutomirski wrote: >> >> struct cgroup_bpf { >> /* >> * Store two sets of bpf_prog pointers, one for programs that are >> * pinned directly to this cgroup, and one for those that are >> effective >> * when this cgroup is accessed. >> */ >> struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE]; >> struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE]; >> }; >> >> in struct cgroup, there's a 'struct cgroup_bpf bpf;'. >> >> This would change to something like: >> >> struct cgroup_filter_slot { >> struct bpf_prog *effective; >> struct cgroup_filter_slot *next; >> struct bpf_prog *local; >> } >> >> local is NULL unless *this* cgroup has a filter. effective points to >> the bpf_prog that's active in this cgroup or the nearest ancestor that >> has a filter. next is NULL if there are no filters higher in the >> chain or points to the next slot that has a filter. struct cgroup >> has: >> >> struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE]; >> >> To evaluate it, you do: >> >> struct cgroup_filter_slot *slot = &cgroup->slot[the index]; >> >> if (!slot->effective) >> return; >> >> do { >> evaluate(slot->effective); >> slot = slot->next; >> } while (unlikely(slot)); > > yes. something like this can work as a future extension > to support multiple programs for security use case. > Please propose a patch. > Again, it's not needed today and there is no rush to implement it. > If this happens after 4.10 and 4.10 is released as is, then this change would be an ABI break. --Andy
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 8:51 PM, Alexei Starovoitov wrote: > On Mon, Dec 19, 2016 at 05:40:53PM -0800, Andy Lutomirski wrote: >> >> By the way, even if Alexei is right, the BPF_PROG_DETACH API doesn't >> even take a reference to a BPF program as an argument. What is it >> supposed to do if this mechanism ever gets extended? > > we just add another field to that anonymous union just like > we did for other commands and everything is backwards compatible. > It's the basics of bpf syscall that we've been relying on for some > time now and it worked just fine. And what happens if you don't specify that member and two programs are attached? --Andy
Re: [PATCH v4 0/2] FPGA: TS-7300 FPGA manager
On Mon, 19 Dec 2016, Florian Fainelli wrote: > On 12/18/2016 12:21 PM, Florian Fainelli wrote: > > Hi all, > > > > This patch series adds support for loading bitstreams into the Altera > > Cyclone II > > connected to an EP9302 on a TS-7300 board. > > > > Changes in v4: > > > > - fixed ops->write not to do the final configuration release > > - reordered patches > > > > Changes in v3: > > > > - fix write_init and write_complete signatures > > > > Changes in v2: > > > > - rebased against fpga/for-next > > - added defines for configuration bits and delays > > - added error message if ioremap() fails > > - detailed how the configuration through CPLD is done > > Alan, Moritz, thanks for providing Acked-by, I was under the impression > these patches would be taken by you through the FPGA tree, Hi Florian, That's right, I'll handle it. The FPGA tree goes in through Greg. We're in the merge window where stuff Greg already has is going to Linus so it's too late to get this into 4.10. I'll be doing a pull after that. Alan > since Hartley > acked the EP93xx part and since that was the tree used as a baseline. > > Let me know how you want to proceed, since EP93xx is not as active as > other ARM SoCs, there may not be a specific tree where to stage these > patches (unless you take them). > > Thanks! > -- > Florian > -- > To unsubscribe from this list: send the line "unsubscribe linux-fpga" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote: > On Thu, Dec 15, 2016 at 02:51:36PM +, Colin Ian King wrote: > > On 15/12/16 14:42, Boqun Feng wrote: > > > On Thu, Dec 15, 2016 at 12:04:59PM +, Mark Rutland wrote: > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote: > > >>> ->qsmask of an RCU leaf node is usually more sparse than the > > >>> corresponding cpu_possible_mask. So replace the > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with > > >>> for_each_leaf_node_cpu() to save several checks. > > >>> > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the > > >>> corresponding mask for a bit because @mask is unsigned long, this was > > >>> spotted by Colin Ian King and CoverityScan in > > >>> a previous version of this patch.] > > >> > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;) > > >> > > > > > > I kinda keep this here for honoring the effort of finding out this bug > > > from Colin, but yes, it's no longer needed here for the current code. > > > > Yep, remove it. > > > > Paul, here is a modified version of this patch, what I only did is > removing this note. > > Besides I rebased the whole series on the current rcu/dev branch of -rcu > tree, on this very commit: > > 8e9b2521b18a ("doc: Quick-Quiz answers are now inline") > > And I put the latest version at > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node > > If you thought it's better, I could send a v3 ;-) I would feel better about this patchset if it reduced the number of lines of code rather than increasing them. That said, part of the increase is a commment. Still, I am not convinced that the extra level of macro is carrying its weight. dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()") The commit log needs a bit of wordsmithing. The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange. What is its purpose, really? What does its triggering tell you? 
What other checks did you consider as an alternative? And if you are going to add checks of this type, should you also check for this being a leaf rcu_node structure? 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking") This does look a bit nicer, but why the added blank lines? Are they really helping? The commit log seems a bit misplaced. This code is almost never executed (once per 21 seconds at the most), so performance really isn't a consideration. The simpler-looking code might be. fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration") Ditto on blank lines. Again, this code is executed per expedited grace period, so performance really isn't a big deal. More of a big deal than the stall-warning code, but we still are way off of any fastpath. 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()") Ditto again on blank lines. And on the commit log. This code is executed about once per several jiffies, and on larger machines, per 20 jiffies or so. Performance really isn't a consideration. 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration") And another ditto on blank lines. This code executes once per CPU-hotplug operation, so again isn't at all performance critical. In short, if you are trying to sell this to me as a significant performance boost, I am not buying. The added WARN_ON_ONCE() looks quite dubious, though perhaps I am misunderstanding its purpose. My assumption is that you want to detect missing UL suffixes on bitmask constants, in which case I bet there is a better way. Speaking of which, how do we know that this is free of bugs? Thanx, Paul > Regards, > Boqun > > >8 > From: Boqun Feng > Date: Thu, 8 Dec 2016 23:21:11 +0800 > Subject: [PATCH v2.1 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp() > > ->qsmask of an RCU leaf node is usually more sparse than the > corresponding cpu_possible_mask. 
So replace the > for_each_leaf_node_possible_cpu() in force_qs_rnp() with > for_each_leaf_node_cpu() to save several checks. > > Signed-off-by: Boqun Feng > --- > kernel/rcu/tree.c | 12 +--- > 1 file changed, 5 insertions(+), 7 deletions(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 4ea4496f4ecc..c2b753fb7f09 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp, > continue; > } > } > - for_each_leaf_node_possible_cpu(rnp, cpu) { > - unsigned long bit = leaf_node_cpu_bit(rnp, cpu); > - if ((rnp->qsmask & bit) != 0) { > - if (f(per_cpu_p
Re: [PATCH] Xen: ARM: Zero reserved fields of xatp before making hypervisor call
On 12/19/16 12:49, Stefano Stabellini wrote: > On Mon, 19 Dec 2016, Juergen Gross wrote: >> On 19/12/16 03:56, Jiandi An wrote: >>> Ensure all reserved fields of xatp are zero before making hypervisor >>> call to XEN in xen_map_device_mmio(). xenmem_add_to_physmap_one() in >>> XEN fails the mapping request if extra.res reserved field in xatp is >>> not zero for XENMAPSPACE_dev_mmio request. >>> >>> Signed-off-by: Jiandi An >>> --- >>> drivers/xen/arm-device.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/xen/arm-device.c b/drivers/xen/arm-device.c >>> index 778acf8..208273b 100644 >>> --- a/drivers/xen/arm-device.c >>> +++ b/drivers/xen/arm-device.c >>> @@ -87,6 +87,9 @@ static int xen_map_device_mmio(const struct resource >>> *resources, >>> idxs[j] = XEN_PFN_DOWN(r->start) + j; >>> } >>> >>> + /* Ensure reserved fields are set to zero */ >>> + memset(&xatp, 0, sizeof(xatp)); >>> + >>> xatp.domid = DOMID_SELF; >>> xatp.size = nr; >>> xatp.space = XENMAPSPACE_dev_mmio; >> >> Instead of setting xatp to 0 in each iteration I'd prefer a static >> initialization of .domid and .space: >> >> struct xen_add_to_physmap_range xatp = { >> .domid = DOMID_SELF, >> .space = XENMAPSPACE_dev_mmio >> }; > > +1 > Hi Juergen, Thanks for your comments. xatp is passed to XEN via the hypervisor call in each loop. XEN touches xatp and returns it back. For example, XEN returns the error of the underlying mapping call in the err[] array in xatp. (The err[] is not checked after the hypervisor call returns and it's a bug to be addressed in a separate patch.) XEN could theoretically corrupt xatp when it's returned. And the loop would go on to the next iteration passing in whatever is in xatp returned by the previous hypervisor call. Harder to debug in my opinion if xatp gets corrupted by XEN somehow when a bug is introduced in XEN. At first I put the memset of xatp at the beginning outside of the loop. 
But I thought it's better to initialize the xatp that's passed in each time a hypervisor call is made, so we know exactly what we set going into the hypervisor call. -- Jiandi An Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 05:40:53PM -0800, Andy Lutomirski wrote: > > By the way, even if Alexei is right, the BPF_PROG_DETACH API doesn't > even take a reference to a BPF program as an argument. What is it > supposed to do if this mechanism ever gets extended? we just add another field to that anonymous union just like we did for other commands and everything is backwards compatible. It's the basics of bpf syscall that we've been relying on for some time now and it worked just fine.
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 07:50:01PM -0800, Andy Lutomirski wrote: > >> > >> net.socket_create_filter = "none": no filter > >> net.socket_create_filter = "bpf:baadf00d": bpf filter > > > > i'm assuming 'baadf00d' is bpf program fd expressed a text string? > > and kernel needs to parse above? will you allow capital and lower > > case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not? > > can program fd expressed as decimal or hex or both? > > how do you return the error? as a text string for user space > > to parse? > > No. The kernel does not parse it because you cannot write this to the > file. You set a bpf filter with ioctl and pass an fd. If you *read* > the file, you get the same bpf program hash that fdinfo on the bpf > object would show -- this is for debugging and (eventually) CRIU. my understanding that cgroup is based on kernfs and both don't support ioctl, so you'd need quite some hacking to introduce such concepts and buy-in from a bunch of people first. > >> net.socket_create_filter = "iptables:foobar": some iptables thingy > >> net.socket_create_filter = "nft:blahblahblah": some nft thingy > > > > iptables/nft are not applicable to 'bpf_socket_create' which > > looks into: > > struct bpf_sock { > > __u32 bound_dev_if; > > __u32 family; > > __u32 type; > > __u32 protocol; > > }; > > so don't fit as an example. > > The code that takes a 'struct sock' and sets up bpf_sock is > bpf-specific and would obviously not be used for non-bpf filter. But > if you had a filter that was just a like of address families, that > filter would look at struct sock and do its own thing. iptables > wouldn't make sense for a socket creation filter, but it would make > perfect sense for an ingress filter. Obviously the bpf-specific state > object wouldn't be used, but it would still be a hook, invoked from > the same network code, guarded by the same static key, looking at the > same skb. 
I strongly suggest to go back and read my first reply where I think I explained well enough that something like iptables will not able to reuse the ioctl mechanism you're proposing here, hook ids will be different, attachment mechanism will be different too. So your proposed cgroup ioctl is already dead as a reusable interface. > >> net.socket_create_filter = "disallow": no sockets created period > >> net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and > >> 3 > > > > so you're proposing to add a bunch of hard coded logic to the kernel. > > First to parse such text into some sort of syntax tree or list/set > > and then have hard coded logic specifically for these two use cases? > > While above two can be implemented as trivial bpf programs already?! > > That goes 180% degree vs bpf philosophy. bpf is about moving > > the specific code out of the kernel and keeping kernel generic that > > it can solve as many use cases as possible by being programmable. > > I'm not seriously proposing implementing these. My point is that > *bpf*, while wonderful, is not the be-all-and-end-all of kernel > configurability, and other types of hooks might want to be hooked in > here. Then please let's talk about real use cases. This daydreaming of some future llvm in the kernel that you were bringing up during LPC doesn't help the discussion. Just like these artificial examples. > > ... > >> What exactly isn't sensible about using cgroup_bpf for containers? > > > > my use case is close to zero overhead application network monitoring. > > So if I set up a cgroup that's monitored and call it /cgroup/a and > enable delegation and if the program running there wants to do its own > monitoring in /cgroup/a/b (via delegation), then you really want the > outer monitor to silently drop events coming from /cgroup/a/b? yes. both are root and must talk to each other if they want to co-exist. When root process is asking kernel to do X, this X has to happen. 
> Then disallow nesting. You're welcome to not rush the decision, but > that's not what you've done. If 4.10 is released as is, you've made > the decision and you're going to have a hard time changing it. Nothing needs to be changed. > > No. As was pointed out before only root can load > > BPF_PROG_TYPE_CGROUP_[SKB|SOCK] > > type programs and only root can attach them. > > Why? It really seems to me that you expect that future namespaceable > bpf hooks will use a totally different API. At KS, I sat in a room > full of people arguing about cgroup v2 and a lot of them pointed out > that there are valid, paying use cases that want to stick cgroup v1 in > a container, because the code that uses cgroup v1 in the container is > called systemd and the contained OS is called RHEL (or SuSE or Ubuntu > or Gentoo or whatever), and that code is *already written* and needs > to be contained. bpf in general is not namespace aware. It's global and this cgroup scoping of bpf programs is the first of this kind. Namespacing of bpf is comp
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 07:12:48PM -0800, Andy Lutomirski wrote:
>
> struct cgroup_bpf {
> 	/*
> 	 * Store two sets of bpf_prog pointers, one for programs that are
> 	 * pinned directly to this cgroup, and one for those that are effective
> 	 * when this cgroup is accessed.
> 	 */
> 	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
> 	struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
> };
>
> in struct cgroup, there's a 'struct cgroup_bpf bpf;'.
>
> This would change to something like:
>
> struct cgroup_filter_slot {
> 	struct bpf_prog *effective;
> 	struct cgroup_filter_slot *next;
> 	struct bpf_prog *local;
> };
>
> local is NULL unless *this* cgroup has a filter. effective points to
> the bpf_prog that's active in this cgroup or the nearest ancestor that
> has a filter. next is NULL if there are no filters higher in the
> chain or points to the next slot that has a filter. struct cgroup has:
>
> struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE];
>
> To evaluate it, you do:
>
> struct cgroup_filter_slot *slot = &cgroup->slot[the index];
>
> if (!slot->effective)
> 	return;
>
> do {
> 	evaluate(slot->effective);
> 	slot = slot->next;
> } while (unlikely(slot));

yes. something like this can work as a future extension to support
multiple programs for security use case. Please propose a patch.
Again, it's not needed today and there is no rush to implement it.
Re: [PATCH v10 0/8] power: add power sequence library
On Mon, Dec 19, 2016 at 09:15:04PM +0200, Krzysztof Kozlowski wrote: > On Mon, Nov 14, 2016 at 09:35:51AM +0800, Peter Chen wrote: > > Hi all, > > > > This is a follow-up for my last power sequence framework patch set [1]. > > According to Rob Herring and Ulf Hansson's comments[2]. The kinds of > > power sequence instances will be added at postcore_initcall, the match > > criteria is compatible string first, if the compatible string is not > > matched between dts and library, it will try to use generic power sequence. > > > > The host driver just needs to call of_pwrseq_on/of_pwrseq_off > > if only one power sequence instance is needed, for more power sequences > > are used, using of_pwrseq_on_list/of_pwrseq_off_list instead (eg, USB hub > > driver). > > > > In future, if there are special power sequence requirements, the special > > power sequence library can be created. > > > > This patch set is tested on i.mx6 sabresx evk using a dts change, I use > > two hot-plug devices to simulate this use case, the related binding > > change is updated at patch [1/6], The udoo board changes were tested > > using my last power sequence patch set.[3] > > > > Except for hard-wired MMC and USB devices, I find the USB ULPI PHY also > > need to power on itself before it can be found by ULPI bus. > > > > [1] http://www.spinics.net/lists/linux-usb/msg142755.html > > [2] http://www.spinics.net/lists/linux-usb/msg143106.html > > [3] http://www.spinics.net/lists/linux-usb/msg142815.html > > > > Changes for v10: > > - Improve the kernel-doc for power sequence core, including exported APIs > > and > > main structure. [Patch 2/8] > > - Change Kconfig, and let the user choose power sequence. [Patch 2/8] > > - Delete EXPORT_SYMBOL and change related APIs as local, these APIs do not > > be intended to export currently. [Patch 2/8] > > - Selete POWER_SEQUENCE at USB core's Kconfig. [Patch 4/8] > > Hi Peter, > > It is great that you continued the work on this. 
> I saw (in some previous mails) your repo mentioned:
> https://git.kernel.org/cgit/linux/kernel/git/peter.chen/usb.git/
> Does it contain the recent version of this patchset?
>
> I want to build on top of it fixes for usb3503 on Odroid U3 board.

Krzysztof, I put the v10 patch set at branch: pwrseq-lib. It seems there
are no more comments; I will send my v11 patch set after the new year
holiday.

--
Best Regards,
Peter Chen
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Hi Mathias, On 12/19/2016 08:13 PM, Mathias Nyman wrote: > On 19.12.2016 13:34, Baolin Wang wrote: >> Hi Mathias, >> >> On 19 December 2016 at 18:33, Mathias Nyman >> wrote: >>> On 13.12.2016 05:21, Baolin Wang wrote: Hi Mathias, On 12 December 2016 at 23:52, Mathias Nyman wrote: > > On 05.12.2016 09:51, Baolin Wang wrote: >> >> >> If a command event is found on the event ring during an interrupt, >> we need to stop the command timer with del_timer(). Since del_timer() >> can fail if the timer is running and waiting on the xHCI lock, then >> it maybe get the wrong timeout command in xhci_handle_command_timeout() >> if host fetched a new command and updated the xhci->current_cmd in >> handle_cmd_completion(). For this situation, we need a way to signal >> to the command timer that everything is fine and it should exit. > > > > Ah, right, this could actually happen. > >> >> >> We should introduce a counter (xhci->current_cmd_pending) for the number >> of pending commands. If we need to cancel the command timer and >> del_timer() >> succeeds, we decrement the number of pending commands. If del_timer() >> fails, >> we leave the number of pending commands alone. >> >> For handling timeout command, in xhci_handle_command_timeout() we will >> check >> the counter after decrementing it, if the counter >> (xhci->current_cmd_pending) >> is 0, which means xhci->current_cmd is the right timeout command. If the >> counter (xhci->current_cmd_pending) is greater than 0, which means >> current >> timeout command has been handled by host and host has fetched new >> command >> as >> xhci->current_cmd, then just return and wait for new current command. > > > > A counter like this could work. > > Writing the abort bit can generate either ABORT+STOP, or just STOP > event, this seems to cover both. > > quick check, case 1: timeout and cmd completion at the same time. 
> > cpu1                              cpu2
> >
> > queue_command(first), p++ (=1)
> > queue_command(more),
> > --completion irq fires--          --timer times out at same time--
> > handle_cmd_completion()           handle_cmd_timeout()
> >   lock(xhci_lock)                   spin_on(xhci_lock)
> >   del_timer() fail, p (=1, no change)
> >   cur_cmd = list_next(), p++ (=2)
> >   unlock(xhci_lock)
> >                                     lock(xhci_lock)
> >                                     p-- (=1)
> >                                     if (p > 0), exit
> >
> > OK, works
> >
> > case 2: normal timeout case with ABORT+STOP, no race.
> >
> > cpu1                              cpu2
> >
> > queue_command(first), p++ (=1)
> > queue_command(more),
> >                                   handle_cmd_timeout()
> >                                     p-- (p=0), don't exit
> >                                     mod_timer(), p++ (p=1)
> >                                     write_abort_bit()
> > handle_cmd_completion(ABORT)
> >   del_timer() ok, p-- (p=0)
> > handle_cmd_completion(STOP)
> >   del_timer() fail, (p=0)
> >   handle_stopped_cmd_ring()
> >     cur_cmd = list_next(), p++ (=1)
> >     mod_timer()
> >
> > OK, works, and same for just STOP case, with the only difference that
> > during handle_cmd_completion(STOP) p is decremented (p--)

Yes, those are the cases I want to handle, thanks for your explicit
explanation.

>>> Gave this some more thought over the weekend, and this implementation
>>> doesn't solve the case when the last command times out and races with
>>> the completion handler:
>>>
>>> cpu1                              cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> --completion irq fires--          --timer times out at same time--
>>> handle_cmd_completion()           handle_cmd_timeout()
>>>   lock(xhci_lock)                   spin_on(xhci_lock)
>>>   del_timer() fail, p (=1, no change)
>>>   no more commands, p (=1, no change)
>>>   unlock(xhci_lock)
>>>                                     lock(xhci_lock)
>>>                                     p-- (=0)
>>>                                     p == 0, continue, even if we
>>>                                     should not.
>>> For this we still need to rely on checking cur_cmd == NULL in the
>>> timeout function.
>>> (Baolus patch sets it to NULL if there are no more commands pending) >> >> As I pointed out in patch 1 of this patchset, this patchset is based >> on Lu Baolu's new fix patch: >> usb: xhci: fix possible wild pointer >> https://www.spinics.net/lists/linux-usb/msg150219.html >> >> After applying Baolu's patch, after decrement the counter, we will >> check the xhci->cur_command if is NULL. So in thi
Re: [PATCH] thermal: add thermal_zone_remove_device_groups()
Hey Yasuaki San,

On Thu, Dec 15, 2016 at 04:47:08PM -0500, Yasuaki Ishimatsu wrote:
> When offlining all cores on a CPU, the following system panic
> occurs:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: strlen+0x0/0x20
>
> Call Trace:
>  ? kernfs_name_hash+0x17/0x80
>  kernfs_find_ns+0x3f/0xd0
>  kernfs_remove_by_name_ns+0x36/0xa0
>  remove_files.isra.1+0x36/0x70
>  sysfs_remove_group+0x44/0x90
>  sysfs_remove_groups+0x2e/0x50
>  device_remove_attrs+0x5e/0x90
>  device_del+0x1ea/0x350
>  device_unregister+0x1a/0x60
>  thermal_zone_device_unregister+0x1f2/0x210
>  pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
>  ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
>  cpuhp_invoke_callback+0x8d/0x3f0
>  cpuhp_down_callbacks+0x42/0x80
>  cpuhp_thread_fun+0x8b/0xf0
>  smpboot_thread_fn+0x110/0x160
>  kthread+0x101/0x140
>  ? sort_range+0x30/0x30
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x25/0x30
>
> thermal_zone_create_device_group() sets attribute_groups in
> thermal_zone_attribute_groups[] to tz->device.groups. But these
> attribute_groups do not have a name argument.
>
> So when offlining all cores on a CPU and executing
> thermal_zone_device_unregister(), the panic occurs in strlen()
> called from kernfs_name_hash() because the name argument is NULL.
>
> The patch adds thermal_zone_remove_device_groups() to free
> tz->device.groups and set a NULL pointer.

Thanks for finding this and sending a fix. Overall I am OK with the
patch, but I have some questions, as follows.

> Signed-off-by: Yasuaki Ishimatsu
> CC: Zhang Rui
> CC: Eduardo Valentin
> ---
>  drivers/thermal/thermal_core.c  | 3 ++-
>  drivers/thermal/thermal_core.h  | 1 +
>  drivers/thermal/thermal_sysfs.c | 6 ++

First question is, are you seeing the same problem with cooling
devices? They follow the same strategy of setting the groups property,
and the cooling device groups also have no name.
> 3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 641faab..926e385 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
>
>  unregister:
>  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
> +	thermal_zone_remove_device_groups(tz);
>  	device_unregister(&tz->device);
>  	return ERR_PTR(result);
>  }
> @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
>  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>  	idr_destroy(&tz->idr);
>  	mutex_destroy(&tz->lock);
> +	thermal_zone_remove_device_groups(tz);
>  	device_unregister(&tz->device);
> -	kfree(tz->device.groups);
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
>
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 2412b37..e3a60db 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct thermal_zone_device *,
>  int thermal_build_list_of_policies(char *buf);
>
>  /* sysfs I/F */
> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz);
>  int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
>  void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
>  /* used only at binding time */
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index a694de9..3dfd29b 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -605,6 +605,12 @@ static int create_trip_attrs(struct thermal_zone_device *tz, int mask)
>  	return 0;
>  }
>
> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz)
> +{
> +	kfree(tz->device.groups);

OK.
This part needs to be done regardless.

> +	tz->device.groups = NULL;

This almost defeats the purpose of the effort to put all properties to
device.groups, as we need to care about releasing them.

I guess the problem is we cannot keep the same sysfs entries if we set
a name on the thermal device groups. That would break userspace. And if
we do not set, we need to care about releasing them.

Almost feel like we might consider patching sysfs core. Or maybe find
another way to setup the device groups for thermal devices.

> +}
> +
>  int thermal_zone_create_device_groups(struct thermal_zone_device *tz,
>  				      int mask)
>  {
> --
> 1.8.3.1
>
Re: [PATCH v4 0/2] FPGA: TS-7300 FPGA manager
On 12/18/2016 12:21 PM, Florian Fainelli wrote: > Hi all, > > This patch series adds support for loading bitstreams into the Altera Cyclone > II > connected to an EP9302 on a TS-7300 board. > > Changes in v4: > > - fixed ops->write not to do the final configuration release > - reordered patches > > Changes in v3: > > - fix write_init and write_complete signatures > > Changes in v2: > > - rebased against fpga/for-next > - added defines for configuration bits and delays > - added error mesage if ioremap() fails > - detailed how the configuration through CPLD is done Alan, Moritz, thanks for providing Acked-by, I was under the impression these patches would be taken by you through the FPGA tree, since Hartley acked the EP93xx part and since that was the tree used as a baseline. Let me know how you want to proceed, since EP93xx is not as active as other ARM SoCs, there may not be a specific tree where to stage these patches (unless you take them). Thanks! -- Florian
Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.
Hi Scott,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.9 next-20161219]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Scott-Bauer/include-Add-definitions-for-sed/20161220-110214
config: x86_64-randconfig-i0-201651 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   drivers/nvme/host/core.c: In function 'nvme_sec_submit':
>> drivers/nvme/host/core.c:770:9: warning: missing braces around initializer [-Wmissing-braces]
     struct nvme_command cmd = { 0 };
            ^
   drivers/nvme/host/core.c:770:9: warning: (near initialization for 'cmd.') [-Wmissing-braces]
Re: [PATCH] nvmem: core: Allow ignoring length when reading a cell
On Tue, Dec 20, 2016 at 3:17 AM, Stephen Boyd wrote: > On 12/19, Vivek Gautam wrote: >> nvmem_cell_read() API fills in the argument 'len' with >> the number of bytes read from the cell. Many users don't >> care about this length value. So allow users to pass a >> NULL pointer to this len field. >> >> Signed-off-by: Vivek Gautam >> --- > > Reviewed-by: Stephen Boyd Thanks Stephen. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: thermal zones break with patch "Reimplement IDR and IDA using the radix tree" (mainline+next)
Hey,

On Sat, Dec 17, 2016 at 01:48:33AM +0100, Heiko Stuebner wrote:
> Hi Matthew,
>
> On Friday, 16 December 2016, 21:19:37 CET, Matthew Wilcox wrote:
> > From: Heiko Stuebner [mailto:he...@sntech.de]
> > > commit b05bbe3ea2db ("Reimplement IDR and IDA using the radix tree")
> > > seems to break thermal zone allocation. This happens both on today's
> > > mainline and linux-next-20161216 and produces errors like:
> > >
> > > While I haven't looked too deeply into what idr exactly does, some
> > > findings:
> > > - thermal_zone0 and thermal_zone1 are allocated correctly
> > > - every further thermal_zone always gets allocated the number "1"
> > > - thermal core calls idr_alloc with 0 for both start and end
> > > - the rewrite-patch seems to change the semantics of idr_alloc
> > >   where it originally said "@end: the maximum id (exclusive, <= 0 for max)"
> > >   the "<= 0" part is gone now, but I checked, simply setting INT_MAX
> > >   as end in the thermal_core does not help
> >
> > Hi Heiko,
> >
> > Thanks for the report! The problem is because the thermal subsystem calls
> > idr_alloc() passing a NULL pointer for the data. I have fixed this problem
> > in my git tree but haven't sent the patch to Andrew yet. This patch should
> > fix the problem for you:
> >
> > http://git.infradead.org/users/willy/linux-dax.git/commitdiff/c52eeed7b759c3fefe9b7f1b0a17a438df6950f3
>
> yay, this fixes the issue. This should definitely go as a fix into 4.10-rc
> and when you send it to Andrew you can add my
>
> Tested-by: Heiko Stuebner
>
> > Now ... thermal is actually using an IDR when it could save memory by
> > using an IDA. Are you interested in doing that conversion?
>
> That would more be a question for Eduardo and Rui :-) . I'm just in the
> process of tracking down the smallish issues on my Rockchip board that
> are popping up during the merge-window.

I would prefer to get the right implementation and fix thermal core by
using IDA.
But given that this touches thermal core, we need an agreement with Rui
too.

> Anyway, thanks for pointing to the fix
> Heiko
RE: [PATCH] block: loose check on sg gap
> From: Jens Axboe [mailto:ax...@fb.com] > Sent: Tuesday, December 20, 2016 10:31 > To: Ming Lei > Cc: Linux Kernel Mailing List ; linux-block > bl...@vger.kernel.org>; Christoph Hellwig ; Dexuan Cui > ; Vitaly Kuznetsov ; Keith Busch > ; Hannes Reinecke ; Mike Christie > ; Martin K. Petersen ; > Toshi Kani ; Dan Williams ; > Damien Le Moal > Subject: Re: [PATCH] block: loose check on sg gap > > On 12/19/2016 07:07 PM, Ming Lei wrote: > > On Sun, Dec 18, 2016 at 12:49 AM, Jens Axboe wrote: > >> On 12/17/2016 03:49 AM, Ming Lei wrote: > >>> If the last bvec of the 1st bio and the 1st bvec of the next > >>> bio are contineous physically, and the latter can be merged > >>> to last segment of the 1st bio, we should think they don't > >>> violate sg gap(or virt boundary) limit. > >>> > >>> Both Vitaly and Dexuan reported lots of unmergeable small bios > >>> are observed when running mkfs on Hyper-V virtual storage, and > >>> performance becomes quite low, so this patch is figured out for > >>> fixing the performance issue. > >>> > >>> The same issue should exist on NVMe too sine it sets virt boundary too. > >> > >> It looks pretty reasonable to me. I'll queue it up for some testing, > >> changes like this always make me a little nervous. > > > > Understood. > > > > But given it is still in early stage of 4.10 cycle, seems fine to expose > > it now, and we should have enough time to fix it if there might be > > regressions. > > > > BTW, it passes my xfstest(ext4) over sata/NVMe. > > It's been fine here in testing, too. I'm not worried about performance > regressions, those we can always fix. Merging makes me worried about > corruption, and those regressions are much worse. > > Any reason we need to rush this? I'd be more comfortable pushing this to > 4.11, unless there are strong reasons this should make 4.10. > > -- > Jens Axboe Hi Jens, As far as I know, the patch is important to popular Linux distros, e.g. 
at least Ubuntu 14.04.5, 16.x and RHEL 7.3, when they run on Hyper-V/Azure, because they can suffer from a pretty bad throughput/latency in some cases, e.g. mkfs.ext4 for a 100GB partition can take 8 minutes, but with the patch, it only takes 1 second. Thanks, -- Dexuan
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 7:18 PM, Alexei Starovoitov wrote:
> On Mon, Dec 19, 2016 at 04:25:32PM -0800, Andy Lutomirski wrote:
>> I think we're still talking past each other. A big part of the point
>> of changing it is that none of this is specific to bpf. You could (in
>
> the hooks and context passed into the program is very much bpf specific.
> That's what I've been trying to convey all along.

You mean BPF_CGROUP_RUN_PROG_INET_SOCK(sk)? There is nothing bpf
specific about the hook except that the name of this macro has "BPF" in
it. There is nothing whatsoever that's bpf-specific about the context
-- sk is not bpf-specific at all. The only thing bpf-specific about it
is that it currently only invokes bpf programs. That could easily
change.

>
>> theory -- I'm not proposing implementing these until there's demand)
>> have:
>>
>> net.socket_create_filter = "none": no filter
>> net.socket_create_filter = "bpf:baadf00d": bpf filter
>
> i'm assuming 'baadf00d' is bpf program fd expressed a text string?
> and kernel needs to parse above? will you allow capital and lower
> case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not?
> can program fd expressed as decimal or hex or both?
> how do you return the error? as a text string for user space
> to parse?

No. The kernel does not parse it because you cannot write this to the
file. You set a bpf filter with ioctl and pass an fd. If you *read*
the file, you get the same bpf program hash that fdinfo on the bpf
object would show -- this is for debugging and (eventually) CRIU.

>
>> net.socket_create_filter = "iptables:foobar": some iptables thingy
>> net.socket_create_filter = "nft:blahblahblah": some nft thingy
>
> iptables/nft are not applicable to 'bpf_socket_create' which
> looks into:
> struct bpf_sock {
> 	__u32 bound_dev_if;
> 	__u32 family;
> 	__u32 type;
> 	__u32 protocol;
> };
> so don't fit as an example.
The code that takes a 'struct sock' and sets up bpf_sock is
bpf-specific and would obviously not be used for a non-bpf filter. But
if you had a filter that was just a list of address families, that
filter would look at struct sock and do its own thing. iptables
wouldn't make sense for a socket creation filter, but it would make
perfect sense for an ingress filter. Obviously the bpf-specific state
object wouldn't be used, but it would still be a hook, invoked from
the same network code, guarded by the same static key, looking at the
same skb.

>
>> net.socket_create_filter = "disallow": no sockets created period
>> net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and 3
>
> so you're proposing to add a bunch of hard coded logic to the kernel.
> First to parse such text into some sort of syntax tree or list/set
> and then have hard coded logic specifically for these two use cases?
> While above two can be implemented as trivial bpf programs already?!
> That goes 180 degrees vs bpf philosophy. bpf is about moving
> the specific code out of the kernel and keeping kernel generic that
> it can solve as many use cases as possible by being programmable.

I'm not seriously proposing implementing these. My point is that
*bpf*, while wonderful, is not the be-all-and-end-all of kernel
configurability, and other types of hooks might want to be hooked in
here.

> ...
>> What exactly isn't sensible about using cgroup_bpf for containers?
>
> my use case is close to zero overhead application network monitoring.

So if I set up a cgroup that's monitored and call it /cgroup/a and
enable delegation and if the program running there wants to do its own
monitoring in /cgroup/a/b (via delegation), then you really want the
outer monitor to silently drop events coming from /cgroup/a/b?
> >> >> > As you're pointing out, in case of security, we probably >> >> > want to preserve original bpf program that should always be >> >> > run first and only after it returned 'ok' (we'd need to define >> >> > what 'ok' means in secruity context) run program attached to >> >> > sub-hierarchy. >> >> >> >> It's already defined AFAICT. 1 means okay. 0 means not okay. >> > >> > sorry that doesn't make any sense. For seccomp we have a set of >> > ranges that mean different things. Here you're proposing to >> > hastily assign 1 and 0 ? How is that extensible? >> > We need to carefully think through what should be the semantics >> > of attaching multiple programs, consider performance implications, >> > return codes and so on. >> >> You already assigned it. The return value of the bpf program, loaded >> in Linus' tree today, tells the kernel whether to accept or reject. > > yes. that's what the program tells the hook. > I'm saying that whenever we have a link list of the programs > interaction between them may or may not be expressible with 1/0 > and I don't want to rush such decision. Then disallow nesting. You're welcome to not rush the decision, but that's not what you've done. If 4.10 is released as is, you've made
Re: [PATCH v3 2/5] lib: Add Sed-opal library
Hi Scott,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.9 next-20161219]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Scott-Bauer/include-Add-definitions-for-sed/20161220-110214
config: i386-randconfig-r0-201651 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All warnings (new ones prefixed by >>):

   In file included from lib/sed.c:20:0:
>> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared inside parameter list
    struct opal_dev *alloc_opal_dev(struct request_queue *q);
                                           ^
>> include/linux/sed-opal.h:37:40: warning: its scope is only this definition or declaration, which is probably not what you want
   lib/sed.c: In function 'fdev_sed_ioctl':
   lib/sed.c:161:12: error: dereferencing pointer to incomplete type 'struct file'
     if (!filep->f_sedctx || !filep->f_sedctx->ops || !filep->f_sedctx->dev)
               ^
--
   In file included from lib/sed-opal.c:29:0:
>> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared inside parameter list
    struct opal_dev *alloc_opal_dev(struct request_queue *q);
                                           ^
>> include/linux/sed-opal.h:37:40: warning: its scope is only this definition or declaration, which is probably not what you want
   lib/sed-opal.c: In function 'opal_discovery0_end':
   lib/sed-opal.c:276:6: warning: unused variable 'error' [-Wunused-variable]
     int error = 0;
         ^
   lib/sed-opal.c: In function 'response_parse':
   lib/sed-opal.c:793:15: error: invalid storage class for function 'response_get_string'
    static size_t response_get_string(const struct parsed_resp *resp, int n,
           ^
   lib/sed-opal.c:793:1: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
    static size_t response_get_string(const struct parsed_resp *resp, int n,
    ^
   lib/sed-opal.c:817:12: error: invalid storage class for function 'response_get_u64'
    static u64 response_get_u64(const struct parsed_resp *resp, int n)
           ^
   lib/sed-opal.c:846:11: error: invalid storage class for function 'response_status'
    static u8 response_status(const struct parsed_resp *resp)
           ^
   lib/sed-opal.c:866:12: error: invalid storage class for function 'parse_and_check_status'
    static int parse_and_check_status(struct opal_dev *dev)
           ^
   lib/sed-opal.c:883:13: error: invalid storage class for function 'clear_opal_cmd'
    static void clear_opal_cmd(struct opal_cmd *cmd)
           ^
   lib/sed-opal.c:891:12: error: invalid storage class for function 'start_opal_session_cont'
    static int start_opal_session_cont(struct opal_dev *dev)
           ^
   lib/sed-opal.c:913:20: error: invalid storage class for function 'opal_dev_get'
    static inline void opal_dev_get(struct opal_dev *dev)
           ^
   lib/sed-opal.c:918:20: error: invalid storage class for function 'opal_dev_put'
    static inline void opal_dev_put(struct opal_dev *dev)
           ^
   lib/sed-opal.c:923:12: error: invalid storage class for function 'add_suspend_info'
    static int add_suspend_info(struct opal_dev *dev, struct opal_suspend_data *sus)
           ^
   lib/sed-opal.c:949:12: error: invalid storage class for function 'end_session_cont'
    static int end_session_cont(struct opal_dev *dev)
           ^
   lib/sed-opal.c:956:12: error: invalid storage class for function 'finalize_and_send'
    static int finalize_and_send(struct opal_dev *dev, struct opal_cmd *cmd,
           ^
   lib/sed-opal.c:972:12: error: invalid storage class for function 'gen_key'
    static int gen_key(struct opal_dev *dev)
           ^
   lib/sed-opal.c:1002:12: error: invalid storage class for function 'get_active_key_cont'
    static int get_active_key_cont(struct opal_dev *dev)
           ^
   lib/sed-opal.c:1027:12: error: invalid storage class for function 'get_active_key'
    static int get_active_key(struct opal_dev *dev)
           ^
   lib/sed-opal.c:1067:12: error: invalid storage class for function 'generic_lr_enable_disable'
    static int generic_lr_enable_disable(struct opal_cmd *cmd,
           ^
   lib/sed-opal.c:1107:19: error: invalid storage class for function 'enable_global_lr'
    static inline int enable_global_lr(struct opal_cmd *cmd, u8 *uid,
           ^
   lib/sed-opal.c:1118:
Re: [PATCH v2 2/3] Bluetooth: btusb: Add out-of-band wakeup support
On Mon, Dec 19, 2016 at 05:46:19PM -0800, Rajat Jain wrote:
> On Mon, Dec 19, 2016 at 3:10 PM, Brian Norris wrote:
> > On Fri, Dec 16, 2016 at 11:30:03AM -0800, Rajat Jain wrote:
> >> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> >> index ce22cef..beca4e9 100644
> >> --- a/drivers/bluetooth/btusb.c
> >> +++ b/drivers/bluetooth/btusb.c
> >> +static const struct of_device_id btusb_match_table[] = {
> >> +	{ .compatible = "usb1286,204e" },
> >> +	{ }
> >> +};
> >> +MODULE_DEVICE_TABLE(of, btusb_match_table);
> >
> > You define a match table here, but you also define essentially the same
> > table for Marvell-specific additions in patch 3. It looks like maybe
> > it's legal to have more than one OF table in a module? But it seems like
> > it would get confusing, besides being somewhat strange to maintain. It
> > might also produce duplicate 'modinfo' output.
> >
> > If you really want to independently opt into device-tree-specified
> > interrupts vs. Marvell-specific interrupt configuration, then you
> > should probably just merge the latter into the former table, and
> > implement a Marvell/GPIO flag to stick in the .data field of this table.
>
> The data we want to stick (the pin number on the chip) is board
> specific rather than being chip specific, and hence .data may not be
> useful here.

I just meant that if you conceptually wanted Marvell devices to "opt in"
to this, then you could turn the .data field into some kind of flag or
struct for describing capabilities (e.g., MARVELL_GPIO_WAKE or similar),
effectively merging the two tables instead of having two
mostly-overlapping tables. But you could do the same by putting such
flag(s) in the {btusb,blacklist}_table[], or add such flag(s) later
whenever there's a Marvell device that doesn't support this feature.

> > Or it might be fine to drop one or both "match" checks. Particularly for
> > the Marvell-specific stuff, it's probably fair just to check if it has
> > an ->of_node and if 'id->driver_info & BTUSB_MARVELL'. Any other Marvell
> > device-specific quirks could probably be keyed off of the
> > (weirdly-named?) blacklist_table[], which already matches PID/VID.
>
> Yup, I think I can remove the compatible string checks.
>
> I'll be sending a V3.

Great!

Brian
Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
On 19 December 2016 at 20:13, Mathias Nyman wrote: > On 19.12.2016 13:34, Baolin Wang wrote: >> >> Hi Mathias, >> >> On 19 December 2016 at 18:33, Mathias Nyman >> wrote: >>> >>> On 13.12.2016 05:21, Baolin Wang wrote: Hi Mathias, On 12 December 2016 at 23:52, Mathias Nyman wrote: > > > On 05.12.2016 09:51, Baolin Wang wrote: >> >> >> >> If a command event is found on the event ring during an interrupt, >> we need to stop the command timer with del_timer(). Since del_timer() >> can fail if the timer is running and waiting on the xHCI lock, then >> it maybe get the wrong timeout command in >> xhci_handle_command_timeout() >> if host fetched a new command and updated the xhci->current_cmd in >> handle_cmd_completion(). For this situation, we need a way to signal >> to the command timer that everything is fine and it should exit. > > > > > Ah, right, this could actually happen. > >> >> >> We should introduce a counter (xhci->current_cmd_pending) for the >> number >> of pending commands. If we need to cancel the command timer and >> del_timer() >> succeeds, we decrement the number of pending commands. If del_timer() >> fails, >> we leave the number of pending commands alone. >> >> For handling timeout command, in xhci_handle_command_timeout() we will >> check >> the counter after decrementing it, if the counter >> (xhci->current_cmd_pending) >> is 0, which means xhci->current_cmd is the right timeout command. If >> the >> counter (xhci->current_cmd_pending) is greater than 0, which means >> current >> timeout command has been handled by host and host has fetched new >> command >> as >> xhci->current_cmd, then just return and wait for new current command. > > > > > A counter like this could work. > > Writing the abort bit can generate either ABORT+STOP, or just STOP > event, this seems to cover both. > > quick check, case 1: timeout and cmd completion at the same time. 
> cpu1                                     cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
> --completion irq fires--                 --timer times out at same time--
> handle_cmd_completion()                  handle_cmd_timeout()
>   lock(xhci_lock)                          spin_on(xhci_lock)
>   del_timer() fail, p (=1, nochange)
>   cur_cmd = list_next(), p++ (=2)
>   unlock(xhci_lock)
>                                            lock(xhci_lock)
>                                            p-- (=1)
>                                            if (p > 0), exit
>
> OK works
>
> case 2: normal timeout case with ABORT+STOP, no race.
>
> cpu1                                     cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
>                                          handle_cmd_timeout()
>                                            p-- (p=0), don't exit
>                                            mod_timer(), p++ (p=1)
>                                            write_abort_bit()
> handle_cmd_completion(ABORT)
>   del_timer() ok, p-- (p=0)
> handle_cmd_completion(STOP)
>   del_timer() fail, (p=0)
>   handle_stopped_cmd_ring()
>     cur_cmd = list_next(), p++ (=1)
>     mod_timer()
>
> OK, works, and same for just STOP case, with the only difference that
> during handle_cmd_completion(STOP) p is decremented (p--)

Yes, that's the cases what I want to handle, thanks for your explicit
explanation.

>>>
>>> Gave this some more thought over the weekend, and this implementation
>>> doesn't solve the case when the last command times out and races with
>>> the completion handler:
>>>
>>> cpu1                                   cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> --completion irq fires--               --timer times out at same time--
>>> handle_cmd_completion()                handle_cmd_timeout()
>>>   lock(xhci_lock)                        spin_on(xhci_lock)
>>>   del_timer() fail, p (=1, nochange)
>>>   no more commands, p (=1, nochange)
>>>   unlock(xhci_lock)
>>>                                          lock(xhci_lock)
>>>                                          p-- (=0)
>>>                                          p == 0, continue, even if we
>>>                                          should not.
>>>                                          For this we still need to rely
>>>                                          on checking cur_cmd == NULL in
>>>                                          the timeout function.
>>> (Baolu's patch sets it to NULL if there are no more commands pending)
>>
>> As I pointed out in patch 1 of this patchset, this patchset is based
>> on Lu Baolu's new fix patch:
>> usb: xhci: fix possible wild pointer
>> https://www.spinics.net/lists/linux-usb/msg150219.html
>>
>> After applying Baolu's patch, after decrement
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 04:25:32PM -0800, Andy Lutomirski wrote: > On Mon, Dec 19, 2016 at 4:02 PM, Alexei Starovoitov > wrote: > > On Mon, Dec 19, 2016 at 01:23:50PM -0800, Andy Lutomirski wrote: > >> On Mon, Dec 19, 2016 at 12:56 PM, Alexei Starovoitov > >> wrote: > >> > On Sat, Dec 17, 2016 at 10:18:44AM -0800, Andy Lutomirski wrote: > >> >> Hi all- > >> >> > >> >> I apologize for being rather late with this. I didn't realize that > >> >> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see > >> >> it on the linux-api list, so I missed the discussion. > >> >> > >> >> I think that the inet ingress, egress etc filters are a neat feature, > >> >> but I think the API has some issues that will bite us down the road > >> >> if it becomes stable in its current form. > >> >> > >> >> Most of the problems I see are summarized in this transcript: > >> >> > >> >> # mkdir cg2 > >> >> # mount -t cgroup2 none cg2 > >> >> # mkdir cg2/nosockets > >> >> # strace cgrp_socket_rule cg2/nosockets/ 0 > >> >> ... > >> >> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3 > >> >> > >> >> You can modify a cgroup after opening it O_RDONLY? > >> >> > >> >> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2, > >> >> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144, > >> >> log_buf=0x6020c0, kern_version=0}, 48) = 4 > >> >> > >> >> This is fine. The bpf() syscall manipulates bpf objects. > >> >> > >> >> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0 > >> >> > >> >> This is not so good: > >> >> > >> >> a) The bpf() syscall is supposed to manipulate bpf objects. This > >> >> is manipulating a cgroup. There's no reason that a socket > >> >> creation > >> >> filter couldn't be written in a different language (new iptables > >> >> table? Simple list of address families?), but if that happened, > >> >> then using bpf() to install it would be entirely nonsensical. > >> > > >> > I don't see why it's _modifing_ the cgroup. 
I'm looking at it as > >> > network stack is using cgroup as an application group that should > >> > invoke bpf program at the certain point in the stack. > >> > imo cgroup management is orthogonal. > >> > >> It is literally modifying the struct cgroup, and, as a practical > >> matter, it's causing membership in the cgroup to have a certain > >> effect. But rather than pointless arguing, let me propose an > >> alternative API that I think solves most of the problems here. > >> > >> In my model, BPF_PROG_ATTACH and BPF_PROG_DETACH go away completely. > >> Instead, the cgroup gets three new control files: > >> "net.ingress_filter", "net.egress_filter", and > >> "net.socket_create_filter". Initially, if you read these files, you > >> see "none\n". > >> > >> To attach a bpf filter, you open the file for write and do an ioctl on > >> it. After doing the ioctl, if you read the file, you'll see > >> "bpf:[hash]\n" where "[hash]" is exactly what you'd see in fdinfo for > >> the bpf program. > >> > >> To detach any type of filter, bpf or otherwise, you open the file for > >> write and write "none\n" (or just "none"). > >> > >> If you write anything else to the file, you get -EINVAL. But, if > >> someone writes a new type of filter (perhaps a simple list of address > >> families), maybe you can enable the filter by writing something > >> appropriate to the file. > > > > I see no difference in what you're proposing vs what is already implemented > > from feature set point of view, but the file approach is very ugly, since > > it's a mismatch to FD style access that bpf is using everywhere. > > In your proposal you'd also need to add bpf prefix everywhere. > > So the control file names should be bpf_inet_ingress, bpf_inet_egress > > and bpf_socket_create. > > I think we're still talking past each other. A big part of the point > of changing it is that none of this is specific to bpf. You could (in the hooks and context passed into the program is very much bpf specific. 
That's what I've been trying to convey all along.

> theory -- I'm not proposing implementing these until there's demand)
> have:
>
> net.socket_create_filter = "none": no filter
> net.socket_create_filter = "bpf:baadf00d": bpf filter

i'm assuming 'baadf00d' is a bpf program fd expressed as a text string?
and the kernel needs to parse the above? will you allow capital and lower
case for 'bpf:'? and mixed case too? spaces and tabs allowed or not? can
the program fd be expressed as decimal or hex or both? how do you return
the error? as a text string for user space to parse?

> net.socket_create_filter = "iptables:foobar": some iptables thingy
> net.socket_create_filter = "nft:blahblahblah": some nft thingy

iptables/nft are not applicable to 'bpf_socket_create', which looks into:

struct bpf_sock {
	__u32 bound_dev_if;
	__u32 family;
	__u32 type;
	__u32 protocol;
};

so they don't fit as an example.

> net.socket_create_filter = "disallow": no so
[PATCH] sparc: use symbolic names for tsb indexing
Use symbolic names MM_TSB_BASE and MM_TSB_HUGE instead of numeric values
0 and 1 in __tsb_context_switch. Code cleanup only, no functional change.

Suggested-by: Sam Ravnborg
Signed-off-by: Mike Kravetz
---
 arch/sparc/include/asm/mmu_context_64.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index b84be67..d031799 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -35,15 +35,15 @@ void __tsb_context_switch(unsigned long pgd_pa,
 static inline void tsb_context_switch(struct mm_struct *mm)
 {
 	__tsb_context_switch(__pa(mm->pgd),
-			     &mm->context.tsb_block[0],
+			     &mm->context.tsb_block[MM_TSB_BASE],
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-			     (mm->context.tsb_block[1].tsb ?
-			      &mm->context.tsb_block[1] :
+			     (mm->context.tsb_block[MM_TSB_HUGE].tsb ?
+			      &mm->context.tsb_block[MM_TSB_HUGE] :
 			      NULL)
 #else
 			     NULL
 #endif
-			     , __pa(&mm->context.tsb_descr[0]));
+			     , __pa(&mm->context.tsb_descr[MM_TSB_BASE]));
 }

 void tsb_grow(struct mm_struct *mm,
--
2.7.4
Re: [PATCH v3 2/5] lib: Add Sed-opal library
Hi Scott, [auto build test ERROR on linus/master] [also build test ERROR on v4.9 next-20161219] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Scott-Bauer/include-Add-definitions-for-sed/20161220-110214 config: i386-tinyconfig (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): In file included from lib/sed.c:20:0: >> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared >> inside parameter list will not be visible outside of this definition or >> declaration struct opal_dev *alloc_opal_dev(struct request_queue *q); ^ lib/sed.c: In function 'fdev_sed_ioctl': >> lib/sed.c:161:12: error: dereferencing pointer to incomplete type 'struct >> file' if (!filep->f_sedctx || !filep->f_sedctx->ops || !filep->f_sedctx->dev) ^~ -- In file included from lib/sed-opal.c:29:0: >> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared >> inside parameter list will not be visible outside of this definition or >> declaration struct opal_dev *alloc_opal_dev(struct request_queue *q); ^ lib/sed-opal.c: In function 'opal_discovery0_end': lib/sed-opal.c:276:6: warning: unused variable 'error' [-Wunused-variable] int error = 0; ^ lib/sed-opal.c: In function 'response_parse': >> lib/sed-opal.c:793:15: error: invalid storage class for function >> 'response_get_string' static size_t response_get_string(const struct parsed_resp *resp, int n, ^~~ >> lib/sed-opal.c:793:1: warning: ISO C90 forbids mixed declarations and code >> [-Wdeclaration-after-statement] static size_t response_get_string(const struct parsed_resp *resp, int n, ^~ >> lib/sed-opal.c:817:12: error: invalid storage class for function >> 'response_get_u64' static u64 response_get_u64(const struct parsed_resp *resp, int n) ^~~~ >> lib/sed-opal.c:846:11: error: 
invalid storage class for function >> 'response_status' static u8 response_status(const struct parsed_resp *resp) ^~~ >> lib/sed-opal.c:866:12: error: invalid storage class for function >> 'parse_and_check_status' static int parse_and_check_status(struct opal_dev *dev) ^~ >> lib/sed-opal.c:883:13: error: invalid storage class for function >> 'clear_opal_cmd' static void clear_opal_cmd(struct opal_cmd *cmd) ^~ >> lib/sed-opal.c:891:12: error: invalid storage class for function >> 'start_opal_session_cont' static int start_opal_session_cont(struct opal_dev *dev) ^~~ >> lib/sed-opal.c:913:20: error: invalid storage class for function >> 'opal_dev_get' static inline void opal_dev_get(struct opal_dev *dev) ^~~~ >> lib/sed-opal.c:918:20: error: invalid storage class for function >> 'opal_dev_put' static inline void opal_dev_put(struct opal_dev *dev) ^~~~ >> lib/sed-opal.c:923:12: error: invalid storage class for function >> 'add_suspend_info' static int add_suspend_info(struct opal_dev *dev, struct opal_suspend_data *sus) ^~~~ >> lib/sed-opal.c:949:12: error: invalid storage class for function >> 'end_session_cont' static int end_session_cont(struct opal_dev *dev) ^~~~ >> lib/sed-opal.c:956:12: error: invalid storage class for function >> 'finalize_and_send' static int finalize_and_send(struct opal_dev *dev, struct opal_cmd *cmd, ^ >> lib/sed-opal.c:972:12: error: invalid storage class for function 'gen_key' static int gen_key(struct opal_dev *dev) ^~~ >> lib/sed-opal.c:1002:12: error: invalid storage class for function >> 'get_active_key_cont' static int get_active_key_cont(struct opal_dev *dev) ^~~ >> lib/sed-opal.c:1027:12: error: invalid storage class for function >> 'get_active_key' static int get_active_key(struct opal_dev *dev) ^~ >> lib/sed-opal.c:1067:12: error: invalid storage class for function >>
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Mon, Dec 19, 2016 at 6:52 PM, David Ahern wrote: > On 12/19/16 6:56 PM, Andy Lutomirski wrote: >> On Mon, Dec 19, 2016 at 5:44 PM, David Ahern wrote: >>> On 12/19/16 5:25 PM, Andy Lutomirski wrote: net.socket_create_filter = "none": no filter net.socket_create_filter = "bpf:baadf00d": bpf filter net.socket_create_filter = "disallow": no sockets created period net.socket_create_filter = "iptables:foobar": some iptables thingy net.socket_create_filter = "nft:blahblahblah": some nft thingy net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and 3 >>> >>> Such a scheme works for the socket create filter b/c it is a very simple >>> use case. It does not work for the ingress and egress which allow generic >>> bpf filters. >> >> Can you elaborate on what goes wrong? (Obviously the >> "address_family_list" example makes no sense in that context.) > > Being able to dump a filter or see that one exists would be a great add-on, > but I don't see how 'net.socket_create_filter = "bpf:baadf00d"' is a viable > API for loading generic BPF filters. Simple cases like "disallow" are easy -- > just return 0 in the filter, no complicated BPF code needed. The rest are > specific cases of the moment which goes against the intent of ebpf and > generic programmability. Oh -- I'm not proposing that at all. Let me clarify. For the bpf case, if you *read* the file, you'd see "bpf:baadf00d". But writing "bpf:baadf00d" is nonsense and would give you -EINVAL. Instead you install a bpf filter by opening the file for write (O_RDWR or O_WRONLY) and doing ioctl(cgroup_socket_create_file_fd, CGROUP_IOCTL_SET_BPF_FILTER, bpf_fd); It's kind of like BPF_PROG_ATTACH except that it respects filesystem permissions, it isn't in the bpf() multiplexer, the filter being set is implied (by the fd in use), and everything is nicely seccompable. To remove the filter, you write "none" or "none\n" to the file. 
As a future extension, if someone wanted more than one filter to be able
to coexist in the cgroup socket_create_filter slot, you could plausibly
do 'echo disallow >>net.socket_create_filter' or use a new ioctl
CGROUP_IOCTL_APPEND_BPF_FILTER, and then you'd read the file and see
more than one line. But this would be a future extension and may never
be needed.

>>
>> a) sub-cgroups cannot have a filter at all if the parent has a filter.
>> (This is the "punt" approach -- it lets different semantics be
>> assigned later without breaking userspace.)
>>
>> b) sub-cgroups can have a filter if a parent does, too. The semantics
>> are that the sub-cgroup filter runs first and all side-effects occur.
>> If that filter says "reject" then ancestor filters are skipped. If
>> that filter says "accept", then the ancestor filter is run and its
>> side-effects happen as well. (And so on, all the way up to the root.)
>
> That comes with a big performance hit for skb / data path cases.
>
> I'm riding my use case on Daniel's work, and as I understand it the
> nesting case has been discussed. I'll defer to Daniel and Alexei on
> this part.

I'm not sure I buy the performance hit. If you do it naively, then
performance will indeed suck. But there's already a bunch of code in
there to pre-populate a filter list for faster use. Currently, we have:

struct cgroup_bpf {
	/*
	 * Store two sets of bpf_prog pointers, one for programs that are
	 * pinned directly to this cgroup, and one for those that are
	 * effective when this cgroup is accessed.
	 */
	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
	struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
};

in struct cgroup, there's a 'struct cgroup_bpf bpf;'. This would change
to something like:

struct cgroup_filter_slot {
	struct bpf_prog *effective;
	struct cgroup_filter_slot *next;
	struct bpf_prog *local;
};

local is NULL unless *this* cgroup has a filter. effective points to the
bpf_prog that's active in this cgroup or the nearest ancestor that has a
filter. next is NULL if there are no filters higher in the chain or
points to the next slot that has a filter.

struct cgroup has:

struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE];

To evaluate it, you do:

struct cgroup_filter_slot *slot = &cgroup->slot[the index];

if (!slot->effective)
	return;

do {
	evaluate(slot->effective);
	slot = slot->next;
} while (unlikely(slot));

The old code was one branch per evaluation. The new code is n+1 branches
where n is the number of filters, so it's one extra branch in the worst
case. But the extra branch is cache-hot (the variable is right next to
slot->effective, which is needed regardless) and highly predictable (in
the case where nesting isn't used, the branch is not taken, and it's a
backward branch which most CPUs will predict as not taken). I expect
that, on x86, this adds at most a cycle or two and quite possibly adds
no measurable overhead at all.
Re: [RFC 10/10] kmod: add a sanity check on module loading
"Luis R. Rodriguez" writes:
> On Dec 16, 2016 9:54 PM, "Rusty Russell" wrote:
> > AFAICT the mistake here is that kmod is returning "done, OK" when the
> > module it is trying to load is already loading (but not finished
> > loading). That's the root problem; it's an attempt at optimization by
> > kmod which goes awry.
>
> This is true! To be precise, though, the truth of the matter is that
> kmod's respective usermode helper, modprobe, can be buggy and may lie
> to us. It may allow request_module() to return 0, but since we don't
> validate it, any assumption we make can be deadly. In the case of
> get_fs_type() it's a null dereference.

Wait, what?? I can't see that in get_fs_type, which hasn't changed since
2013. If a caller is assuming get_fs_type() doesn't return NULL, they're
broken and need fixing of course:

struct file_system_type *get_fs_type(const char *name)
{
	struct file_system_type *fs;
	const char *dot = strchr(name, '.');
	int len = dot ? dot - name : strlen(name);

	fs = __get_fs_type(name, len);
	if (!fs && (request_module("fs-%.*s", len, name) == 0))
		fs = __get_fs_type(name, len);

	if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) {
		put_filesystem(fs);
		fs = NULL;
	}
	return fs;
}

Where does this NULL-deref happen if the module isn't correctly loaded?

> *Iff* we want a sanity check to verify kmod's umh is not lying to us we
> need to verify after 0 was returned that it was not lying to us. Since
> kmod accepts aliases but find_modules_all() only works on the real
> module name, a validation check cannot happen when all you have are
> aliases.

request_module() should block until resolution, but that's fundamentally
a userspace problem. Let's not paper over it in kernelspace.
> *Iff* we are sure we don't want a validation (or another earlier
> optimization to avoid calling out to modprobe if the alias requested is
> already present, which does the time shaving I mentioned on the tests)
> then naturally no request_module() calls returning 0 can assert
> information about the requested module. I think we might need to change
> more code if we accept we cannot trust request_module() calls, or we
> accept userspace telling the kernel something may mean we sometimes
> crash. This latter predicament seems rather odd, so hence the patch.
>
> Perhaps in some cases validation of work from a umh is not critical in
> kernel but for request_module() I can tell you that today get_fs_type
> code currently asserts the module found can never be NULL.

OK, what am I missing in the code above?

> > Looking at the code in the kernel, we *already* get this right: block
> > if a module is still loading anyway. Once it succeeds we return
> > -EBUSY; if it fails we'll proceed to try to load it again.
> >
> > I don't understand what you're trying to fix with adding aliases
> > in-kernel?
>
> Two fold now:
>
> a) validation on request_module() work when an alias is used

But why?

> b) since kmod accepts aliases, if we get aliases support, it means we
> could *also* preemptively avoid calling out to userspace for modules
> already present.

No, because once we have a module we don't request it: requesting is the
fallback case.

> >> FWIW a few things did occur to me:
> >>
> >> a) list_add_rcu() is used so new modules get added first
> >
> > Only after we're sure that there are no duplicates.
>
> OK! This is a very critical assertion. I should be able to add a debug
> WARN_ON() should two modules be on the modules list for the same module
> then?

Yes, names must be unique.

> >> b) find_module_all() returns the last module which was added as it
> >>    traverses the module list
> >>
> >> BTW should find_module_all() use rcu to traverse?
> > Yes; the kallsyms code does this on Oops. Not really a big issue in
> > practice, but a nice fix.
>
> Ok, will bundle into my queue.

Please submit to Jessica for her module queue, as it's orthogonal AFAICT.

> I will note though that I still think there's a bug in this code --
> upon a failure other "spinning" requests can fail, I believe this may
> be due to not having another state or informing pending modules too
> early of a failure but I haven't been able to prove this conjecture
> yet.

That's possible, but I can't see it from quickly re-checking the code.
The module should be fully usable at this point; the module's init has
been called successfully, so in the case of __get_fs_type() it should
now succeed. The module cleans up its init section, but that should be
independent.

If there is a race, it's likely to be when some other caller wakes the
queue. Moving the wakeup as soon as possible should make it easier to
trigger:

diff --git a/kernel/module.c b/kernel/module.c
index f57dd63186e6..78bd89d41a22 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@
Reply: Re: [kbuild-all] Reply: Re: [PATCH 1/2] ocfs2: add kobject for online file check
Hi Fengguang and all,

I will reconstruct my patch set, to make sure each patch can follow this rule.

Thanks
Gang

>>> Fengguang Wu 2016-12-20 9:47 AM >>>
Hi Gang,

On Mon, Dec 19, 2016 at 06:43:48PM -0700, Gang He wrote:
>Hello Kbuild,
>Could you build my whole patch set (2 patches)? I think that the code is OK.

We test your whole patch as well as the first-N patches, and noticed that
the first-1 patch breaks bisectability:

>Note: the
>linux-review/Gang-He/ocfs2-add-kobject-for-online-file-check/20161219-181858
>HEAD 6ef9256cd25ef72a5e69490cc3dacde95b8e2ac4 builds fine.
>

It only hurts bisectability.

Thanks,
Fengguang
[PATCH v3 3/3] Bluetooth: btusb: Configure Marvell to use one of the pins for oob wakeup
The Marvell devices may have many gpio pins, and hence for wakeup on these out-of-band pins, the chip needs to be told which pin is to be used for wakeup, using an hci command. Thus, we read the pin number etc from the device tree node and send a command to the chip. Signed-off-by: Rajat Jain Reviewed-by: Brian Norris Acked-by: Rob Herring --- v3: * remove the Marvell specific id table and check * Add reference to marvell-bt-8xxx.txt in btusb.txt * Add "Reviewed-by" and "Acked-by" v2: Fix the binding document to specify to use "wakeup" interrupt-name Documentation/devicetree/bindings/net/btusb.txt| 3 ++ .../{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} | 46 +++ drivers/bluetooth/btusb.c | 51 ++ 3 files changed, 92 insertions(+), 8 deletions(-) rename Documentation/devicetree/bindings/net/{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} (50%) diff --git a/Documentation/devicetree/bindings/net/btusb.txt b/Documentation/devicetree/bindings/net/btusb.txt index 2c0355c..01fa2d4 100644 --- a/Documentation/devicetree/bindings/net/btusb.txt +++ b/Documentation/devicetree/bindings/net/btusb.txt @@ -10,6 +10,9 @@ Required properties: "usb1286,204e" (Marvell 8997) +Also, vendors that use btusb may have device additional properties, e.g: +Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt + Optional properties: - interrupt-parent: phandle of the parent interrupt controller diff --git a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt similarity index 50% rename from Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt rename to Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt index 6a9a63c..9be1059 100644 --- a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt +++ b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt @@ -1,16 +1,21 @@ -Marvell 8897/8997 (sd8897/sd8997) bluetooth SDIO devices +Marvell 8897/8997 (sd8897/sd8997) bluetooth devices (SDIO or USB based) -- 
+The 8997 devices supports multiple interfaces. When used on SDIO interfaces, +the btmrvl driver is used and when used on USB interface, the btusb driver is +used. Required properties: - compatible : should be one of the following: - * "marvell,sd8897-bt" - * "marvell,sd8997-bt" + * "marvell,sd8897-bt" (for SDIO) + * "marvell,sd8997-bt" (for SDIO) + * "usb1286,204e" (for USB) Optional properties: - marvell,cal-data: Calibration data downloaded to the device during initialization. This is an array of 28 values(u8). + This is only applicable to SDIO devices. - marvell,wakeup-pin: It represents wakeup pin number of the bluetooth chip. firmware will use the pin to wakeup host system (u16). @@ -18,10 +23,15 @@ Optional properties: platform. The value will be configured to firmware. This is needed to work chip's sleep feature as expected (u16). - interrupt-parent: phandle of the parent interrupt controller - - interrupts : interrupt pin number to the cpu. Driver will request an irq based -on this interrupt number. During system suspend, the irq will be -enabled so that the bluetooth chip can wakeup host platform under -certain condition. During system resume, the irq will be disabled + - interrupt-names: Used only for USB based devices (See below) + - interrupts : specifies the interrupt pin number to the cpu. For SDIO, the +driver will use the first interrupt specified in the interrupt +array. For USB based devices, the driver will use the interrupt +named "wakeup" from the interrupt-names and interrupt arrays. +The driver will request an irq based on this interrupt number. +During system suspend, the irq will be enabled so that the +bluetooth chip can wakeup host platform under certain +conditions. During system resume, the irq will be disabled to make sure unnecessary interrupt is not received. Example: @@ -29,7 +39,9 @@ Example: IRQ pin 119 is used as system wakeup source interrupt. 
wakeup pin 13 and gap 100ms are configured so that firmware can wakeup host using this device side pin and wakeup latency. -calibration data is also available in below example. + +Example for SDIO device follows (calibration data is also available in +below example). &mmc3 { status = "okay"; @@ -54,3 +66,21 @@ calibration data is also available in below example. marvell,wakeup-gap-ms = /bits/ 16 <0x64>; }; }; + +Example for USB device: + +&usb_host1_ohci { +status = "okay"; +#address-cells = <1>; +#size-cells = <0>; + +mvl_bt1: bt@1 { + com
[PATCH v3 1/3] Bluetooth: btusb: Use an error label for error paths
Use a label to remove the repetitive cleanup for error cases.

Signed-off-by: Rajat Jain
Reviewed-by: Brian Norris
---
v3: Added Brian's "Reviewed-by"
v2: same as v1

 drivers/bluetooth/btusb.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 2f633df..ce22cef 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -2991,18 +2991,15 @@ static int btusb_probe(struct usb_interface *intf,
 		err = usb_set_interface(data->udev, 0, 0);
 		if (err < 0) {
 			BT_ERR("failed to set interface 0, alt 0 %d", err);
-			hci_free_dev(hdev);
-			return err;
+			goto out_free_dev;
 		}
 	}

 	if (data->isoc) {
 		err = usb_driver_claim_interface(&btusb_driver,
 						 data->isoc, data);
-		if (err < 0) {
-			hci_free_dev(hdev);
-			return err;
-		}
+		if (err < 0)
+			goto out_free_dev;
 	}

 #ifdef CONFIG_BT_HCIBTUSB_BCM
@@ -3016,14 +3013,16 @@ static int btusb_probe(struct usb_interface *intf,
 #endif

 	err = hci_register_dev(hdev);
-	if (err < 0) {
-		hci_free_dev(hdev);
-		return err;
-	}
+	if (err < 0)
+		goto out_free_dev;

 	usb_set_intfdata(intf, data);

 	return 0;
+
+out_free_dev:
+	hci_free_dev(hdev);
+	return err;
 }

 static void btusb_disconnect(struct usb_interface *intf)
--
2.8.0.rc3.226.g39d4020
[PATCH v3 2/3] Bluetooth: btusb: Add out-of-band wakeup support
Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that can
be connected to a gpio on the CPU side, and can be used to wakeup the
host out-of-band. This can be useful in situations where the in-band
wakeup is not possible or not preferable (e.g. the in-band wakeup may
require the USB host controller to remain active, and hence consume
more system power during system sleep).

The oob gpio interrupt to be used for wakeup on the CPU side is read
from the device tree node (using standard interrupt descriptors). A
device tree binding document is also added for the driver. The
compatible string is in compliance with
Documentation/devicetree/bindings/usb/usb-device.txt

Signed-off-by: Rajat Jain
Reviewed-by: Brian Norris
---
v3: Add Brian's "Reviewed-by"
v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
    * Leave it on device tree to specify IRQ flags (level/edge triggered)
    * Mark the device as non wakeable on exit.

 Documentation/devicetree/bindings/net/btusb.txt | 40 ++++++++++
 drivers/bluetooth/btusb.c                       | 84 ++++++++++++++++
 2 files changed, 124 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/btusb.txt

diff --git a/Documentation/devicetree/bindings/net/btusb.txt b/Documentation/devicetree/bindings/net/btusb.txt
new file mode 100644
index 000..2c0355c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -0,0 +1,40 @@
+Generic Bluetooth controller over USB (btusb driver)
+----------------------------------------------------
+
+Required properties:
+
+  - compatible : should comply with the format "usbVID,PID" specified in
+		 Documentation/devicetree/bindings/usb/usb-device.txt
+		 At the time of writing, the only OF supported devices
+		 (more may be added later) are:
+
+		 "usb1286,204e" (Marvell 8997)
+
+Optional properties:
+
+  - interrupt-parent: phandle of the parent interrupt controller
+  - interrupt-names: (see below)
+  - interrupts : The interrupt specified by the name "wakeup" is the interrupt
+		 that shall be used for out-of-band wake-on-bt. Driver will
+		 request this interrupt for wakeup. During system suspend, the
+		 irq will be enabled so that the bluetooth chip can wakeup host
+		 platform out of band. During system resume, the irq will be
+		 disabled to make sure unnecessary interrupt is not received.
+
+Example:
+
+Following example uses irq pin number 3 of gpio0 for out of band wake-on-bt:
+
+&usb_host1_ehci {
+	status = "okay";
+	#address-cells = <1>;
+	#size-cells = <0>;
+
+	mvl_bt1: bt@1 {
+		compatible = "usb1286,204e";
+		reg = <1>;
+		interrupt-parent = <&gpio0>;
+		interrupt-names = "wakeup";
+		interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
+	};
+};
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index ce22cef..beca4e9 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -24,6 +24,8 @@
 #include <linux/module.h>
 #include <linux/usb.h>
 #include <linux/usb/quirks.h>
 #include <linux/firmware.h>
+#include <linux/of_device.h>
+#include <linux/of_irq.h>
 #include <asm/unaligned.h>

@@ -369,6 +371,7 @@ static const struct usb_device_id blacklist_table[] = {
 #define BTUSB_BOOTING		9
 #define BTUSB_RESET_RESUME	10
 #define BTUSB_DIAG_RUNNING	11
+#define BTUSB_OOB_WAKE_DISABLED	12

 struct btusb_data {
 	struct hci_dev *hdev;
@@ -416,6 +419,8 @@ struct btusb_data {
 	int (*recv_bulk)(struct btusb_data *data, void *buffer, int count);

 	int (*setup_on_usb)(struct hci_dev *hdev);
+
+	int oob_wake_irq;   /* irq for out-of-band wake-on-bt */
 };

 static inline void btusb_free_frags(struct btusb_data *data)
@@ -2728,6 +2733,65 @@ static int btusb_bcm_set_diag(struct hci_dev *hdev, bool enable)
 }
 #endif

+#ifdef CONFIG_PM
+static irqreturn_t btusb_oob_wake_handler(int irq, void *priv)
+{
+	struct btusb_data *data = priv;
+
+	/* Disable only if not already disabled (keep it balanced) */
+	if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
+		disable_irq_nosync(irq);
+		disable_irq_wake(irq);
+	}
+	pm_wakeup_event(&data->udev->dev, 0);
+	return IRQ_HANDLED;
+}
+
+static const struct of_device_id btusb_match_table[] = {
+	{ .compatible = "usb1286,204e" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, btusb_match_table);
+
+/* Use an oob wakeup pin? */
+static int btusb_config_oob_wake(struct hci_dev *hdev)
+{
+	struct btusb_data *data = hci_get_drvdata(hdev);
+	struct device *dev = &data->udev->dev;
+	int irq, ret;
+
+	if (!of_match_device(btusb_match_table, dev))
+		return 0;
+
+	/* Move on if no IRQ specified */
+	irq = of_irq_get_byname(dev->of_node, "wakeup");
+	if (irq <= 0) {
+		bt_dev_dbg(hdev, "%s: no OOB Wakeup IRQ in DT", __func__);
+		return 0;
+	}
+
+	set_bit