[tip:perf/urgent] uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation

2016-12-19 Thread tip-bot for Marcin Nowakowski
Commit-ID:  297e765e390a2ac996000b5f7228cbd84d995174
Gitweb: http://git.kernel.org/tip/297e765e390a2ac996000b5f7228cbd84d995174
Author: Marcin Nowakowski 
AuthorDate: Tue, 13 Dec 2016 11:40:57 +0100
Committer:  Ingo Molnar 
CommitDate: Sun, 18 Dec 2016 09:42:11 +0100

uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint 
creation

Commit:

  72e6ae285a1d ('ARM: 8043/1: uprobes need icache flush after xol write'

... has introduced an arch-specific method to ensure all caches are
flushed appropriately after an instruction is written to an XOL page.

However, when the XOL area is created and the out-of-line breakpoint
instruction is copied, caches are not flushed at all and stale data may
be found in icache.

Replace a simple copy_to_page() with arch_uprobe_copy_ixol() to allow
the arch to ensure all caches are updated accordingly.

This change fixes uprobes on MIPS InterAptiv (tested on Creator Ci40).

Signed-off-by: Marcin Nowakowski 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Victor Kamensky 
Cc: linux-m...@linux-mips.org
Link: 
http://lkml.kernel.org/r/1481625657-22850-1-git-send-email-marcin.nowakow...@imgtec.com
Signed-off-by: Ingo Molnar 
---
 kernel/events/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f9ec9ad..b5916b4 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1194,7 +1194,7 @@ static struct xol_area *__create_xol_area(unsigned long 
vaddr)
/* Reserve the 1st slot for get_trampoline_vaddr() */
set_bit(0, area->bitmap);
atomic_set(&area->slot_count, 1);
-   copy_to_page(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
+   arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
 
if (!xol_add_vma(mm, area))
return area;


Re: [PATCH v4 1/5] pinctrl: aspeed: dt: Fix compatibles for the System Control Unit

2016-12-19 Thread Joel Stanley
On Tue, Dec 20, 2016 at 6:05 PM, Andrew Jeffery  wrote:
> Reference the SoC-specific compatible string in the examples as
> required.
>
> Signed-off-by: Andrew Jeffery 

Acked-by: Joel Stanley 


> ---
>  Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)


Re: [GIT PULL] MFD for v4.10

2016-12-19 Thread Lee Jones
On Mon, 19 Dec 2016, Linus Torvalds wrote:

> On Mon, Dec 19, 2016 at 3:47 AM, Lee Jones  wrote:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git HEAD
> 
> Nothing there.
> 
> I'm assuming you meant the "for-linus-4.10" tag.

Yes.  Brain malfunction trying to get the PR out too quickly...

More haste, less speed!

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


[PATCH v4 3/5] pinctrl: aspeed-g4: Add mux configuration for all pins

2016-12-19 Thread Andrew Jeffery
The patch introducing the g4 pinctrl driver implemented a smattering of
pins to flesh out the implementation of the core and provide bare-bones
support for some OpenPOWER platforms. Now, update the bindings document
to reflect the complete functionality and implement the necessary pin
configuration tables in the driver.

Cc: Timothy Pearson 
Signed-off-by: Andrew Jeffery 
Acked-by: Joel Stanley 
Acked-by: Rob Herring 
---
 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |   19 +-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c | 1097 +++-
 2 files changed, 1096 insertions(+), 20 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt 
b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index fb7694ec033d..a645d0be3347 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -44,13 +44,18 @@ supported:
 
 aspeed,ast2400-pinctrl, aspeed,g4-pinctrl:
 
-ACPI BMCINT DDCCLK DDCDAT FLACK FLBUSY FLWP GPID0 GPIE0 GPIE2 GPIE4 GPIE6 I2C10
-I2C11 I2C12 I2C13 I2C3 I2C4 I2C5 I2C6 I2C7 I2C8 I2C9 LPCPD LPCPME LPCSMI MDIO1
-MDIO2 NCTS1 NCTS3 NCTS4 NDCD1 NDCD3 NDCD4 NDSR1 NDSR3 NDTR1 NDTR3 NRI1 NRI3
-NRI4 NRTS1 NRTS3 PWM0 PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7 RGMII1 RMII1 ROM16
-ROM8 ROMCS1 ROMCS2 ROMCS3 ROMCS4 RXD1 RXD3 RXD4 SD1 SGPMI SIOPBI SIOPBO TIMER3
-TIMER5 TIMER6 TIMER7 TIMER8 TXD1 TXD3 TXD4 UART6 VGAHS VGAVS VPI18 VPI24 VPI30
-VPO12 VPO24
+ACPI ADC0 ADC1 ADC10 ADC11 ADC12 ADC13 ADC14 ADC15 ADC2 ADC3 ADC4 ADC5 ADC6
+ADC7 ADC8 ADC9 BMCINT DDCCLK DDCDAT EXTRST FLACK FLBUSY FLWP GPID GPID0 GPID2
+GPID4 GPID6 GPIE0 GPIE2 GPIE4 GPIE6 I2C10 I2C11 I2C12 I2C13 I2C14 I2C3 I2C4
+I2C5 I2C6 I2C7 I2C8 I2C9 LPCPD LPCPME LPCRST LPCSMI MAC1LINK MAC2LINK MDIO1
+MDIO2 NCTS1 NCTS2 NCTS3 NCTS4 NDCD1 NDCD2 NDCD3 NDCD4 NDSR1 NDSR2 NDSR3 NDSR4
+NDTR1 NDTR2 NDTR3 NDTR4 NDTS4 NRI1 NRI2 NRI3 NRI4 NRTS1 NRTS2 NRTS3 OSCCLK PWM0
+PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7 RGMII1 RGMII2 RMII1 RMII2 ROM16 ROM8 ROMCS1
+ROMCS2 ROMCS3 ROMCS4 RXD1 RXD2 RXD3 RXD4 SALT1 SALT2 SALT3 SALT4 SD1 SD2 SGPMCK
+SGPMI SGPMLD SGPMO SGPSCK SGPSI0 SGPSI1 SGPSLD SIOONCTRL SIOPBI SIOPBO SIOPWREQ
+SIOPWRGD SIOS3 SIOS5 SIOSCI SPI1 SPI1DEBUG SPI1PASSTHRU SPICS1 TIMER3 TIMER4
+TIMER5 TIMER6 TIMER7 TIMER8 TXD1 TXD2 TXD3 TXD4 UART6 USBCKI VGABIOS_ROM VGAHS
+VGAVS VPI18 VPI24 VPI30 VPO12 VPO24 WDTRST1 WDTRST2
 
 aspeed,ast2500-pinctrl, aspeed,g5-pinctrl:
 
diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c 
b/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c
index 558bd102416c..09b668415c56 100644
--- a/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c
+++ b/drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c
@@ -43,9 +43,18 @@
  * Not all pins have their signals defined (yet).
  */
 
+#define D6 0
+SSSF_PIN_DECL(D6, GPIOA0, MAC1LINK, SIG_DESC_SET(SCU80, 0));
+
+#define B5 1
+SSSF_PIN_DECL(B5, GPIOA1, MAC2LINK, SIG_DESC_SET(SCU80, 1));
+
 #define A4 2
 SSSF_PIN_DECL(A4, GPIOA2, TIMER3, SIG_DESC_SET(SCU80, 2));
 
+#define E6 3
+SSSF_PIN_DECL(E6, GPIOA3, TIMER4, SIG_DESC_SET(SCU80, 3));
+
 #define I2C9_DESC  SIG_DESC_SET(SCU90, 22)
 
 #define C5 4
@@ -80,6 +89,26 @@ MS_PIN_DECL(D5, GPIOA7, MDIO2, TIMER8);
 FUNC_GROUP_DECL(TIMER8, D5);
 FUNC_GROUP_DECL(MDIO2, A3, D5);
 
+#define J21 8
+SSSF_PIN_DECL(J21, GPIOB0, SALT1, SIG_DESC_SET(SCU80, 8));
+
+#define J20 9
+SSSF_PIN_DECL(J20, GPIOB1, SALT2, SIG_DESC_SET(SCU80, 9));
+
+#define H18 10
+SSSF_PIN_DECL(H18, GPIOB2, SALT3, SIG_DESC_SET(SCU80, 10));
+
+#define F18 11
+SSSF_PIN_DECL(F18, GPIOB3, SALT4, SIG_DESC_SET(SCU80, 11));
+
+#define E19 12
+SIG_EXPR_DECL(LPCRST, LPCRST, SIG_DESC_SET(SCU80, 12));
+SIG_EXPR_DECL(LPCRST, LPCRSTS, SIG_DESC_SET(HW_STRAP1, 14));
+SIG_EXPR_LIST_DECL_DUAL(LPCRST, LPCRST, LPCRSTS);
+SS_PIN_DECL(E19, GPIOB4, LPCRST);
+
+FUNC_GROUP_DECL(LPCRST, E19);
+
 #define H19 13
 #define H19_DESCSIG_DESC_SET(SCU80, 13)
 SIG_EXPR_LIST_DECL_SINGLE(LPCPD, LPCPD, H19_DESC);
@@ -92,6 +121,19 @@ FUNC_GROUP_DECL(LPCSMI, H19);
 #define H20 14
 SSSF_PIN_DECL(H20, GPIOB6, LPCPME, SIG_DESC_SET(SCU80, 14));
 
+#define E18 15
+SIG_EXPR_LIST_DECL_SINGLE(EXTRST, EXTRST,
+   SIG_DESC_SET(SCU80, 15),
+   SIG_DESC_BIT(SCU90, 31, 0),
+   SIG_DESC_SET(SCU3C, 3));
+SIG_EXPR_LIST_DECL_SINGLE(SPICS1, SPICS1,
+   SIG_DESC_SET(SCU80, 15),
+   SIG_DESC_SET(SCU90, 31));
+MS_PIN_DECL(E18, GPIOB7, EXTRST, SPICS1);
+
+FUNC_GROUP_DECL(EXTRST, E18);
+FUNC_GROUP_DECL(SPICS1, E18);
+
 #define SD1_DESC   SIG_DESC_SET(SCU90, 0)
 #define I2C10_DESC SIG_DESC_SET(SCU90, 23)
 
@@ -170,6 +212,62 @@ MS_PIN_DECL(D16, GPIOD1, SD2CMD, GPID0OUT);
 
 FUNC_GROUP_DECL(GPID0, A18, D16);
 
+#define GPID2_DESC SIG_DESC_SET(SCU8C, 9)
+
+#define B17 26
+SIG_EXPR_LIST_DECL_SINGLE(SD2DAT0, SD2, SD2_DESC);
+SIG_EXPR_DECL(GPID2IN, GPID2, GPID2_DESC);
+SIG_EXPR_DECL(GPID2IN, GPID, GPID_DESC);
+SIG_EXPR_LIST_DECL_DUAL(GPID2IN, GPID2, GPID);
+MS_PIN_DECL(B17, GPIOD2, 

[PATCH v4 2/5] pinctrl: aspeed: Read and write bits in LPC and GFX controllers

2016-12-19 Thread Andrew Jeffery
The System Control Unit IP block in the Aspeed SoCs is typically where
the pinmux configuration is found, but not always. A number of pins
depend on state in one of LPC Host Control (LHC) or SoC Display
Controller (GFX) IP blocks, so the Aspeed pinmux drivers should have the
means to adjust these as necessary.

We use syscon to cast a regmap over the GFX and LPC blocks, which is
used as an arbitration layer between the relevant driver and the pinctrl
subsystem. The regmaps are then exposed to the SoC-specific pinctrl
drivers by phandles in the devicetree, and are selected during a mux
request by querying a new 'ip' member in struct aspeed_sig_desc.

Signed-off-by: Andrew Jeffery 
Reviewed-by: Joel Stanley 
---

Joel: I kept your r-b tag here despite reworking the g5 example bindings, as
you've given your r-b for the lpc bindings which are what I have added.

 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |  80 --
 drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c |  18 +--
 drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c |  48 --
 drivers/pinctrl/aspeed/pinctrl-aspeed.c| 161 +
 drivers/pinctrl/aspeed/pinctrl-aspeed.h|  32 ++--
 5 files changed, 242 insertions(+), 97 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt 
b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index b2efb73337c6..fb7694ec033d 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -1,15 +1,23 @@
+==
 Aspeed Pin Controllers
---
+==
 
 The Aspeed SoCs vary in functionality inside a generation but have a common mux
 device register layout.
 
-Required properties:
-- compatible : Should be any one of the following:
-   "aspeed,ast2400-pinctrl"
-   "aspeed,g4-pinctrl"
-   "aspeed,ast2500-pinctrl"
-   "aspeed,g5-pinctrl"
+Required properties for g4:
+- compatible : Should be one of the following:
+   "aspeed,ast2400-pinctrl"
+   "aspeed,g4-pinctrl"
+
+Required properties for g5:
+- compatible : Should be one of the following:
+   "aspeed,ast2500-pinctrl"
+   "aspeed,g5-pinctrl"
+
+- aspeed,external-nodes:   A cell of phandles to external controller nodes:
+   0: compatible with "aspeed,ast2500-gfx", 
"syscon"
+   1: compatible with "aspeed,ast2500-lhc", 
"syscon"
 
 The pin controller node should be the child of a syscon node with the required
 property:
@@ -24,7 +32,7 @@ Refer to the the bindings described in
 Documentation/devicetree/bindings/mfd/syscon.txt
 
 Subnode Format
---
+==
 
 The required properties of child nodes are (as defined in pinctrl-bindings):
 - function
@@ -51,8 +59,11 @@ I2C9 MAC1LINK MDIO1 MDIO2 OSCCLK PEWAKE PWM0 PWM1 PWM2 PWM3 
PWM4 PWM5 PWM6 PWM7
 RGMII1 RGMII2 RMII1 RMII2 SD1 SPI1 SPI1DEBUG SPI1PASSTHRU TIMER4 TIMER5 TIMER6
 TIMER7 TIMER8 VGABIOSROM
 
+Examples
+
 
-Examples:
+g4 Example
+--
 
 syscon: scu@1e6e2000 {
compatible = "aspeed,ast2400-scu", "syscon", "simple-mfd";
@@ -68,5 +79,56 @@ syscon: scu@1e6e2000 {
};
 };
 
+g5 Example
+--
+
+ahb {
+   apb {
+   syscon: scu@1e6e2000 {
+   compatible = "aspeed,ast2500-scu", "syscon", 
"simple-mfd";
+   reg = <0x1e6e2000 0x1a8>;
+
+   pinctrl: pinctrl {
+   compatible = "aspeed,g5-pinctrl";
+   aspeed,external-nodes = <&gfx &lhc>;
+
+   pinctrl_i2c3_default: i2c3_default {
+   function = "I2C3";
+   groups = "I2C3";
+   };
+   };
+   };
+
+   gfx: display@1e6e6000 {
+   compatible = "aspeed,ast2500-gfx", "syscon";
+   reg = <0x1e6e6000 0x1000>;
+   };
+   };
+
+   lpc: lpc@1e789000 {
+   compatible = "aspeed,ast2500-lpc", "simple-mfd";
+   reg = <0x1e789000 0x1000>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x1e789000 0x1000>;
+
+   lpc_host: lpc-host@80 {
+   compatible = "aspeed,ast2500-lpc-host", "simple-mfd", 
"syscon";
+   reg = <0x80 0x1e0>;
+   reg-io-width = <4>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x80 0x1e0>;
+
+   lhc: lhc@20 {
+

[PATCH v4 4/5] pinctrl: aspeed-g5: Add mux configuration for all pins

2016-12-19 Thread Andrew Jeffery
The patch introducing the g5 pinctrl driver implemented a smattering of
pins to flesh out the implementation of the core and provide bare-bones
support for some OpenPOWER platforms and the AST2500 evaluation board.
Now, update the bindings document to reflect the complete functionality
and implement the necessary pin configuration tables in the driver.

Signed-off-by: Andrew Jeffery 
Acked-by: Joel Stanley 
Acked-by: Rob Herring 
---
 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |   17 +-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c | 1476 +++-
 drivers/pinctrl/aspeed/pinctrl-aspeed.h|1 +
 3 files changed, 1487 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt 
b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index a645d0be3347..b98e6f030da8 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -59,10 +59,19 @@ VGAVS VPI18 VPI24 VPI30 VPO12 VPO24 WDTRST1 WDTRST2
 
 aspeed,ast2500-pinctrl, aspeed,g5-pinctrl:
 
-GPID0 GPID2 GPIE0 I2C10 I2C11 I2C12 I2C13 I2C14 I2C3 I2C4 I2C5 I2C6 I2C7 I2C8
-I2C9 MAC1LINK MDIO1 MDIO2 OSCCLK PEWAKE PWM0 PWM1 PWM2 PWM3 PWM4 PWM5 PWM6 PWM7
-RGMII1 RGMII2 RMII1 RMII2 SD1 SPI1 SPI1DEBUG SPI1PASSTHRU TIMER4 TIMER5 TIMER6
-TIMER7 TIMER8 VGABIOSROM
+ACPI ADC0 ADC1 ADC10 ADC11 ADC12 ADC13 ADC14 ADC15 ADC2 ADC3 ADC4 ADC5 ADC6
+ADC7 ADC8 ADC9 BMCINT DDCCLK DDCDAT ESPI FWSPICS1 FWSPICS2 GPID0 GPID2 GPID4
+GPID6 GPIE0 GPIE2 GPIE4 GPIE6 I2C10 I2C11 I2C12 I2C13 I2C14 I2C3 I2C4 I2C5 I2C6
+I2C7 I2C8 I2C9 LAD0 LAD1 LAD2 LAD3 LCLK LFRAME LPCHC LPCPD LPCPLUS LPCPME
+LPCRST LPCSMI LSIRQ MAC1LINK MAC2LINK MDIO1 MDIO2 NCTS1 NCTS2 NCTS3 NCTS4 NDCD1
+NDCD2 NDCD3 NDCD4 NDSR1 NDSR2 NDSR3 NDSR4 NDTR1 NDTR2 NDTR3 NDTR4 NRI1 NRI2
+NRI3 NRI4 NRTS1 NRTS2 NRTS3 NRTS4 OSCCLK PEWAKE PNOR PWM0 PWM1 PWM2 PWM3 PWM4
+PWM5 PWM6 PWM7 RGMII1 RGMII2 RMII1 RMII2 RXD1 RXD2 RXD3 RXD4 SALT1 SALT10
+SALT11 SALT12 SALT13 SALT14 SALT2 SALT3 SALT4 SALT5 SALT6 SALT7 SALT8 SALT9
+SCL1 SCL2 SD1 SD2 SDA1 SDA2 SGPS1 SGPS2 SIOONCTRL SIOPBI SIOPBO SIOPWREQ
+SIOPWRGD SIOS3 SIOS5 SIOSCI SPI1 SPI1CS1 SPI1DEBUG SPI1PASSTHRU SPI2CK SPI2CS0
+SPI2CS1 SPI2MISO SPI2MOSI TIMER3 TIMER4 TIMER5 TIMER6 TIMER7 TIMER8 TXD1 TXD2
+TXD3 TXD4 UART6 USBCKI VGABIOSROM VGAHS VGAVS VPI24 VPO WDTRST1 WDTRST2
 
 Examples
 
diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c 
b/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c
index c5c9a1b6fa1c..43221a3c7e23 100644
--- a/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c
+++ b/drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c
@@ -25,14 +25,28 @@
 #include "../pinctrl-utils.h"
 #include "pinctrl-aspeed.h"
 
-#define ASPEED_G5_NR_PINS 228
+#define ASPEED_G5_NR_PINS 232
 
 #define COND1  { ASPEED_IP_SCU, SCU90, BIT(6), 0, 0 }
 #define COND2  { ASPEED_IP_SCU, SCU94, GENMASK(1, 0), 0, 0 }
 
+/* LHCR0 is offset from the end of the H8S/2168-compatible registers */
+#define LHCR0  0x20
+#define GFX064 0x64
+
 #define B14 0
 SSSF_PIN_DECL(B14, GPIOA0, MAC1LINK, SIG_DESC_SET(SCU80, 0));
 
+#define D14 1
+SSSF_PIN_DECL(D14, GPIOA1, MAC2LINK, SIG_DESC_SET(SCU80, 1));
+
+#define D13 2
+SIG_EXPR_LIST_DECL_SINGLE(SPI1CS1, SPI1CS1, SIG_DESC_SET(SCU80, 15));
+SIG_EXPR_LIST_DECL_SINGLE(TIMER3, TIMER3, SIG_DESC_SET(SCU80, 2));
+MS_PIN_DECL(D13, GPIOA2, SPI1CS1, TIMER3);
+FUNC_GROUP_DECL(SPI1CS1, D13);
+FUNC_GROUP_DECL(TIMER3, D13);
+
 #define E13 3
 SSSF_PIN_DECL(E13, GPIOA3, TIMER4, SIG_DESC_SET(SCU80, 3));
 
@@ -72,6 +86,32 @@ FUNC_GROUP_DECL(TIMER8, B13);
 
 FUNC_GROUP_DECL(MDIO2, C13, B13);
 
+#define K19 8
+GPIO_PIN_DECL(K19, GPIOB0);
+
+#define L19 9
+GPIO_PIN_DECL(L19, GPIOB1);
+
+#define L18 10
+GPIO_PIN_DECL(L18, GPIOB2);
+
+#define K18 11
+GPIO_PIN_DECL(K18, GPIOB3);
+
+#define J20 12
+SSSF_PIN_DECL(J20, GPIOB4, USBCKI, SIG_DESC_SET(HW_STRAP1, 23));
+
+#define H21 13
+#define H21_DESC   SIG_DESC_SET(SCU80, 13)
+SIG_EXPR_LIST_DECL_SINGLE(LPCPD, LPCPD, H21_DESC);
+SIG_EXPR_LIST_DECL_SINGLE(LPCSMI, LPCSMI, H21_DESC);
+MS_PIN_DECL(H21, GPIOB5, LPCPD, LPCSMI);
+FUNC_GROUP_DECL(LPCPD, H21);
+FUNC_GROUP_DECL(LPCSMI, H21);
+
+#define H22 14
+SSSF_PIN_DECL(H22, GPIOB6, LPCPME, SIG_DESC_SET(SCU80, 14));
+
 #define H20 15
 GPIO_PIN_DECL(H20, GPIOB7);
 
@@ -168,7 +208,44 @@ MS_PIN_DECL(D20, GPIOD3, SD2DAT1, GPID2OUT);
 
 FUNC_GROUP_DECL(GPID2, F20, D20);
 
-#define GPIE_DESC  SIG_DESC_SET(HW_STRAP1, 21)
+#define GPID4_DESC  SIG_DESC_SET(SCU8C, 10)
+
+#define D21 28
+SIG_EXPR_LIST_DECL_SINGLE(SD2DAT2, SD2, SD2_DESC);
+SIG_EXPR_DECL(GPID4IN, GPID4, GPID4_DESC);
+SIG_EXPR_DECL(GPID4IN, GPID, GPID_DESC);
+SIG_EXPR_LIST_DECL_DUAL(GPID4IN, GPID4, GPID);
+MS_PIN_DECL(D21, GPIOD4, SD2DAT2, GPID4IN);
+
+#define E20 29
+SIG_EXPR_LIST_DECL_SINGLE(SD2DAT3, SD2, SD2_DESC);
+SIG_EXPR_DECL(GPID4OUT, GPID4, GPID4_DESC);
+SIG_EXPR_DECL(GPID4OUT, GPID, GPID_DESC);
+SIG_EXPR_LIST_DECL_DUAL(GPID4OUT, GPID4, GPID);
+

[PATCH v4 5/5] pinctrl: aspeed: Fix kerneldoc return descriptions

2016-12-19 Thread Andrew Jeffery
Signed-off-by: Andrew Jeffery 
Acked-by: Joel Stanley 
---
 drivers/pinctrl/aspeed/pinctrl-aspeed.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed.c 
b/drivers/pinctrl/aspeed/pinctrl-aspeed.c
index 782c5c97f853..76f62bd45f02 100644
--- a/drivers/pinctrl/aspeed/pinctrl-aspeed.c
+++ b/drivers/pinctrl/aspeed/pinctrl-aspeed.c
@@ -97,7 +97,7 @@ static inline void aspeed_sig_desc_print_val(
  * @enabled: True to query the enabled state, false to query disabled state
  * @regmap: The IP block's regmap instance
  *
- * @return 1 if the descriptor's bitfield is configured to the state
+ * Return: 1 if the descriptor's bitfield is configured to the state
  * selected by @enabled, 0 if not, and less than zero if an unrecoverable
  * failure occurred
  *
@@ -134,7 +134,7 @@ static int aspeed_sig_desc_eval(const struct 
aspeed_sig_desc *desc,
  * @enabled: True to query the enabled state, false to query disabled state
  * @maps: The list of regmap instances
  *
- * @return 1 if the expression composed by @enabled evaluates true, 0 if not,
+ * Return: 1 if the expression composed by @enabled evaluates true, 0 if not,
  * and less than zero if an unrecoverable failure occurred.
  *
  * A mux function is enabled or disabled if the function's signal expression
@@ -175,7 +175,7 @@ static int aspeed_sig_expr_eval(const struct 
aspeed_sig_expr *expr,
  *  expression, false to disable the function's signal
  * @maps: The list of regmap instances for pinmux register access.
  *
- * @return 0 if the expression is configured as requested and a negative error
+ * Return: 0 if the expression is configured as requested and a negative error
  * code otherwise
  */
 static int aspeed_sig_expr_set(const struct aspeed_sig_expr *expr,
@@ -256,7 +256,7 @@ static int aspeed_sig_expr_disable(const struct 
aspeed_sig_expr *expr,
  * @exprs: The list of signal expressions (from a priority level on a pin)
  * @maps: The list of regmap instances for pinmux register access.
  *
- * @return 0 if all expressions are disabled, otherwise a negative error code
+ * Return: 0 if all expressions are disabled, otherwise a negative error code
  */
 static int aspeed_disable_sig(const struct aspeed_sig_expr **exprs,
   struct regmap * const *maps)
@@ -281,8 +281,8 @@ static int aspeed_disable_sig(const struct aspeed_sig_expr 
**exprs,
  * @exprs: List of signal expressions (haystack)
  * @name: The name of the requested function (needle)
  *
- * @return A pointer to the signal expression whose function tag matches the
- * provided name, otherwise NULL.
+ * Return: A pointer to the signal expression whose function tag matches the
+ * provided name, otherwise NULL.
  *
  */
 static const struct aspeed_sig_expr *aspeed_find_expr_by_name(
-- 
2.9.3



[PATCH v4 0/5] pinctrl: aspeed: Implement remaining pins

2016-12-19 Thread Andrew Jeffery
Hi Linus,

This is v4 of the series implementing the remainder of the pinmux tables for
the AST2400 and AST2500 SoCs. v3 of the series can be found here:

https://lkml.org/lkml/2016/12/5/847

Cheers,

Andrew

Changes since v3:

* Add a patch fixing the AST2400 SCU compatible strings in the Aspeed pinctrl
  bindings. They are mentioned to define the expectations on the pinctrl node's
  parent and also in the examples to illustrate the relationship.

* Rework the g5 example bindings in patch 2/5 to reflect the LPC/LHC
  bindings[1][2], and fix the SCU compatible string.

[1] https://lkml.org/lkml/2016/12/20/63
[2] https://lkml.org/lkml/2016/12/20/62

Significant changes since v2:

* The fix for touching bit SCU90[6] has been applied, so the patch has been
  dropped.
* The MFD devicetree bindings patches have been split out into their own
  series: https://lkml.org/lkml/2016/12/5/835
* Rework the "Read and write bits in LPC and GFX controllers" patch so that the
  changes are backwards compatible with existing devicetrees. This will lead to
  limited functionality, but no more limited than what systems with those
  devicetrees already experience.
* A fix for the kerneldoc return value descriptions

Significant changes since v1:

* Fixes from v1 have been applied, so have been dropped for v2
* A new fix has appeared, "pinctrl-aspeed-g5: Never set SCU90[6]", as noted
  above
* New bindings documents for the SoC Display and LPC Host Controllers, driven
  by the patch "pinctrl: aspeed: Read and write bits in LPCHC and GFX
  controllers"
* The v1 patch "pinctrl: aspeed: Enable capture of off-SCU pinmux state" has
  been significantly reworked and is now titled "pinctrl: aspeed: Read and
  write bits in LPCHC and GFX controllers"

Andrew Jeffery (5):
  pinctrl: aspeed: dt: Fix compatibles for the System Control Unit
  pinctrl: aspeed: Read and write bits in LPC and GFX controllers
  pinctrl: aspeed-g4: Add mux configuration for all pins
  pinctrl: aspeed-g5: Add mux configuration for all pins
  pinctrl: aspeed: Fix kerneldoc return descriptions

 .../devicetree/bindings/pinctrl/pinctrl-aspeed.txt |  127 +-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g4.c | 1115 +-
 drivers/pinctrl/aspeed/pinctrl-aspeed-g5.c | 1524 +++-
 drivers/pinctrl/aspeed/pinctrl-aspeed.c|  165 ++-
 drivers/pinctrl/aspeed/pinctrl-aspeed.h|   33 +-
 5 files changed, 2835 insertions(+), 129 deletions(-)

-- 
2.9.3



[PATCH v4 1/5] pinctrl: aspeed: dt: Fix compatibles for the System Control Unit

2016-12-19 Thread Andrew Jeffery
Reference the SoC-specific compatible string in the examples as
required.

Signed-off-by: Andrew Jeffery 
---
 Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt 
b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
index 2ad18c4ea55c..b2efb73337c6 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-aspeed.txt
@@ -11,9 +11,14 @@ Required properties:
"aspeed,ast2500-pinctrl"
"aspeed,g5-pinctrl"
 
-The pin controller node should be a child of a syscon node with the required
+The pin controller node should be the child of a syscon node with the required
 property:
-- compatible: "syscon", "simple-mfd"
+
+- compatible : Should be one of the following:
+   "aspeed,ast2400-scu", "syscon", "simple-mfd"
+   "aspeed,g4-scu", "syscon", "simple-mfd"
+   "aspeed,ast2500-scu", "syscon", "simple-mfd"
+   "aspeed,g5-scu", "syscon", "simple-mfd"
 
 Refer to the the bindings described in
 Documentation/devicetree/bindings/mfd/syscon.txt
@@ -50,7 +55,7 @@ TIMER7 TIMER8 VGABIOSROM
 Examples:
 
 syscon: scu@1e6e2000 {
-   compatible = "syscon", "simple-mfd";
+   compatible = "aspeed,ast2400-scu", "syscon", "simple-mfd";
reg = <0x1e6e2000 0x1a8>;
 
pinctrl: pinctrl {
-- 
2.9.3



Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Baolin Wang
Hi,

On 20 December 2016 at 15:18, Lu Baolu  wrote:
> Hi,
>
> On 12/20/2016 02:46 PM, Baolin Wang wrote:
>> On 20 December 2016 at 14:39, Lu Baolu  wrote:
>>> Hi,
>>>
>>> On 12/20/2016 02:06 PM, Baolin Wang wrote:
 Hi,

 On 20 December 2016 at 12:29, Lu Baolu  wrote:
> Hi Mathias,
>
> On 12/19/2016 08:13 PM, Mathias Nyman wrote:
>> On 19.12.2016 13:34, Baolin Wang wrote:
>>> Hi Mathias,
>>>
>>> On 19 December 2016 at 18:33, Mathias Nyman
>>>  wrote:
 On 13.12.2016 05:21, Baolin Wang wrote:
> Hi Mathias,
>
> On 12 December 2016 at 23:52, Mathias Nyman
>  wrote:
>> On 05.12.2016 09:51, Baolin Wang wrote:
>>> If a command event is found on the event ring during an interrupt,
>>> we need to stop the command timer with del_timer(). Since 
>>> del_timer()
>>> can fail if the timer is running and waiting on the xHCI lock, then
>>> it maybe get the wrong timeout command in 
>>> xhci_handle_command_timeout()
>>> if host fetched a new command and updated the xhci->current_cmd in
>>> handle_cmd_completion(). For this situation, we need a way to signal
>>> to the command timer that everything is fine and it should exit.
>>
>> Ah, right, this could actually happen.
>>
>>> We should introduce a counter (xhci->current_cmd_pending) for the 
>>> number
>>> of pending commands. If we need to cancel the command timer and
>>> del_timer()
>>> succeeds, we decrement the number of pending commands. If 
>>> del_timer()
>>> fails,
>>> we leave the number of pending commands alone.
>>>
>>> For handling timeout command, in xhci_handle_command_timeout() we 
>>> will
>>> check
>>> the counter after decrementing it, if the counter
>>> (xhci->current_cmd_pending)
>>> is 0, which means xhci->current_cmd is the right timeout command. 
>>> If the
>>> counter (xhci->current_cmd_pending) is greater than 0, which means
>>> current
>>> timeout command has been handled by host and host has fetched new
>>> command
>>> as
>>> xhci->current_cmd, then just return and wait for new current 
>>> command.
>>
>> A counter like this could work.
>>
>> Writing the abort bit can generate either ABORT+STOP, or just STOP
>> event, this seems to cover both.
>>
>> quick check, case 1: timeout and cmd completion at the same time.
>>
>> cpu1cpu2
>>
>> queue_command(first), p++ (=1)
>> queue_command(more),
>> --completion irq fires---- timer times out at same 
>> time--
>> handle_cmd_completion() handle_cmd_timeout(),)
>> lock(xhci_lock  )   spin_on(xhci_lock)
>> del_timer() fail, p (=1, nochange)
>> cur_cmd = list_next(), p++ (=2)
>> unlock(xhci_lock)
>>   lock(xhci_lock)
>>   p-- (=1)
>>   if (p > 0), exit
>> OK works
>>
>> case 2: normal timeout case with ABORT+STOP, no race.
>>
>> cpu1cpu2
>>
>> queue_command(first), p++ (=1)
>> queue_command(more),
>>   handle_cmd_timeout()
>>   p-- (P=0), don't exit
>>   mod_timer(), p++ (P=1)
>>   write_abort_bit()
>> handle_cmd_comletion(ABORT)
>> del_timer(), ok, p-- (p = 0)
>> handle_cmd_completion(STOP)
>> del_timer(), fail, (P=0)
>> handle_stopped_cmd_ring()
>> cur_cmd = list_next(), p++ (=1)
>> mod_timer()
>>
>> OK, works, and same for just STOP case, with the only difference that
>> during handle_cmd_completion(STOP) p is decremented (p--)
> Yes, that's the cases what I want to handle, thanks for your explicit
> explanation.
>
 Gave this some more thought over the weekend, and this implementation
 doesn't solve the case when the last command times out and races with 
 the
 completion handler:

 cpu1cpu2

 queue_command(first), p++ (=1)
 --completion irq fires---- timer times out at same 
 time--
 handle_cmd_completion() handle_cmd_timeout(),)
 lock(xhci_lock )spin_on(xhci_lock)
 del_timer() fail

Re: [PATCH 2/4] mmc: mmc: Add change to set controller to HS400 frequency in enhanced strobe

2016-12-19 Thread Shawn Lin

On 2016/12/20 13:53, Ritesh Harjani wrote:

Currently mmc_select_hs400es is not setting the desired frequency
before sending switch command. This makes CMD13 to timeout on
some controllers.
Thus add a change to set the desired HS400 frequency
in mmc_select_hs400es when the timing is switched to HS400.


What is the desired HS400 freq? I guess it is 200MHz, right?

Well, per the spec, it just say "host *may* changes frequency to
<= 200Mhz". So before setting timing to MMC_TIMING_MMC_HS400, we
already reach ext_csd.hs_max_dtr. So it should meet the spec. If
the fact that some controllers would see CMD13 timeout here, I guess
you wouldn't let folks add max-frequency to the DT as if the max freq
is set to just lower than your *desired HS400 freq* , it the same
failure result even you add this patch, right?

the root cause the controllers see failure for CMD13 is it didn't
configure the right delay phase or something similar to fit the timing
under the freq of hs_max_dtr, I guess.



Signed-off-by: Ritesh Harjani 
---
 drivers/mmc/core/mmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index b61b52f9..eb69497 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1329,6 +1329,7 @@ static int mmc_select_hs400es(struct mmc_card *card)

/* Set host controller to HS400 timing and frequency */
mmc_set_timing(host, MMC_TIMING_MMC_HS400);
+   mmc_set_bus_speed(card);

/* Controller enable enhanced strobe function */
host->ios.enhanced_strobe = true;




--
Best Regards
Shawn Lin



Re: [PATCH v3 2/5] lib: Add Sed-opal library

2016-12-19 Thread Christoph Hellwig
> + u8 lr;
> + size_t key_name_len;
> + char key_name[36];

Who is going to use the key_name?  I can't find another reference to
it anywhere else in the code.   The reason why this tripped me off was
the hardcoded length so I was going to check on how access to it is
bounds checked.

> +/**
> + * struct opal_dev - The structure representing a OPAL enabled SED.
> + * @sed_ctx:The SED context, contains fn pointers to sec_send/recv.
> + * @opal_step:A series of opal methods that are necessary to complete a 
> comannd.
> + * @func_data:An array of parameters for the opal methods above.
> + * @state:Describes the current opal_step we're working on.
> + * @dev_lock:Locks the entire opal_dev structure.
> + * @parsed:Parsed response from controller.
> + * @prev_data:Data returned from a method to the controller
> + * @error_cb:Error function that handles closing sessions after a failed 
> method.
> + * @unlk_lst:A list of Locking ranges to unlock on this device during a 
> resume.
> + */

Needs spaces after the colons.

> + u16 comID;
> + u32 HSN;
> + u32 TSN;

Please use lower case variable names in Linux code, and separate
words by underscores if needed.

> +DEFINE_SPINLOCK(list_spinlock);

Should be marked static.

> +#define TPER_SYNC_SUPPORTED BIT(0)

This sounds like a protocol constant and should go into the header
with the rest of the protocol.

> +static bool check_tper(const void *data)
> +{
> + const struct d0_tper_features *tper = data;
> + u8 flags = tper->supported_features;
> +
> + if (!(flags & TPER_SYNC_SUPPORTED)) {
> + pr_err("TPer sync not supported. flags = %d\n",
> +tper->supported_features);

It would be great to use dev_err / dev_warn / etc in the opal code,
although for that you'll need to pass a struct device from the driver
into the code.

> +static bool check_SUM(const void *data)

The comment on member naming above also applies to variables and function
names.

> + bool foundComID = false, supported = true, single_user = false;
> + const struct d0_header *hdr;
> + const u8 *epos, *cpos;
> + u16 comID = 0;
> + int error = 0;
> +
> + epos = dev->cmd.resp;
> + cpos = dev->cmd.resp;
> + hdr = (struct d0_header *)dev->cmd.resp;

You probably want to structure your command buffers to use void pointers
and avoid ths kind of casting.

> +#define TINY_ATOM_DATA_MASK GENMASK(5, 0)
> +#define TINY_ATOM_SIGNED BIT(6)
> +
> +#define SHORT_ATOM_ID BIT(7)
> +#define SHORT_ATOM_BYTESTRING BIT(5)
> +#define SHORT_ATOM_SIGNED BIT(4)
> +#define SHORT_ATOM_LEN_MASK GENMASK(3, 0)

Protocol constants for the header again?

> +#define ADD_TOKEN_STRING(cmd, key, keylen)   \
> + if (!err)   \
> + err = test_and_add_string(cmd, key, keylen);
> +
> +#define ADD_TOKEN(type, cmd, tok)\
> + if (!err)   \
> + err = test_and_add_token_##type(cmd, tok);

Please remove these macros and just open code the calls.  If you want
to avoid writing the if err lines again and again just pass err
by reference to the functions and move the err check into the add_token*
helpers.

> + if ((hdr->cp.length == 0)
> + || (hdr->pkt.length == 0)
> + || (hdr->subpkt.length == 0)) {

No need for the inner braces, also please place the operators at the
end of the previous line instead of the beginning of the new line.

> + while (cpos < total) {
> + if (!(pos[0] & 0x80)) /* tiny atom */
> + token_length = response_parse_tiny(iter, pos);
> + else if (!(pos[0] & 0x40)) /* short atom */
> + token_length = response_parse_short(iter, pos);
> + else if (!(pos[0] & 0x20)) /* medium atom */
> + token_length = response_parse_medium(iter, pos);
> + else if (!(pos[0] & 0x10)) /* long atom */
> + token_length = response_parse_long(iter, pos);

Please add symbolic names for these constants to the protocol header.

> + if (num_entries == 0) {
> + pr_err("Couldn't parse response.\n");
> + return -EINVAL;
> + resp->num = num_entries;

Where is the closing brace for that if?

> + if (!((resp->toks[n].width == OPAL_WIDTH_TINY) ||
> +   (resp->toks[n].width == OPAL_WIDTH_SHORT))) {

No need for the inner braces.

> + pr_err("Atom is not short or tiny: %d\n",
> +resp->toks[n].width);
> + return 0;
> + }
> +
> + return resp->toks[n].stored.u;
> +}
> +
> +static u8 response_status(const struct parsed_resp *resp)
> +{
> + if ((token_type(resp, 0) == OPAL_DTA_TOKENID_TOKEN)
> + && (response_get_token(resp, 0) == OPAL_ENDOFSESSION)) {

Same here, also please fix the operator placement.

> + if ((token_type(resp, resp->num - 1) != OPAL_DTA_TOKENID_

Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Lu Baolu
Hi,

On 12/20/2016 02:46 PM, Baolin Wang wrote:
> On 20 December 2016 at 14:39, Lu Baolu  wrote:
>> Hi,
>>
>> On 12/20/2016 02:06 PM, Baolin Wang wrote:
>>> Hi,
>>>
>>> On 20 December 2016 at 12:29, Lu Baolu  wrote:
 Hi Mathias,

 On 12/19/2016 08:13 PM, Mathias Nyman wrote:
> On 19.12.2016 13:34, Baolin Wang wrote:
>> Hi Mathias,
>>
>> On 19 December 2016 at 18:33, Mathias Nyman
>>  wrote:
>>> On 13.12.2016 05:21, Baolin Wang wrote:
 Hi Mathias,

 On 12 December 2016 at 23:52, Mathias Nyman
  wrote:
> On 05.12.2016 09:51, Baolin Wang wrote:
>> If a command event is found on the event ring during an interrupt,
>> we need to stop the command timer with del_timer(). Since del_timer()
>> can fail if the timer is running and waiting on the xHCI lock, then
>> it maybe get the wrong timeout command in 
>> xhci_handle_command_timeout()
>> if host fetched a new command and updated the xhci->current_cmd in
>> handle_cmd_completion(). For this situation, we need a way to signal
>> to the command timer that everything is fine and it should exit.
>
> Ah, right, this could actually happen.
>
>> We should introduce a counter (xhci->current_cmd_pending) for the 
>> number
>> of pending commands. If we need to cancel the command timer and
>> del_timer()
>> succeeds, we decrement the number of pending commands. If del_timer()
>> fails,
>> we leave the number of pending commands alone.
>>
>> For handling timeout command, in xhci_handle_command_timeout() we 
>> will
>> check
>> the counter after decrementing it, if the counter
>> (xhci->current_cmd_pending)
>> is 0, which means xhci->current_cmd is the right timeout command. If 
>> the
>> counter (xhci->current_cmd_pending) is greater than 0, which means
>> current
>> timeout command has been handled by host and host has fetched new
>> command
>> as
>> xhci->current_cmd, then just return and wait for new current command.
>
> A counter like this could work.
>
> Writing the abort bit can generate either ABORT+STOP, or just STOP
> event, this seems to cover both.
>
> quick check, case 1: timeout and cmd completion at the same time.
>
> cpu1cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
> --completion irq fires---- timer times out at same 
> time--
> handle_cmd_completion() handle_cmd_timeout(),)
> lock(xhci_lock  )   spin_on(xhci_lock)
> del_timer() fail, p (=1, nochange)
> cur_cmd = list_next(), p++ (=2)
> unlock(xhci_lock)
>   lock(xhci_lock)
>   p-- (=1)
>   if (p > 0), exit
> OK works
>
> case 2: normal timeout case with ABORT+STOP, no race.
>
> cpu1cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
>   handle_cmd_timeout()
>   p-- (P=0), don't exit
>   mod_timer(), p++ (P=1)
>   write_abort_bit()
> handle_cmd_comletion(ABORT)
> del_timer(), ok, p-- (p = 0)
> handle_cmd_completion(STOP)
> del_timer(), fail, (P=0)
> handle_stopped_cmd_ring()
> cur_cmd = list_next(), p++ (=1)
> mod_timer()
>
> OK, works, and same for just STOP case, with the only difference that
> during handle_cmd_completion(STOP) p is decremented (p--)
 Yes, that's the cases what I want to handle, thanks for your explicit
 explanation.

>>> Gave this some more thought over the weekend, and this implementation
>>> doesn't solve the case when the last command times out and races with 
>>> the
>>> completion handler:
>>>
>>> cpu1cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> --completion irq fires---- timer times out at same 
>>> time--
>>> handle_cmd_completion() handle_cmd_timeout(),)
>>> lock(xhci_lock )spin_on(xhci_lock)
>>> del_timer() fail, p (=1, nochange)
>>> no more commands, P (=1, nochange)
>>> unlock(xhci_lock)
>>>  lock(xhci_lock)
>>>  p-- (=0

[PATCH v4 5/5] mfd: dt: Add bindings for the Aspeed SoC Display Controller (GFX)

2016-12-19 Thread Andrew Jeffery
The Aspeed SoC Display Controller is presented as a syscon device to
arbitrate access by display and pinmux drivers. Video pinmux
configuration on fifth generation SoCs depends on bits in both the
System Control Unit and the Display Controller.

Signed-off-by: Andrew Jeffery 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/mfd/aspeed-gfx.txt | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-gfx.txt

diff --git a/Documentation/devicetree/bindings/mfd/aspeed-gfx.txt 
b/Documentation/devicetree/bindings/mfd/aspeed-gfx.txt
new file mode 100644
index ..aea5370efd97
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/aspeed-gfx.txt
@@ -0,0 +1,17 @@
+* Device tree bindings for Aspeed SoC Display Controller (GFX)
+
+The Aspeed SoC Display Controller primarily does as its name suggests, but also
+participates in pinmux requests on the g5 SoCs. It is therefore considered a
+syscon device.
+
+Required properties:
+- compatible:  "aspeed,ast2500-gfx", "syscon"
+- reg: contains offset/length value of the GFX memory
+   region.
+
+Example:
+
+gfx: display@1e6e6000 {
+   compatible = "aspeed,ast2500-gfx", "syscon";
+   reg = <0x1e6e6000 0x1000>;
+};
-- 
2.9.3



[PATCH v4 4/5] mfd: dt: Add bindings for the Aspeed LPC Host Controller (LHC)

2016-12-19 Thread Andrew Jeffery
The LPC bus pinmux configuration on fifth generation Aspeed SoCs depends
on bits in both the System Control Unit and the LPC Host Controller.

The Aspeed LPC Host Controller is described as a child node of the
LPC host-range syscon device for arbitration of access by the host
controller and pinmux drivers.

Signed-off-by: Andrew Jeffery 
Reviewed-by: Linus Walleij 
---

Linus: I've retained your r-b tag I don't think the addition of the ast2400
compatible string will fuss you. Please let me know if you feel this is
inappropriate.

 .../devicetree/bindings/mfd/aspeed-lpc.txt | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt 
b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
index a97131aba446..514d82ced95b 100644
--- a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
+++ b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
@@ -109,3 +109,29 @@ lpc: lpc@1e789000 {
};
 };
 
+Host Node Children
+==
+
+LPC Host Controller
+---
+
+The Aspeed LPC Host Controller configures the Low Pin Count (LPC) bus behaviour
+between the host and the baseboard management controller. The registers exist
+in the "host" portion of the Aspeed LPC controller, which must be the parent of
+the LPC host controller node.
+
+Required properties:
+
+- compatible:  One of:
+   "aspeed,ast2400-lhc";
+   "aspeed,ast2500-lhc";
+
+- reg: contains offset/length values of the LHC memory regions. In the
+   AST2400 and AST2500 there are two regions.
+
+Example:
+
+lhc: lhc@20 {
+   compatible = "aspeed,ast2500-lhc";
+   reg = <0x20 0x24 0x48 0x8>;
+};
-- 
2.9.3



[PATCH v4 1/5] mfd: dt: Fix "indicates" typo in mfd bindings document

2016-12-19 Thread Andrew Jeffery
Signed-off-by: Andrew Jeffery 
Acked-by: Linus Walleij 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/mfd/mfd.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/mfd/mfd.txt 
b/Documentation/devicetree/bindings/mfd/mfd.txt
index af9d6931a1a2..f1fceeda12f1 100644
--- a/Documentation/devicetree/bindings/mfd/mfd.txt
+++ b/Documentation/devicetree/bindings/mfd/mfd.txt
@@ -19,7 +19,7 @@ Optional properties:
 
 - compatible : "simple-mfd" - this signifies that the operating system should
   consider all subnodes of the MFD device as separate devices akin to how
-  "simple-bus" inidicates when to see subnodes as children for a simple
+  "simple-bus" indicates when to see subnodes as children for a simple
   memory-mapped bus. For more complex devices, when the nexus driver has to
   probe registers to figure out what child devices exist etc, this should not
   be used. In the latter case the child devices will be determined by the
-- 
2.9.3



[PATCH v4 2/5] mfd: dt: ranges, #address-cells and #size-cells as optional properties

2016-12-19 Thread Andrew Jeffery
Whilst describing a device and not a bus, simple-mfd is modelled on
simple-bus where child nodes are iterated and registered as platform
devices. Some complex devices, e.g. the Aspeed LPC controller, can
benefit from address space mapping such that child nodes can use the
regs property to describe their resource offsets within the
multi-function device.

Signed-off-by: Andrew Jeffery 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/mfd/mfd.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/mfd.txt 
b/Documentation/devicetree/bindings/mfd/mfd.txt
index f1fceeda12f1..bcb6abb9d413 100644
--- a/Documentation/devicetree/bindings/mfd/mfd.txt
+++ b/Documentation/devicetree/bindings/mfd/mfd.txt
@@ -25,6 +25,16 @@ Optional properties:
   be used. In the latter case the child devices will be determined by the
   operating system.
 
+- ranges: Describes the address mapping relationship to the parent. Should set
+  the child's base address to 0, the physical address within parent's address
+  space, and the length of the address map.
+
+- #address-cells: Specifies the number of cells used to represent physical base
+  addresses. Must be present if ranges is used.
+
+- #size-cells: Specifies the number of cells used to represent the size of an
+  address. Must be present if ranges is used.
+
 Example:
 
 foo@1000 {
-- 
2.9.3



[PATCH v4 3/5] mfd: dt: Add Aspeed Low Pin Count Controller bindings

2016-12-19 Thread Andrew Jeffery
Signed-off-by: Andrew Jeffery 
Reviewed-by: Linus Walleij 
Reviewed-by: Joel Stanley 
Acked-by: Rob Herring 
---
 .../devicetree/bindings/mfd/aspeed-lpc.txt | 111 +
 1 file changed, 111 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-lpc.txt

diff --git a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt 
b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
new file mode 100644
index ..a97131aba446
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
@@ -0,0 +1,111 @@
+==
+Device tree bindings for the Aspeed Low Pin Count (LPC) Bus Controller
+==
+
+The LPC bus is a means to bridge a host CPU to a number of low-bandwidth
+peripheral devices, replacing the use of the ISA bus in the age of PCI[0]. The
+primary use case of the Aspeed LPC controller is as a slave on the bus
+(typically in a Baseboard Management Controller SoC), but under certain
+conditions it can also take the role of bus master.
+
+The LPC controller is represented as a multi-function device to account for the
+mix of functionality it provides. The principle split is between the register
+layout at the start of the I/O space which is, to quote the Aspeed datasheet,
+"basically compatible with the [LPC registers from the] popular BMC controller
+H8S/2168[1]", and everything else, where everything else is an eclectic
+collection of functions with a esoteric register layout. "Everything else",
+here labeled the "host" portion of the controller, includes, but is not limited
+to:
+
+* An IPMI Block Transfer[2] Controller
+
+* An LPC Host Controller: Manages LPC functions such as host vs slave mode, the
+  physical properties of some LPC pins, configuration of serial IRQs, and
+  APB-to-LPC bridging amonst other functions.
+
+* An LPC Host Interface Controller: Manages functions exposed to the host such
+  as LPC firmware hub cycles, configuration of the LPC-to-AHB mapping, UART
+  management and bus snoop configuration.
+
+* A set of SuperIO[3] scratch registers: Enables implementation of e.g. custom
+  hardware management protocols for handover between the host and baseboard
+  management controller.
+
+Additionally the state of the LPC controller influences the pinmux
+configuration, therefore the host portion of the controller is exposed as a
+syscon as a means to arbitrate access.
+
+[0] http://www.intel.com/design/chipsets/industry/25128901.pdf
+[1] 
https://www.renesas.com/en-sg/doc/products/mpumcu/001/rej09b0078_h8s2168.pdf?key=7c88837454702128622bee53acbda8f4
+[2] 
http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ipmi-second-gen-interface-spec-v2-rev1-1.pdf
+[3] https://en.wikipedia.org/wiki/Super_I/O
+
+Required properties
+===
+
+- compatible:  One of:
+   "aspeed,ast2400-lpc", "simple-mfd"
+   "aspeed,ast2500-lpc", "simple-mfd"
+
+- reg: contains the physical address and length values of the Aspeed
+LPC memory region.
+
+- #address-cells: <1>
+- #size-cells: <1>
+- ranges:  Maps 0 to the physical address and length of the LPC memory
+region
+
+Required LPC Child nodes
+
+
+BMC Node
+
+
+- compatible:  One of:
+   "aspeed,ast2400-lpc-bmc"
+   "aspeed,ast2500-lpc-bmc"
+
+- reg: contains the physical address and length values of the
+H8S/2168-compatible LPC controller memory region
+
+Host Node
+-
+
+- compatible:   One of:
+   "aspeed,ast2400-lpc-host", "simple-mfd", "syscon"
+   "aspeed,ast2500-lpc-host", "simple-mfd", "syscon"
+
+- reg: contains the address and length values of the host-related
+register space for the Aspeed LPC controller
+
+- #address-cells: <1>
+- #size-cells: <1>
+- ranges:  Maps 0 to the address and length of the host-related LPC memory
+region
+
+Example:
+
+lpc: lpc@1e789000 {
+   compatible = "aspeed,ast2500-lpc", "simple-mfd";
+   reg = <0x1e789000 0x1000>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x1e789000 0x1000>;
+
+   lpc_bmc: lpc-bmc@0 {
+   compatible = "aspeed,ast2500-lpc-bmc";
+   reg = <0x0 0x80>;
+   };
+
+   lpc_host: lpc-host@80 {
+   compatible = "aspeed,ast2500-lpc-host", "simple-mfd", "syscon";
+   reg = <0x80 0x1e0>;
+   reg-io-width = <4>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x80 0x1e0>;
+   };
+};
+
-- 
2.9.3



[PATCH v4 0/5] mfd: dt: Add bindings for the Aspeed MFDs

2016-12-19 Thread Andrew Jeffery
Hi Lee,

Here's v4 of the Aspeed LPC MFD devicetree bindings series. v3 can be found at:

  https://lkml.org/lkml/2016/12/5/835

Changes since v3:

* Based on Arnd's argument[1], drop the addition of the mfd/syscon bindings
  directory as well as the the last patch in v3, which moved a number of
  existing bindings. Eventually the Aspeed display controller will have a
  device-specific driver so it doesn't belong there either.

* Add a compatible string for the AST2400 in the LPC Host Controller bindings
  as requested by Joel and slightly tweak the reg description for Rob.

[1] https://lkml.org/lkml/2016/12/13/202

Andrew Jeffery (5):
  mfd: dt: Fix "indicates" typo in mfd bindings document
  mfd: dt: ranges, #address-cells and #size-cells as optional properties
  mfd: dt: Add Aspeed Low Pin Count Controller bindings
  mfd: dt: Add bindings for the Aspeed LPC Host Controller (LHC)
  mfd: dt: Add bindings for the Aspeed SoC Display Controller (GFX)

 .../devicetree/bindings/mfd/aspeed-gfx.txt |  17 +++
 .../devicetree/bindings/mfd/aspeed-lpc.txt | 137 +
 Documentation/devicetree/bindings/mfd/mfd.txt  |  12 +-
 3 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-gfx.txt
 create mode 100644 Documentation/devicetree/bindings/mfd/aspeed-lpc.txt

-- 
2.9.3



Re: [PATCH] staging: lustre: ldlm: use designated initializers

2016-12-19 Thread Dan Carpenter
On Mon, Dec 19, 2016 at 08:47:50AM -0800, Bruce Korb wrote:
> On Mon, Dec 19, 2016 at 8:22 AM, James Simmons
> >> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> >> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> >> @@ -143,7 +143,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock 
> >> *req, __u64 *flags,
> >>   int added = (mode == LCK_NL);
> >>   int overlaps = 0;
> >>   int splitted = 0;
> >> - const struct ldlm_callback_suite null_cbs = { NULL };
> >> + const struct ldlm_callback_suite null_cbs = { };
> >>
> >>   CDEBUG(D_DLMTRACE,
> >>  "flags %#llx owner %llu pid %u mode %u start %llu end %llu\n",
> >
> > Nak. Filling null_cbs with random data is a bad idea. If you look at
> > ldlm_lock_create() where this is used you have
> >
> > if (cbs) {
> > lock->l_blocking_ast = cbs->lcs_blocking;
> > lock->l_completion_ast = cbs->lcs_completion;
> > lock->l_glimpse_ast = cbs->lcs_glimpse;
> > }
> >
> > Having lock->l_* point to random addresses is a bad idea.
> > What really needs to be done is proper initialization of that
> > structure. A bunch of patches will be coming to address this.
> 
> I'm not understanding the effect of the original difference.  If you
> specify any initializer, then all fields not specified are filled with
> zero bits. Any pointers are, perforce, NULL.  That should make both "{
> NULL }" and "{}" equivalent.

They are equivalent, yes, but people want to use a GCC plugin that
randomizes struct layouts for internal structures and the plugin doesn't
work when you use struct ordering to initialize the struct.  The plugin
requires that you use designated intializers.

regards,
dan carpenter



Re: [tip:x86/urgent] x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6

2016-12-19 Thread H. Peter Anvin
On 12/19/16 02:56, tip-bot for Andy Lutomirski wrote:
> Commit-ID:  3df8d9208569ef0b2313e516566222d745f3b94b
> Gitweb: http://git.kernel.org/tip/3df8d9208569ef0b2313e516566222d745f3b94b
> Author: Andy Lutomirski 
> AuthorDate: Thu, 15 Dec 2016 10:14:42 -0800
> Committer:  Thomas Gleixner 
> CommitDate: Mon, 19 Dec 2016 11:50:24 +0100
> 
> x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
> 
> A typo (or mis-merge?) resulted in leaf 6 only being probed if
> cpuid_level >= 7.
> 
> Fixes: 2ccd71f1b278 ("x86/cpufeature: Move some of the scattered feature bits 
> to x86_capability")
> Signed-off-by: Andy Lutomirski 
> Acked-by: Borislav Petkov 
> Cc: Brian Gerst 
> Link: 
> http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.l...@kernel.org
> Signed-off-by: Thomas Gleixner 
> 
> ---
>  arch/x86/kernel/cpu/common.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 1f6b50a..dc1697c 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -667,13 +667,14 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
>   c->x86_capability[CPUID_1_EDX] = edx;
>   }
>  
> + /* Thermal and Power Management Leaf: level 0x0006 (eax) */
> + if (c->cpuid_level >= 0x0006)
> + c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x0006);
> +
>   /* Additional Intel-defined flags: level 0x0007 */
>   if (c->cpuid_level >= 0x0007) {
>   cpuid_count(0x0007, 0, &eax, &ebx, &ecx, &edx);
> -
>   c->x86_capability[CPUID_7_0_EBX] = ebx;
> -
> - c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x0006);
>   c->x86_capability[CPUID_7_ECX] = ecx;
>   }

Perhaps we should have something like:

void cpuid_cond(u32 leaf, u32 subleaf,
u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
{
const struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
u32 maxleaf;
int cpuid_level = c->cpuid_level;

switch (leaf >> 16) {
case 0x:
maxleaf = c->cpuid_level;
break;
case 0x8000:
maxleaf = c->extended_cpuid_level;
break;
case 0x4000 .. 0x40ff:
/* Not ideal, can we do better? */
maxleaf = cpu_has(X86_FEATURE_HYPERVISOR)
  ? 0x40ff : 0;
break;
/* Add Transmeta 0x8086 and VIA/Cyrix 0xc000? */
default:
/* Can we do better? */
if (c->x86_vendor_id) == X86_VENDOR_INTEL)
maxleaf = 0;
else
maxleaf = leaf | 0x;
break;
}

if (cpuid_level >= 0 && leaf >= maxleaf) {
cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);
} else {
*eax = *ebx = *ecx = *edx = 0;
}
}

-hpa

P.S. I would love to (a) move the CPUID bits into a structure instead of
passing all those crazy pointers around, and (b) stop passing struct
cpuinfo * around when we only use it for the current processor anyway (a
lot of these functions are in fact completely invalid if we don't); we
could define this_cpu_info as (*this_cpu_ptr(&cpu_info)) -- basically
what I have above -- or directly use percpu functions to access these
variables.




Re: [PATCH v18 0/4] Introduce usb charger framework to deal with the usb gadget power negotation

2016-12-19 Thread Baolin Wang
Hi Neil,

On 3 November 2016 at 09:25, NeilBrown  wrote:
> On Tue, Nov 01 2016, Baolin Wang wrote:
>
>
>>> So I won't be responding on this topic any further until I see a genuine
>>> attempt to understand and resolve the inconsistencies with
>>> usb_register_notifier().
>>
>> Any better solution?
>
> I'm not sure exactly what you are asking, so I'll assume you are asking
> the question I want to answer :-)
>
> 1/ Liase with the extcon developers to resolve the inconsistencies
>   with USB connector types.
>   e.g. current there is both "EXTCON_USB" and "EXTCON_CHG_USB_SDP"
>   which both seem to suggest a standard downstream port.  There is no
>   documentation describing how these relate, and no consistent practice
>   to copy.
>   I suspect the intention is that
> EXTCON_USB and EXTCON_USB_HOST indicated that data capabilities of
> the cable, while EXTCON_CHG_USB* indicate the power capabilities of
> the cable.
> So EXTCON_CHG_USB_SDP should always appear together with EXTCON_USB
> while EXTCON_CHG_USB_DCP would not, and EXTCON_CHG_USB_ACA
> would normally appear with EXTCON_USB_HOST (I think).
>   Some drivers follow this model, particularly extcon-max14577.c
>   but it is not consistent.
>
>   This policy should be well documented and possibly existing drivers
>   should be updated to follow it.
>
>   At the same time it would make sense to resolve EXTCON_CHG_USB_SLOW
>   and EXTCON_CHG_USB_FAST.  These names don't mean much.
>   They were recently removed from drivers/power/axp288_charger.c
>   which is good, but are still used in drivers/extcon/extcon-max*
>   Possibly they should be changed to names from the standard, or
>   possibly they should be renamed to identify the current they are
>   expected to provide. e.g. EXTCON_CHG_USB_500MA and EXTCON_CHG_USB_1A

Now I am creating the new patchset with fixing and converting exist drivers.

I did some investigation about EXTCON subsystem. From your suggestion:
1. EXTCON_CHG_USB_SDP should always appear together with EXTCON_USB.
 After checking, now all extcon drivers were following this rule.

2. EXTCON_CHG_USB_ACA would normally appear with EXTCON_USB_HOST.
 Now no extcon drivers used EXTCON_CHG_USB_ACA, then no need to change.

3. Change EXTCON_CHG_USB_SLOW/FAST to EXTCON_CHG_USB_500MA/1A.
 There are no model that shows the slow charger should be 500mA
and fast charger is 1A. (In extcon-max77693.c file, the fast charger
can be drawn 2A), so changing to EXTCON_CHG_USB_500MA/1A is not useful
I think.

>From above investigation, I did not find anything need to be changed
now. What do you think? Thanks.

>
> 2/ Change all usb phys to register an extcon and to send appropriate
>notifications.  Many already do, but I don't think it is universal.
>It is probable that the extcon should be registered using common code
>instead of each phy driver having its own
>extcon_get_edev_by_phandle()
>or whatever.
>If the usb phy driver needs to look at battery charger registers to
>know what sort of cable was connected (which I believe is the case
>for the chips you are interested in), then it should do that.
>
> 3/ Currently some USB controllers discover that a cable was connected by
>listening on an extcon, and some by registering for a usb_notifier
>(described below) ... though there seem to only be 2 left which do that.
>Now that all USB phys send connection information via extcon (see 2),
>the USB controllers should be changed to all find out about the cable
>using extcon.
>
> 4/ struct usb_phy contains:
> /* for notification of usb_phy_events */
> struct atomic_notifier_head notifier;
>
>   This is used inconsistently.  Sometimes the argument passed
>   is NULL, sometimes it is a pointer to 'vbus_draw' - the current
>   limited negotiated via USB, sometimes it is a pointer the the gadget
>   though as far as I can tell, that last one is never used.
>
>   This should be changed to be consistent.  This notifier is no longer
>   needed to tell the USB controller that a cable was connected (extcon
>   now does that, see 3) so it is only used to communicate the
>   'vbus_draw' information.
>   So it should be changed to *only* send a notification when vbus_draw
>   is known, and it should carry that information.
>   This should probably be done in common code, and removed
>   from individual drivers.
>
> 5/ Now that all cable connection notifications are sent over extcon and
>all vbus_draw notifications are sent over the usb_phy notifier, write
>some support code that a power supply client can use to be told what
>power is available.
>e.g. a battery charger driver would call:
>register_power_client(.)
>or similar, providing a phandle (or similar) for the usb phy and a
>function to call back when the available current changes (or maybe a
>work_struct containing the function pointer)
>
>register_power_client() would the

Klientskie bazy dannih! Uznaite podrobnee! Skype: prodawez390 Email: prodawez...@gmail.com tel +79139230330 (whatsapp\viber\telegram) Soberem dlja Vas po internet bazu dannyh potencial'nyh klientov d

2016-12-19 Thread linux-kernel@vger.kernel.org
Klientskie bazy dannih!!! Uznaite podrobnee po Skype: prodawez390 Email: 
prodawez...@gmail.com  tel +79139230330 (whatsapp\viber\telegram) Soberem dlja 
Vas po internet bazu dannyh potencial'nyh klientov dlja Vashego Biznesa Mnogo! 
Bystro! Nedorogo! Uznajte podrobnee na kartinke https://is.gd/HP0cCY


Re: [PATCH] drm/mediatek: Support UYVY and YUYV format for overlay

2016-12-19 Thread YT Shen
Hi Bibby,

On Wed, 2016-12-14 at 13:14 +0800, Bibby Hsieh wrote:
> MT8173 overlay can support UYVY and YUYV format,
> we add the format in DRM driver.
> 
> Signed-off-by: Bibby Hsieh 
> ---
>  drivers/gpu/drm/mediatek/mtk_disp_ovl.c  | 6 ++
>  drivers/gpu/drm/mediatek/mtk_drm_plane.c | 2 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c 
> b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> index 019b7ca..0a340f3 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> @@ -44,6 +44,8 @@
>  #define OVL_CON_CLRFMT_RGB888(1 << 12)
>  #define OVL_CON_CLRFMT_RGBA  (2 << 12)
>  #define OVL_CON_CLRFMT_ARGB  (3 << 12)
> +#define OVL_CON_CLRFMT_UYVY  (4 << 12)
> +#define OVL_CON_CLRFMT_YUYV  (5 << 12)
>  #define  OVL_CON_AEN BIT(8)
>  #define  OVL_CON_ALPHA   0xff
>  
> @@ -161,6 +163,10 @@ static unsigned int ovl_fmt_convert(unsigned int fmt)
>   case DRM_FORMAT_XBGR:
>   case DRM_FORMAT_ABGR:
>   return OVL_CON_CLRFMT_RGBA | OVL_CON_BYTE_SWAP;
> + case DRM_FORMAT_YUYV:
> + return OVL_CON_CLRFMT_YUYV;
> + case DRM_FORMAT_UYVY:
> + return OVL_CON_CLRFMT_UYVY;
>   }
>  }
Your patch looks good, but I am not sure about some details.
AFAIK, there is a color matrix here to describe how to transform from
BT.601 / BT.709 / other color space to RGB color space.  You can use the
default value if you don't care how colors are represented.  Or you can
specify the matrix in this patch.

Regards,
yt.shen

>  
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_plane.c 
> b/drivers/gpu/drm/mediatek/mtk_drm_plane.c
> index c461a23..b94c6ee 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_plane.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_plane.c
> @@ -28,6 +28,8 @@
>   DRM_FORMAT_XRGB,
>   DRM_FORMAT_ARGB,
>   DRM_FORMAT_RGB565,
> + DRM_FORMAT_YUYV,
> + DRM_FORMAT_UYVY,
>  };
>  
>  static void mtk_plane_reset(struct drm_plane *plane)




Re: [PATCH v3 2/5] lib: Add Sed-opal library

2016-12-19 Thread Al Viro
On Mon, Dec 19, 2016 at 12:35:46PM -0700, Scott Bauer wrote:

> +int fdev_sed_ioctl(struct file *filep, unsigned int cmd,
> +unsigned long arg)
> +{
> + struct sed_key key;
> + struct sed_context *sed_ctx;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EACCES;
> +
> + if (!filep->f_sedctx || !filep->f_sedctx->ops || !filep->f_sedctx->dev)
> + return -ENODEV;
> +
> + sed_ctx = filep->f_sedctx;

First of all, that's a bisect hazard.  What's more, looking through the
rest of patchset, WTF does it
* need to be called that early in ioctl(2) handling, instead of
having ->ioctl() instance for that sucker calling it?
* _not_ get your ->f_sedctx as an explicit argument, passed by
the caller in ->ioctl(), seeing that it's possible to calculate by
file->private_data?
* store that thing in struct file itself, bloating it for everything
all for the sake of few drivers that might want to use that helper?


[PATCH] arm64/dts/ls2080a-rdb: remove disable status of pca9547

2016-12-19 Thread Meng Yi
pca9547 won't probed since its status property is disabled.
while there are devices connected to it, we need remove status
property to let ds3232 and adt7461 probed correctly.

Signed-off-by: Meng Yi 
---
 arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts 
b/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts
index 265e0a8..2ff46ca 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a-rdb.dts
@@ -102,7 +102,6 @@
reg = <0x75>;
#address-cells = <1>;
#size-cells = <0>;
-   status = "disabled";
i2c@1 {
#address-cells = <1>;
#size-cells = <0>;
-- 
2.1.0.27.g96db324



Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.

2016-12-19 Thread Christoph Hellwig
> +void nvme_unlock_from_suspend(struct nvme_ctrl *ctrl)
> +{
> + if (opal_unlock_from_suspend(&ctrl->sed_ctx))
> + pr_warn("Failed to unlock one or more locking ranges!\n");
> +}
> +EXPORT_SYMBOL_GPL(nvme_unlock_from_suspend);

I don't think we even need this wrapper.  Also for the warning please
use dev_warn so that the user knows which controller failed.

> @@ -1765,7 +1766,7 @@ static void nvme_reset_work(struct work_struct *work)
>  {
>   struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
>   int result = -ENODEV;
> -
> + bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
>   if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))

Please don't remove the empty line after the variable declarations.
Also I would place your new line before the result line, as that makes
it a tad easier to read.


Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Baolin Wang
On 20 December 2016 at 14:39, Lu Baolu  wrote:
> Hi,
>
> On 12/20/2016 02:06 PM, Baolin Wang wrote:
>> Hi,
>>
>> On 20 December 2016 at 12:29, Lu Baolu  wrote:
>>> Hi Mathias,
>>>
>>> On 12/19/2016 08:13 PM, Mathias Nyman wrote:
 On 19.12.2016 13:34, Baolin Wang wrote:
> Hi Mathias,
>
> On 19 December 2016 at 18:33, Mathias Nyman
>  wrote:
>> On 13.12.2016 05:21, Baolin Wang wrote:
>>> Hi Mathias,
>>>
>>> On 12 December 2016 at 23:52, Mathias Nyman
>>>  wrote:
 On 05.12.2016 09:51, Baolin Wang wrote:
>
> If a command event is found on the event ring during an interrupt,
> we need to stop the command timer with del_timer(). Since del_timer()
> can fail if the timer is running and waiting on the xHCI lock, then
> it maybe get the wrong timeout command in 
> xhci_handle_command_timeout()
> if host fetched a new command and updated the xhci->current_cmd in
> handle_cmd_completion(). For this situation, we need a way to signal
> to the command timer that everything is fine and it should exit.


 Ah, right, this could actually happen.

>
> We should introduce a counter (xhci->current_cmd_pending) for the 
> number
> of pending commands. If we need to cancel the command timer and
> del_timer()
> succeeds, we decrement the number of pending commands. If del_timer()
> fails,
> we leave the number of pending commands alone.
>
> For handling timeout command, in xhci_handle_command_timeout() we will
> check
> the counter after decrementing it, if the counter
> (xhci->current_cmd_pending)
> is 0, which means xhci->current_cmd is the right timeout command. If 
> the
> counter (xhci->current_cmd_pending) is greater than 0, which means
> current
> timeout command has been handled by host and host has fetched new
> command
> as
> xhci->current_cmd, then just return and wait for new current command.


 A counter like this could work.

 Writing the abort bit can generate either ABORT+STOP, or just STOP
 event, this seems to cover both.

 quick check, case 1: timeout and cmd completion at the same time.

 cpu1cpu2

 queue_command(first), p++ (=1)
 queue_command(more),
 --completion irq fires---- timer times out at same 
 time--
 handle_cmd_completion() handle_cmd_timeout(),)
 lock(xhci_lock  )   spin_on(xhci_lock)
 del_timer() fail, p (=1, nochange)
 cur_cmd = list_next(), p++ (=2)
 unlock(xhci_lock)
   lock(xhci_lock)
   p-- (=1)
   if (p > 0), exit
 OK works

 case 2: normal timeout case with ABORT+STOP, no race.

 cpu1cpu2

 queue_command(first), p++ (=1)
 queue_command(more),
   handle_cmd_timeout()
   p-- (P=0), don't exit
   mod_timer(), p++ (P=1)
   write_abort_bit()
 handle_cmd_comletion(ABORT)
 del_timer(), ok, p-- (p = 0)
 handle_cmd_completion(STOP)
 del_timer(), fail, (P=0)
 handle_stopped_cmd_ring()
 cur_cmd = list_next(), p++ (=1)
 mod_timer()

 OK, works, and same for just STOP case, with the only difference that
 during handle_cmd_completion(STOP) p is decremented (p--)
>>>
>>> Yes, that's the cases what I want to handle, thanks for your explicit
>>> explanation.
>>>
>> Gave this some more thought over the weekend, and this implementation
>> doesn't solve the case when the last command times out and races with the
>> completion handler:
>>
>> cpu1cpu2
>>
>> queue_command(first), p++ (=1)
>> --completion irq fires---- timer times out at same time--
>> handle_cmd_completion() handle_cmd_timeout(),)
>> lock(xhci_lock )spin_on(xhci_lock)
>> del_timer() fail, p (=1, nochange)
>> no more commands, P (=1, nochange)
>> unlock(xhci_lock)
>>  lock(xhci_lock)
>>  p-- (=0)
>>  p == 0, continue, even if we 
>> should
>> not.
>>   

Re: [PATCH v3 1/5] include: Add definitions for sed

2016-12-19 Thread Christoph Hellwig
On Mon, Dec 19, 2016 at 12:35:45PM -0700, Scott Bauer wrote:
> This patch adds the definitions and structures for the SED
> Opal code.

This seems to contain a few things:  userspace ABIs, on the wire
defintions, and prototypes for the code added in patch 2.

The userspace ABIs should be a header on it's own.  For new code
we also usually try to split the protocol definitions from the
in-kernel APIs, although that's not even reflected in the headers
here.  If you think it's reasonable to split those out those should
be a separate new patch, and the actual in-kernel data structures
and protoypes should just go with the actual code in your current patch
2.

> +/**
> + * struct sed_context - SED Security context for a device
> + * @ops:The Trusted Send/Recv functions.
> + * @sec_data:Opaque pointer that will be passed to the send/recv fn.
> + *Drivers can use this to pass necessary data required for
> + *Their implementation of send/recv.
> + * @dev:Currently an Opal Dev structure. In the future can be other types
> + *Of security structures.
> + */

This is missing a couple spaces for the common kerneldoc format.

> +struct sed_context {
> + const struct sec_ops *ops;
> + void *sec_data;
> + void *dev;
> +};

What is the difference between sec_data and dev?  And do your really
need either one when using the container_of trick I pointed out in
response to the other patch.

> +struct sed_key {
> + __u32 sed_type;
> + union {
> + struct opal_keyopal;
> + struct opal_new_pw opal_pw;
> + struct opal_session_info   opal_session;
> + struct opal_user_lr_setup  opal_lrs;
> + struct opal_lock_unlockopal_lk_unlk;
> + struct opal_mbr_data   opal_mbr;
> + /* additional command set key types */
> + };
> +};
> +
> +#define IOC_SED_SAVE_IOW('p', 220, struct sed_key)
> +#define IOC_SED_LOCK_UNLOCK _IOW('p', 221, struct sed_key)
> +#define IOC_SED_TAKE_OWNERSHIP  _IOW('p', 222, struct sed_key)
> +#define IOC_SED_ACTIVATE_LSP   _IOW('p', 223, struct sed_key)
> +#define IOC_SED_SET_PW _IOW('p', 224, struct sed_key)
> +#define IOC_SED_ACTIVATE_USR   _IOW('p', 225, struct sed_key)
> +#define IOC_SED_REVERT_TPR _IOW('p', 226, struct sed_key)
> +#define IOC_SED_LR_SETUP   _IOW('p', 227, struct sed_key)
> +#define IOC_SED_ADD_USR_TO_LR  _IOW('p', 228, struct sed_key)
> +#define IOC_SED_ENABLE_DISABLE_MBR _IOW('p', 229, struct sed_key)
> +#define IOC_SED_ERASE_LR   _IOW('p', 230, struct sed_key)
> +#define IOC_SED_SECURE_ERASE_LR_IOW('p', 231, struct sed_key)

I'll need to look at the details again, but it seems like each ioctl
uses exactly one of the structures in the union above, so there is no
real need for that union.


Re: [PATCH v2] fscrypt: Factor out bio specific functions

2016-12-19 Thread David Gstir
Hi,

> On 19.12.2016, at 12:25, Richard Weinberger  wrote:
> 
> That way we can get rid of the direct dependency on CONFIG_BLOCK.
> 
> Reported-by: Arnd Bergmann 
> Reported-by: Randy Dunlap 
> Suggested-by: Christoph Hellwig 
> Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto")
> Signed-off-by: Richard Weinberger 
> ---
> Changes since v1:
> - Moved fscrypt_zeroout_range() also to bio.c

Looks good to me.

Reviewed-by: David Gstir 

- David


[PATCH v3 4/6] powerpc/perf: Add event attribute and group to IMC pmus

2016-12-19 Thread Hemant Kumar
Device tree IMC driver code parses the IMC units and their events. It
passes the information to IMC pmu code which is placed in powerpc/perf
as "imc-pmu.c".

This patch creates only event attributes and attribute groups for the
IMC pmus.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/perf/Makefile|  6 +-
 arch/powerpc/perf/imc-pmu.c   | 96 +++
 arch/powerpc/platforms/powernv/opal-imc.c | 12 +++-
 3 files changed, 111 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/perf/imc-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f102d53..6f1d0ac 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -2,10 +2,14 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
 obj-$(CONFIG_PERF_EVENTS)  += callchain.o perf_regs.o
 
+imc-$(CONFIG_PPC_POWERNV)   += imc-pmu.o
+
 obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += power4-pmu.o ppc970-pmu.o power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
-  isa207-common.o power8-pmu.o power9-pmu.o
+  isa207-common.o power8-pmu.o power9-pmu.o \
+  $(imc-y)
+
 obj32-$(CONFIG_PPC_PERF_CTRS)  += mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
new file mode 100644
index 000..7b6ce50
--- /dev/null
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -0,0 +1,96 @@
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ *  (C) 2016 Hemant K Shaw, IBM Corporation.
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+/* dev_str_attr : Populate event "name" and string "str" in attribute */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+   struct perf_pmu_events_attr *attr;
+
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+   sysfs_attr_init(&attr->attr.attr);
+
+   attr->event_str = str;
+   attr->attr.attr.name = name;
+   attr->attr.attr.mode = 0444;
+   attr->attr.show = perf_event_sysfs_show;
+
+   return &attr->attr.attr;
+}
+
+/*
+ * update_events_in_group: Update the "events" information in an attr_group
+ * and assign the attr_group to the pmu "pmu".
+ */
+static int update_events_in_group(struct imc_events *events,
+ int idx, struct imc_pmu *pmu)
+{
+   struct attribute_group *attr_group;
+   struct attribute **attrs;
+   int i;
+
+   /* Allocate memory for attribute group */
+   attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL);
+   if (!attr_group)
+   return -ENOMEM;
+
+   /* Allocate memory for attributes */
+   attrs = kzalloc((sizeof(struct attribute *) * (idx + 1)), GFP_KERNEL);
+   if (!attrs) {
+   kfree(attr_group);
+   return -ENOMEM;
+   }
+
+   attr_group->name = "events";
+   attr_group->attrs = attrs;
+   for (i = 0; i < idx; i++, events++) {
+   attrs[i] = dev_str_attr((char *)events->ev_name,
+   (char *)events->ev_value);
+   }
+
+   pmu->attr_groups[0] = attr_group;
+   return 0;
+}
+
+/*
+ * init_imc_pmu : Setup the IMC pmu device in "pmu_ptr" and its events
+ *"events".
+ * Setup the cpu mask information for these pmus and setup the state machine
+ * hotplug notifiers as well.
+ */
+int init_imc_pmu(struct imc_events *events, int idx,
+struct imc_pmu *pmu_ptr)
+{
+   int ret = -ENODEV;
+
+   ret = update_events_in_group(events, idx, pmu_ptr);
+   if (ret)
+   goto err_free;
+
+   return 0;
+
+err_free:
+   /* Only free the attr_groups which are dynamically allocated  */
+   if (pmu_ptr->attr_groups[0]) {
+   kfree(pmu_ptr->attr_groups[0]->attrs);
+   kfree(pmu_ptr->attr_groups[0]);
+   }
+
+   return ret;
+}
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 5ee93402..7870401 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -31,8 +31,11 @@
 #include 
 #include 
 
-struct perchip_nest_info nest_perchip_i

[PATCH v3 3/6] powerpc/powernv: Detect supported IMC units and its events

2016-12-19 Thread Hemant Kumar
Parse device tree to detect IMC units. Traverse through each IMC unit
node to find supported events and corresponding unit/scale files (if any).

Right now, only nest IMC units are supported.
The nest IMC unit event node from device tree will contain the offset in
the reserved memory region to get the counter data for a given
event. The offsets for the nest events are contained in the "reg"
property of the event "node".

Kernel code uses this offset as event configuration value.

Device tree parser code also looks for scale/unit property in the event
node and passes on the value as an event attr for perf interface to use
in the post processing by the perf tool. Some PMUs may have common scale
and unit properties which implies that all events supported by this PMU
inherit the scale and unit properties of the PMU itself. For those
events, we need to set the common unit and scale values.

For failure to initialize any unit or any event, disable that unit and
continue setting up the rest of them.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/platforms/powernv/opal-imc.c | 332 ++
 1 file changed, 332 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index ee2ae45..5ee93402 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -32,6 +32,337 @@
 #include 
 
 struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+static int imc_event_info(char *name, struct imc_events *events)
+{
+   char *buf;
+
+   /* memory for content */
+   buf = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   events->ev_name = name;
+   events->ev_value = buf;
+   return 0;
+}
+
+static int imc_event_info_str(struct property *pp, char *name,
+  struct imc_events *events)
+{
+   int ret;
+
+   ret = imc_event_info(name, events);
+   if (ret)
+   return ret;
+
+   if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+  (pp->length > IMC_MAX_PMU_NAME_LEN))
+   return -EINVAL;
+   strncpy(events->ev_value, (const char *)pp->value, pp->length);
+
+   return 0;
+}
+
+static int imc_event_info_val(char *name, u32 val,
+ struct imc_events *events)
+{
+   int ret;
+
+   ret = imc_event_info(name, events);
+   if (ret)
+   return ret;
+   sprintf(events->ev_value, "event=0x%x", val);
+
+   return 0;
+}
+
+static int set_event_property(struct property *pp, char *event_prop,
+ struct imc_events *events, char *ev_name)
+{
+   char *buf;
+   int ret;
+
+   buf = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   sprintf(buf, "%s.%s", ev_name, event_prop);
+   ret = imc_event_info_str(pp, buf, events);
+   if (ret) {
+   kfree(events->ev_name);
+   kfree(events->ev_value);
+   }
+
+   return ret;
+}
+
+/*
+ * imc_events_node_parser: Parse the event node "dev" and assign the parsed
+ * information to event "events".
+ *
+ * Parses the "reg" property of this event. "reg" gives us the event offset.
+ * Also, parse the "scale" and "unit" properties, if any.
+ */
+static int imc_events_node_parser(struct device_node *dev,
+ struct imc_events *events,
+ struct property *event_scale,
+ struct property *event_unit)
+{
+   struct property *name, *pp;
+   char *ev_name;
+   u32 val;
+   int idx = 0, ret;
+
+   if (!dev)
+   return -EINVAL;
+
+   /*
+* Loop through each property of an event node
+*/
+   name = of_find_property(dev, "event-name", NULL);
+   if (!name)
+   return -ENODEV;
+
+   if (!name->value ||
+ (strnlen(name->value, name->length) == name->length) ||
+ (name->length > IMC_MAX_PMU_NAME_LEN))
+   return -EINVAL;
+
+   ev_name = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!ev_name)
+   return -ENOMEM;
+
+   strncpy(ev_name, name->value, name->length);
+
+   /*
+* Parse each property of this event node "dev". Property "reg" has
+* the offset which is assigned to the event name. Other properties
+* like "scale" and "unit" are assigned to event.scale and event.unit
+* accordingly.
+*/
+   for_each_property_of_node(dev, pp) {
+   /*
+* If there is an issu

[PATCH v3 6/6] powerpc/perf: IMC pmu cpumask and cpu hotplug support

2016-12-19 Thread Hemant Kumar
Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
online CPU) from each chip for nest PMUs is designated to read counters.

On CPU hotplug, dying CPU is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same chip (for nest
units) is designated as new cpu to read counters.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/include/asm/opal-api.h|   3 +-
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/perf/imc-pmu.c| 167 -
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 4 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 0e2e57b..48e1d3e 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -167,7 +167,8 @@
 #define OPAL_INT_EOI   124
 #define OPAL_INT_SET_MFRR  125
 #define OPAL_PCI_TCE_KILL  126
-#define OPAL_LAST  126
+#define OPAL_NEST_IMC_COUNTERS_CONTROL  128
+#define OPAL_LAST  128
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e958b70..fe72b57 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -229,6 +229,8 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t 
kill_type,
 int64_t opal_rm_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
 uint32_t pe_num, uint32_t tce_size,
 uint64_t dma_addr, uint32_t npages);
+int64_t opal_nest_imc_counters_control(uint64_t mode, uint64_t value1,
+ uint64_t value2, uint64_t value3);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f12ece8..49f6486 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -16,6 +16,7 @@
 
 struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static cpumask_t nest_imc_cpumask;
 
 /* Needed for sanity check */
 extern u64 nest_max_offset;
@@ -31,6 +32,164 @@ static struct attribute_group imc_format_group = {
.attrs = imc_format_attrs,
 };
 
+/* Get the cpumask printed to a buffer "buf" */
+static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   cpumask_t *active_mask;
+
+   active_mask = &nest_imc_cpumask;
+   return cpumap_print_to_pagebuf(true, buf, active_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *imc_pmu_cpumask_attrs[] = {
+   &dev_attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group imc_pmu_cpumask_attr_group = {
+   .attrs = imc_pmu_cpumask_attrs,
+};
+
+/*
+ * nest_init : Initializes the nest imc engine for the current chip.
+ */
+static void nest_init(int *loc)
+{
+   int rc;
+
+   rc = opal_nest_imc_counters_control(NEST_IMC_PRODUCTION_MODE,
+   NEST_IMC_ENGINE_START, 0, 0);
+   if (rc)
+   loc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   int i;
+
+   for (i = 0;
+(per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
+   perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+   old_cpu, new_cpu);
+}
+
+static int ppc_nest_imc_cpu_online(unsigned int cpu)
+{
+   int nid, fcpu, ncpu;
+   struct cpumask *l_cpumask, tmp_mask;
+
+   /* Fint the cpumask of this node */
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+
+   /*
+* If any of the cpu from this node is already present in the mask,
+* just return, if not, then set this cpu in the mask.
+*/
+   if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) {
+   cpumask_set_cpu(cpu, &nest_imc_cpumask);
+   return 0;
+   }
+
+   fcpu = cpumask_first(l_cpumask);
+   ncpu = cpumask_next(cpu, l_cpumask);
+   if (cpu == fcpu) {
+   if (cpumask_test_and_clear_cpu(ncpu, &nest_imc_cpumask)) {
+   cpumask_set_cpu(cpu, &nest_imc_cpumask);
+   nest_change_cpu_context(ncpu, cpu);
+   }
+   }
+
+   return 0;
+}
+
+static int ppc_nest_imc_cpu_offline(unsigned int cpu)
+{
+   int nid, target = -1;
+   st

[PATCH v3 1/6] powerpc/powernv: Data structure and macros definitions

2016-12-19 Thread Hemant Kumar
Create new header file "imc-pmu.h" to add the data structures
and macros needed for IMC pmu support.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/include/asm/imc-pmu.h | 73 ++
 1 file changed, 73 insertions(+)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index 000..911d837
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,73 @@
+#ifndef PPC_POWERNV_IMC_PMU_DEF_H
+#define PPC_POWERNV_IMC_PMU_DEF_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ *   (C) 2016 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define IMC_MAX_CHIPS  32
+#define IMC_MAX_PMUS   32
+#define IMC_MAX_PMU_NAME_LEN   256
+
+#define NEST_IMC_ENGINE_START  1
+#define NEST_IMC_ENGINE_STOP   0
+#define NEST_MAX_PAGES 16
+
+#define NEST_IMC_PRODUCTION_MODE   1
+
+#define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
+#define IMC_DTB_NEST_COMPAT"ibm,imc-counters-chip"
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+   u32 chip_id;
+   u64 pbase;
+   u64 vbase[NEST_MAX_PAGES];
+   u64 size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+   char *ev_name;
+   char *ev_value;
+};
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will
+ * hold the pmu functions and attrs for each imc pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct imc_pmu {
+   struct pmu pmu;
+   int domain;
+   const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST1
+
+#define UNKNOWN_DOMAIN -1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
-- 
2.7.4



[PATCH v3 5/6] powerpc/perf: Generic imc pmu event functions

2016-12-19 Thread Hemant Kumar
Since, the IMC counters' data are periodically fed to a memory location,
the functions to read/update, start/stop, add/del can be generic and can
be used by all IMC PMU units.

This patch adds a set of generic imc pmu related event functions to be
used  by each imc pmu unit. Add code to setup format attribute and to
register imc pmus. Add a event_init function for nest_imc events.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/include/asm/imc-pmu.h|   1 +
 arch/powerpc/perf/imc-pmu.c   | 122 ++
 arch/powerpc/platforms/powernv/opal-imc.c |  29 ++-
 3 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 911d837..ceb6b1f 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -70,4 +70,5 @@ struct imc_pmu {
 
 #define UNKNOWN_DOMAIN -1
 
+int imc_get_domain(struct device_node *pmu_dev);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 7b6ce50..f12ece8 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -17,6 +17,117 @@
 struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 
+/* Needed for sanity check */
+extern u64 nest_max_offset;
+
+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *imc_format_attrs[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+static struct attribute_group imc_format_group = {
+   .name = "format",
+   .attrs = imc_format_attrs,
+};
+
+static int nest_imc_event_init(struct perf_event *event)
+{
+   int chip_id;
+   u32 config = event->attr.config;
+   struct perchip_nest_info *pcni;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest)
+   return -EINVAL;
+
+   if (event->cpu < 0)
+   return -EINVAL;
+
+   /* Sanity check for config (event offset) */
+   if (config > nest_max_offset)
+   return -EINVAL;
+
+   chip_id = topology_physical_package_id(event->cpu);
+   pcni = &nest_perchip_info[chip_id];
+   event->hw.event_base = pcni->vbase[config/PAGE_SIZE] +
+   (config & ~PAGE_MASK);
+
+   return 0;
+}
+
+static void imc_read_counter(struct perf_event *event)
+{
+   u64 *addr, data;
+
+   addr = (u64 *)event->hw.event_base;
+   data = __be64_to_cpu(*addr);
+   local64_set(&event->hw.prev_count, data);
+}
+
+static void imc_perf_event_update(struct perf_event *event)
+{
+   u64 counter_prev, counter_new, final_count, *addr;
+
+   addr = (u64 *)event->hw.event_base;
+   counter_prev = local64_read(&event->hw.prev_count);
+   counter_new = __be64_to_cpu(*addr);
+   final_count = counter_new - counter_prev;
+
+   local64_set(&event->hw.prev_count, counter_new);
+   local64_add(final_count, &event->count);
+}
+
+static void imc_event_start(struct perf_event *event, int flags)
+{
+   imc_read_counter(event);
+}
+
+static void imc_event_stop(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_UPDATE)
+   imc_perf_event_update(event);
+}
+
+static int imc_event_add(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_START)
+   imc_event_start(event, flags);
+
+   return 0;
+}
+
+/* update_pmu_ops : Populate the appropriate operations for "pmu" */
+static int update_pmu_ops(struct imc_pmu *pmu)
+{
+   if (!pmu)
+   return -EINVAL;
+
+   pmu->pmu.task_ctx_nr = perf_invalid_context;
+   pmu->pmu.event_init = nest_imc_event_init;
+   pmu->pmu.add = imc_event_add;
+   pmu->pmu.del = imc_event_stop;
+   pmu->pmu.start = imc_event_start;
+   pmu->pmu.stop = imc_event_stop;
+   pmu->pmu.read = imc_perf_event_update;
+   pmu->attr_groups[1] = &imc_format_group;
+   pmu->pmu.attr_groups = pmu->attr_groups;
+
+   return 0;
+}
+
 /* dev_str_attr : Populate event "name" and string "str" in attribute */
 static struct attribute *dev_str_attr(const char *name, const char *str)
 {
@@ -83,6 +194,17 @@ int init_imc_pmu(struct imc_events *events, int idx,
if (ret)
goto err_free;
 
+   ret = upda

[PATCH v3 2/6] powerpc/powernv: Autoload IMC device driver module

2016-12-19 Thread Hemant Kumar
This patch does three things :
 - Enables "opal.c" to create a platform device for the IMC interface
   according to the appropriate compatibility string.
 - Find the reserved-memory region details from the system device tree
   and get the base address of HOMER region address for each chip.
 - We also get the Nest PMU counter data offsets (in the HOMER region)
   and their sizes. The offsets for the counters' data are fixed and
   won't change from chip to chip.

The device tree parsing logic is separated from the PMU creation
functions (which is done in subsequent patches). Right now, only Nest
units are taken care of.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 
---
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/opal-imc.c | 117 ++
 arch/powerpc/platforms/powernv/opal.c |  13 
 3 files changed, 131 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..44909fe 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o 
opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o 
opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-imc.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
new file mode 100644
index 000..ee2ae45
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -0,0 +1,117 @@
+/*
+ * OPAL IMC interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright  (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ *(C) 2016 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+
+static int opal_imc_counters_probe(struct platform_device *pdev)
+{
+   struct device_node *child, *imc_dev, *rm_node = NULL;
+   struct perchip_nest_info *pcni;
+   u32 reg[4], pages, nest_offset, nest_size, idx;
+   int i = 0;
+   const char *node_name;
+
+   if (!pdev || !pdev->dev.of_node)
+   return -ENODEV;
+
+   imc_dev = pdev->dev.of_node;
+
+   /*
+* nest_offset : where the nest-counters' data start.
+* size : size of the entire nest-counters region
+*/
+   if (of_property_read_u32(imc_dev, "imc-nest-offset", &nest_offset))
+   goto err;
+   if (of_property_read_u32(imc_dev, "imc-nest-size", &nest_size))
+   goto err;
+
+   /* Find the "homer region" for each chip */
+   rm_node = of_find_node_by_path("/reserved-memory");
+   if (!rm_node)
+   goto err;
+
+   for_each_child_of_node(rm_node, child) {
+   if (of_property_read_string_index(child, "name", 0,
+ &node_name))
+   continue;
+   if (strncmp("ibm,homer-image", node_name,
+   strlen("ibm,homer-image")))
+   continue;
+
+   /* Get the chip id to which the above homer region belongs to */
+   if (of_property_read_u32(child, "ibm,chip-id", &idx))
+   goto err;
+
+   /* reg property will have four u32 cells. */
+   if (of_property_read_u32_array(child, "reg", reg, 4))
+   goto err;
+
+   pcni = &nest_perchip_info[idx];
+
+   /* Fetch the homer region base address */
+   pcni->pbase = reg[0];
+   pcni->pbase = pcni->pbase << 32 | reg[1];
+   /* Add the nest IMC Base offset */
+   pcni->pbase = pcni->pbase + nest_offset;
+

[PATCH v3 0/6] IMC Instrumentation Support

2016-12-19 Thread Hemant Kumar
Power 9 has In-Memory-Collection (IMC) infrastructure which contains
various Performance Monitoring Units (PMUs) at Nest level (these are
on-chip but off-core). These Nest PMU counters are handled by a Nest
IMC microcode. This microcode runs in the OCC (On-Chip Controller)
complex and its purpose is to program the nest counters, collect the
counter data and move the counter data to memory. 

The IMC infrastructure encapsulates nest (per-chip), core and thread
level counters. While the nest IMC PMUs are handled by the nest IMC
microcode, the core and thread level PMUs are handled by the Core-HPMC
engine. This patchset enables the nest IMC PMUs and is based on the
initial work done by Madhavan Srinivasan.
"Nest Instrumentation Support" : 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html

v1 for this patchset can be found here :
https://lwn.net/Articles/705475/

Nest events:
Per-chip nest instrumentation provides various per-chip metrics
such as memory, powerbus, Xlink and Alink bandwidth.

PMU Events' Information:
OPAL obtains the Nest PMU and event information from the IMC Catalog
and passes on to the kernel via the device tree. The events' information
contains :
 - Event name
 - Event Offset
 - Event description
and, maybe :
 - Event scale
 - Event unit

Some PMUs may have a common scale and unit values for all their
supported events. For those cases, the scale and unit properties for
those events must be inherited from the PMU.

The event offset in the memory is where the counter data gets
accumulated.

The OPAL-side patches are posted upstream :
https://lists.ozlabs.org/pipermail/skiboot/2016-November/005552.html

The kernel discovers the IMC counters information in the device tree
at the "imc-counters" device node which has a compatible field
"ibm,opal-in-memory-counters".

Parsing of the Events' information:
To parse the IMC PMUs and events information, the kernel has to
discover the "imc-counters" node and walk through the pmu and event
nodes.

Here is an excerpt of the dt showing the imc-counters and mcs node:
/dts-v1/;

[...]
imc-counters {   
imc-nest-offset = <0x32>;
compatible = "ibm,opal-in-memory-counters";
imc-nest-size = <0x3>;
#address-cells = <0x1>;
#size-cells = <0x1>;
phandle = <0x1238>;
version-id = [00];

mcs0 {
compatible = "ibm,imc-counters-chip";
ranges;
#address-cells = <0x1>;
#size-cells = <0x1>;
phandle = <0x1279>;
scale = "1.2207e-4";
unit = "MiB";

event@528 {
event-name = "PM_MCS_UP_128B_DATA_XFER_MC0" ;
desc = "Total Read Bandwidth seen on both MCS 
of MC0";
phandle = <0x128c>;
reg = <0x118 0x8>;
};
[...]

>From the device tree, the kernel parses the PMUs and their events'
information.

After parsing the nest IMC PMUs and their events, the PMUs and their
attributes are registered in the kernel.

Example Usage :
 # perf list

  [...]
  nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/   [Kernel PMU event]
  nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0_LAST_SAMPLE/ [Kernel PMU event]
  [...]

 # perf stat -e "nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/" -a --per-socket

TODOs:
 - Add support for Core IMC.
 - Add support for thread IMC.

Comments/feedback/suggestions are welcome.

Changelog:
 v2 -> v3 :
 - Changed all references for IMA (In-Memory Accumulation) to IMC (In-Memory
   Collection).
 v1 -> v2 :
 - Account for the cases where a PMU can have a common scale and unit
   values for all its supported events (Patch 3/6).
 - Fixed a Build error (for maple_defconfig) by enabling imc_pmu.o
   only for CONFIG_PPC_POWERNV=y (Patch 4/6)
 - Read from the "event-name" property instead of "name" for an event
   node (Patch 3/6).

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Michael Neuling 
Cc: Stewart Smith 
Cc: Daniel Axtens 
Cc: Stephane Eranian 
Signed-off-by: Hemant Kumar 

Hemant Kumar (6):
  powerpc/powernv: Data structure and macros definitions
  powerpc/powernv: Autoload IMC device driver module
  powerpc/powernv: Detect supported IMC units and its events
  powerpc/perf: Add event attribute and group to IMC pmus
  powerpc/perf: Generic imc pmu event functions
  powerpc/perf: IMC pmu cpumask and cpu hotplug support

 arch/powerpc/include/asm/imc-pmu.h |  74 
 arch/powerpc/include/asm/opal-api.h|   3 +-
 arch/powerpc/include/asm/opal.h   

Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Lu Baolu
Hi,

On 12/20/2016 02:06 PM, Baolin Wang wrote:
> Hi,
>
> On 20 December 2016 at 12:29, Lu Baolu  wrote:
>> Hi Mathias,
>>
>> On 12/19/2016 08:13 PM, Mathias Nyman wrote:
>>> On 19.12.2016 13:34, Baolin Wang wrote:
 Hi Mathias,

 On 19 December 2016 at 18:33, Mathias Nyman
  wrote:
> On 13.12.2016 05:21, Baolin Wang wrote:
>> Hi Mathias,
>>
>> On 12 December 2016 at 23:52, Mathias Nyman
>>  wrote:
>>> On 05.12.2016 09:51, Baolin Wang wrote:

 If a command event is found on the event ring during an interrupt,
 we need to stop the command timer with del_timer(). Since del_timer()
 can fail if the timer is running and waiting on the xHCI lock, then
 it maybe get the wrong timeout command in xhci_handle_command_timeout()
 if host fetched a new command and updated the xhci->current_cmd in
 handle_cmd_completion(). For this situation, we need a way to signal
 to the command timer that everything is fine and it should exit.
>>>
>>>
>>> Ah, right, this could actually happen.
>>>

 We should introduce a counter (xhci->current_cmd_pending) for the 
 number
 of pending commands. If we need to cancel the command timer and
 del_timer()
 succeeds, we decrement the number of pending commands. If del_timer()
 fails,
 we leave the number of pending commands alone.

 For handling timeout command, in xhci_handle_command_timeout() we will
 check
 the counter after decrementing it, if the counter
 (xhci->current_cmd_pending)
 is 0, which means xhci->current_cmd is the right timeout command. If 
 the
 counter (xhci->current_cmd_pending) is greater than 0, which means
 current
 timeout command has been handled by host and host has fetched new
 command
 as
 xhci->current_cmd, then just return and wait for new current command.
>>>
>>>
>>> A counter like this could work.
>>>
>>> Writing the abort bit can generate either ABORT+STOP, or just STOP
>>> event, this seems to cover both.
>>>
>>> quick check, case 1: timeout and cmd completion at the same time.
>>>
>>> cpu1cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> queue_command(more),
>>> --completion irq fires---- timer times out at same 
>>> time--
>>> handle_cmd_completion() handle_cmd_timeout(),)
>>> lock(xhci_lock  )   spin_on(xhci_lock)
>>> del_timer() fail, p (=1, nochange)
>>> cur_cmd = list_next(), p++ (=2)
>>> unlock(xhci_lock)
>>>   lock(xhci_lock)
>>>   p-- (=1)
>>>   if (p > 0), exit
>>> OK works
>>>
>>> case 2: normal timeout case with ABORT+STOP, no race.
>>>
>>> cpu1cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> queue_command(more),
>>>   handle_cmd_timeout()
>>>   p-- (P=0), don't exit
>>>   mod_timer(), p++ (P=1)
>>>   write_abort_bit()
>>> handle_cmd_comletion(ABORT)
>>> del_timer(), ok, p-- (p = 0)
>>> handle_cmd_completion(STOP)
>>> del_timer(), fail, (P=0)
>>> handle_stopped_cmd_ring()
>>> cur_cmd = list_next(), p++ (=1)
>>> mod_timer()
>>>
>>> OK, works, and same for just STOP case, with the only difference that
>>> during handle_cmd_completion(STOP) p is decremented (p--)
>>
>> Yes, that's the cases what I want to handle, thanks for your explicit
>> explanation.
>>
> Gave this some more thought over the weekend, and this implementation
> doesn't solve the case when the last command times out and races with the
> completion handler:
>
> cpu1cpu2
>
> queue_command(first), p++ (=1)
> --completion irq fires---- timer times out at same time--
> handle_cmd_completion() handle_cmd_timeout(),)
> lock(xhci_lock )spin_on(xhci_lock)
> del_timer() fail, p (=1, nochange)
> no more commands, P (=1, nochange)
> unlock(xhci_lock)
>  lock(xhci_lock)
>  p-- (=0)
>  p == 0, continue, even if we 
> should
> not.
>For this we still need to rely 
> on
> checking cur_cmd == NULL in the timeout function.
> (Baolus patch sets it to NULL if there are no more commands pending)

[PATCH v2] ipc/sem.c: fix incorrect sem_lock pairing

2016-12-19 Thread Manfred Spraul
Based on the syzcaller test case from dvyukov:
https://gist.githubusercontent.com/dvyukov/d0e5efefe4d7d6daed829f5c3ca26a40/raw/08d0a261fe3c987bed04fbf267e08ba04bd533ea/gistfile1.txt

The slow (i.e.: failure to acquire) syscall exit from semtimedop()
incorrectly assumed that the the same lock is acquired as it was
at the initial syscall entry.

This is wrong:
- thread A: single semop semop(), sleeps
- thread B: multi semop semop(), sleeps
- thread A: woken up by signal/timeout

With this sequence, the initial sem_lock() call locks the
per-semaphore spinlock, and it is unlocked with sem_unlock().
The call at the syscall return locks the global spinlock.
Because locknum is not updated, the following sem_unlock() call
unlocks the per-semaphore spinlock, which is actually not locked.

The fix is trivial: Use the return value from sem_lock.

Reported-by: dvyu...@google.com
Signed-off-by: Manfred Spraul 
Fixes: 370b262c896e ("ipc/sem: avoid idr tree lookup for interrupted semop")
Cc: d...@stgolabs.net

---

Patch V2:
- subject line updated
- documentation slightly updated

 ipc/sem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index e08b948..3ec5742 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1977,7 +1977,7 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf 
__user *, tsops,
}
 
rcu_read_lock();
-   sem_lock(sma, sops, nsops);
+   locknum = sem_lock(sma, sops, nsops);
 
if (!ipc_valid_object(&sma->sem_perm))
goto out_unlock_free;
-- 
2.7.4



Re: [PATCH v3 3/5] fs: Wire up SED/Opal to ioctl

2016-12-19 Thread Christoph Hellwig
> @@ -853,6 +854,7 @@ struct file {
>  #ifdef CONFIG_SECURITY
>   void*f_security;
>  #endif
> + struct sed_context  *f_sedctx;

Adding a new field to the global struct file for a block driver
feature is not acceptable.  And I don't really see why it would
be nessecary anyway.


Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.

2016-12-19 Thread Christoph Hellwig
> +static int nvme_sec_send(void *ctrl_data, u16 spsp, u8 secp,
> +  void *buf, size_t len)
> +{
> + return nvme_sec_submit(ctrl_data, spsp, secp, buf, len,
> +nvme_admin_security_send);
> +}
> +
> +static int nvme_sec_recv(void *ctrl_data, u16 spsp, u8 secp,
> +  void *buf, size_t len)
> +{
> + return nvme_sec_submit(ctrl_data, spsp, secp, buf, len,
> +nvme_admin_security_recv);
> +}
> +
> +static const struct sec_ops nvme_sec_ops = {
> + .sec_send = nvme_sec_send,
> + .sec_recv = nvme_sec_recv,
> +};

Just make sec_submit the callback passed to the core and avoid
this boiler-plate code.

>  
> +int nvme_opal_initialize(struct nvme_ctrl *ctrl)
> +{
> + /* Opal dev has already been allocated for this controller */
> + if (ctrl->sed_ctx.dev)
> + return 0;
> +
> + ctrl->sed_ctx.dev = alloc_opal_dev(ctrl->admin_q);
> + if (!ctrl->sed_ctx.dev)
> + return -ENOMEM;
> + ctrl->sed_ctx.ops = &nvme_sec_ops;
> + ctrl->sed_ctx.sec_data = ctrl;

No need for sec_data callback, just pass the sed_ctx to the driver
and use container_of to get at the containing structure.


Re: [tpmdd-devel] [PATCH] tpm, tpm_crb: Handle 64-bit resource in crb_check_resource()

2016-12-19 Thread Jiandi An
On 12/19/16 07:56, Jarkko Sakkinen wrote:
> On Sun, Dec 18, 2016 at 10:20:53PM -0600, Jiandi An wrote:
>> crb_check_resource() in TPM CRB driver calls
>> acpi_dev_resource_memory() which only handles 32-bit resources.
>> Adding a call to acpi_dev_resource_address_space() in TPM CRB
>> driver which handles 64-bit resources.
>>
>> Signed-off-by: Jiandi An 
> 
> 1. Is there a platform in existence where this change fixes a problem?
> 2. What is difference between "memory" and "address space" conceptually?
>Just wondering why 32-bit stuff is "memory" and 64-bit stuff is
>"address space". Could there be a one function that would work both
>for 32-bit and 64-bit cases?
>
> Yeah, I do not know this API too well. That's why I'm asking.
> 
> /Jarkko
> 
> 
> If this is the right thing it also needs to be done in tpm_tis.
> 
> I will point out that this driver only works with memory, so using a
> generic decoder without checking for IO maps may not be correct..
> 
> Jason
> 

Hi Jarkko and Jason,

Thanks for your comments.
I am a developer at Qualcomm and we are trying to enable TPM CRB driver
on ARM64 for Qualcomm Centriq 2400.  Spec changes to ACPI table for TPM 2.0
have been submitted and approved by TCG and are currently in the 60 day waiting
period for release.  I have a series of patches that do this TPM CRB driver
enablement for ARM64 that I'll be submitting.

But first, for our platform the control area buffer address is greater than 
32-bit.
It is memory with no IO effects.  I ran into this issue first when I use
QWordMemory in ACPI ASL to describe the resource.  MemoryRangeType is specified 
as
AddressRangeMemory. 

The QWordMemory macro evaluates to a buffer which contains a 64-bit memory
resource descriptor, which describes a range of memory addresses.  It is
a QWord Address Space Descriptor.  acpi_dev_resource_address_space()
handles the 64-bit memory described using QWordMemory macro in ACPI ASL.

The Memory32Fixed macro evaluates to a buffer which contains a 32-bit memory
descriptor, which describes a fixed range of memory addresses.
acpi_dev_resource_memory() handles the 32-bit memory described using 
Memory32Fixed
in ACPI ASL.

I did not see a specific acpi_dev_resource_ service that handles 64-bit resource
address and doesn't use the generic acpi_decode_space() decoder.

I do have a question about having to specify _CRS method in ACPI ASL.
When I was doing early prototyping for this enablement work on ARM64 back
during the time with 4.5 kernel, it was before the introduction of the following
two patches:
commit 1bd047be37d95bf65a219f4931215f71878ac060
tpm_crb: Use devm_ioremap_resource
commit 51dd43dff74b0547ad844638f6910ca29c956819
tpm_tis: Use devm_ioremap_resource

The control area buffer is specified in the TPM2.0 static ACPI table.  TPM
CRB driver maps the control area address and reads out cmd and rsp buffer
addresses and maps them.  There is no requirement in the TCG TPM ACPI spec
for specifying _CRS to describe the control area buffer.  When I rebased
the early prototype for CRB driver ARM64 enablement to the latest kernel,
I hit this issue where I have to specify _CRS method.  So in _CRS method
I specify the same control area address that's in the TPM2.0 static ACPI table.

I see the _CRS requirement in the CRB driver is for resource synchronization
between the TIS and CRB drivers to support force mode in TIS. Could I get some
background on that so I could be more careful not breaking TIS while making
changes to CRB driver for ARM64 enablement?

Thanks in advance.

>> ---
>>  drivers/char/tpm/tpm_crb.c | 8 +---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
>> index 717b6b4..86f355b 100644
>> --- a/drivers/char/tpm/tpm_crb.c
>> +++ b/drivers/char/tpm/tpm_crb.c
>> @@ -264,10 +264,12 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 
>> status)
>>  static int crb_check_resource(struct acpi_resource *ares, void *data)
>>  {
>>  struct resource *io_res = data;
>> -struct resource res;
>> +struct resource_win win;
>> +struct resource *res = &(win.res);
>>  
>> -if (acpi_dev_resource_memory(ares, &res)) {
>> -*io_res = res;
>> +if (acpi_dev_resource_memory(ares, res) ||
>> +acpi_dev_resource_address_space(ares, &win)) {
>> +*io_res = *res;
>>  io_res->name = NULL;
>>  }
>>  
>> -- 
>> Jiandi An
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
>> Technologies, Inc.
>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
>> Foundation Collaborative Project.
>>
> 
> --
> Check out the vibrant tech community on one of the world's most 
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> tpmdd-devel mailing list
> tpmdd-de...@lists.sourceforge.ne

Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.

2016-12-19 Thread Christoph Hellwig
On Mon, Dec 19, 2016 at 03:23:12PM -0700, Scott Bauer wrote:
> I went back and reviewed the spec 1.2.1:
> 
> http://www.nvmexpress.org/wp-content/uploads/NVM_Express_1_2_1_Gold_20160603.pdf
> Section 5.18 (page 140->141)
> 
> Describes the security send command type and it doesn't have any reference of
> a namespace ID. Anecdotally, I just removed the ns->ns_id line from the code 
> and
> everything still works as intended. Is there another portion of the spec or 
> errata
> that requires ns_id? (I can't access 1.2.1 errta the link doesn't work).

As far as I can tell Security Send / Receive has always been intended to
apply to the whole controller, even if that's something I would not
personally think is a good idea.


Re: [PATCH v3 2/5] lib: Add Sed-opal library

2016-12-19 Thread Christoph Hellwig
On Mon, Dec 19, 2016 at 04:34:15PM -0500, Keith Busch wrote:
> This seems like an optional library that some environments may wish to
> opt-out of building into the kernel. Any reason not to add an entry into
> the Kconfig to turn this on/off?

This needs to be a CONFIG_BLOCK_SED / CONFIG_BLOCK_SED_OPAL and
should move to block/.


Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Baolin Wang
Hi,

On 20 December 2016 at 12:29, Lu Baolu  wrote:
> Hi Mathias,
>
> On 12/19/2016 08:13 PM, Mathias Nyman wrote:
>> On 19.12.2016 13:34, Baolin Wang wrote:
>>> Hi Mathias,
>>>
>>> On 19 December 2016 at 18:33, Mathias Nyman
>>>  wrote:
 On 13.12.2016 05:21, Baolin Wang wrote:
>
> Hi Mathias,
>
> On 12 December 2016 at 23:52, Mathias Nyman
>  wrote:
>>
>> On 05.12.2016 09:51, Baolin Wang wrote:
>>>
>>>
>>> If a command event is found on the event ring during an interrupt,
>>> we need to stop the command timer with del_timer(). Since del_timer()
>>> can fail if the timer is running and waiting on the xHCI lock, then
>>> it maybe get the wrong timeout command in xhci_handle_command_timeout()
>>> if host fetched a new command and updated the xhci->current_cmd in
>>> handle_cmd_completion(). For this situation, we need a way to signal
>>> to the command timer that everything is fine and it should exit.
>>
>>
>>
>> Ah, right, this could actually happen.
>>
>>>
>>>
>>> We should introduce a counter (xhci->current_cmd_pending) for the number
>>> of pending commands. If we need to cancel the command timer and
>>> del_timer()
>>> succeeds, we decrement the number of pending commands. If del_timer()
>>> fails,
>>> we leave the number of pending commands alone.
>>>
>>> For handling timeout command, in xhci_handle_command_timeout() we will
>>> check
>>> the counter after decrementing it, if the counter
>>> (xhci->current_cmd_pending)
>>> is 0, which means xhci->current_cmd is the right timeout command. If the
>>> counter (xhci->current_cmd_pending) is greater than 0, which means
>>> current
>>> timeout command has been handled by host and host has fetched new
>>> command
>>> as
>>> xhci->current_cmd, then just return and wait for new current command.
>>
>>
>>
>> A counter like this could work.
>>
>> Writing the abort bit can generate either ABORT+STOP, or just STOP
>> event, this seems to cover both.
>>
>> quick check, case 1: timeout and cmd completion at the same time.
>>
>> cpu1cpu2
>>
>> queue_command(first), p++ (=1)
>> queue_command(more),
>> --completion irq fires---- timer times out at same time--
>> handle_cmd_completion() handle_cmd_timeout(),)
>> lock(xhci_lock  )   spin_on(xhci_lock)
>> del_timer() fail, p (=1, nochange)
>> cur_cmd = list_next(), p++ (=2)
>> unlock(xhci_lock)
>>   lock(xhci_lock)
>>   p-- (=1)
>>   if (p > 0), exit
>> OK works
>>
>> case 2: normal timeout case with ABORT+STOP, no race.
>>
>> cpu1cpu2
>>
>> queue_command(first), p++ (=1)
>> queue_command(more),
>>   handle_cmd_timeout()
>>   p-- (P=0), don't exit
>>   mod_timer(), p++ (P=1)
>>   write_abort_bit()
>> handle_cmd_comletion(ABORT)
>> del_timer(), ok, p-- (p = 0)
>> handle_cmd_completion(STOP)
>> del_timer(), fail, (P=0)
>> handle_stopped_cmd_ring()
>> cur_cmd = list_next(), p++ (=1)
>> mod_timer()
>>
>> OK, works, and same for just STOP case, with the only difference that
>> during handle_cmd_completion(STOP) p is decremented (p--)
>
>
> Yes, that's the cases what I want to handle, thanks for your explicit
> explanation.
>

 Gave this some more thought over the weekend, and this implementation
 doesn't solve the case when the last command times out and races with the
 completion handler:

 cpu1cpu2

 queue_command(first), p++ (=1)
 --completion irq fires---- timer times out at same time--
 handle_cmd_completion() handle_cmd_timeout(),)
 lock(xhci_lock )spin_on(xhci_lock)
 del_timer() fail, p (=1, nochange)
 no more commands, P (=1, nochange)
 unlock(xhci_lock)
  lock(xhci_lock)
  p-- (=0)
  p == 0, continue, even if we 
 should
 not.
For this we still need to rely 
 on
 checking cur_cmd == NULL in the timeout function.
 (Baolus patch sets it to NULL if there are no more commands pending)
>>>
>>> As I pointed out in patch 1 of this patchset, this patchset is based
>>> on Lu Baolu's new fix patch:
>>> usb: xhci: fix possible wild 

BQ27xxx registers

2016-12-19 Thread Chris Lapa

Hi,

I'm testing out the 4.9 kernel on a AM3359 board fitted with a 
BQ27510-G3 fuel gauge. The board previously worked on the 4.1 kernel, 
however on the 4.9 kernel the bq27xxx_battery.c driver is spitting out 
this error continuously:


power_supply bq27510-0: driver failed to report `charge_full_design' 
property: -121


I have narrowed down the problem to commit 099867a16 where the 
BQ27XXX_REG_DCAP value was changed from 0x2e to 0x3c.


I can generate a patch to fix this issue, however the bigger problem 
exists as to which revision fuel gauge the bq27xxx_battery.c driver is 
intended to support for each family.


Attached is a table I put together of all the register addresses for 
each supported device under the BQ27500 definition and what the current 
driver utilizes (I didn't paste it here as the word wrapping messes with 
the formatting).


There isn't really an ideal solution I can see where we keep the single 
BQ27500 definition and support all the functionality of each revision.


We can try and just support the latest revisions (BQ27510-G3 and 
BQ27520-G4) but I think it could hurt backwards compatibility for 
existing hardware.


I'm happy to work on a fix but just wanted to get some thoughts before 
proceeding.


Cheers,
Chris
RegisterBQ27500/1   BQ27510-G1  BQ27510-G2  
BQ27510-G3  BQ27520-G1  BQ27520-G2  BQ27520-G3  BQ27520-G4  
bq27xxx_battery.c
BQ27XXX_REG_CTRL0x000x000x000x00
0x000x000x000x000x00
BQ27XXX_REG_TEMP0x060x060x060x06
0x060x060x060x060x06
BQ27XXX_REG_INT_TEMP—   -   -   0x28
-   0x360x360x280x28
BQ27XXX_REG_VOLT0x080x080x080x08
0x080x080x080x080x08
BQ27XXX_REG_AI  0x140x140x140x14
0x140x140x140x140x14
BQ27XXX_REG_FLAGS   0x0a0x0a0x0a0x0a
0x0a0x0a0x0a0x0a0x0a
BQ27XXX_REG_TTE 0x160x160x160x16
0x160x160x160x160x16
BQ27XXX_REG_TTF 0x180x180x18—   
0x180x18-   -   -
BQ27XXX_REG_TTES0x1c0x1c0x1c0x1a
0x1c0x1c0x1c0x1a0x1a
BQ27XXX_REG_TTECP   0x260x260x26-   
0x260x260x26-   -
BQ27XXX_REG_NAC 0x0c0x0c0x0c0x0c
0x0c0x0c0x0c0x0c0x0c
BQ27XXX_REG_FCC 0x120x120x120x12
0x120x120x120x120x12
BQ27XXX_REG_CYCT0x2a0x2a0x2a0x1e
-   0x2a0x2a0x1e0x2a
BQ27XXX_REG_AE  0x220x220x22-   
0x220x220x22-   -
BQ27XXX_REG_SOC 0x2c0x2c0x2c0x20
0x2c0x2c0x2c0x200x2c
BQ27XXX_REG_DCAP0x3c0x3c0x3c0x2e
0x3c0x3c0x3c-   0x3c
BQ27XXX_REG_AP  0x240x240x24-   
0x240x240x24-   -

[PATCH v2 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Baolin Wang
If a command event is found on the event ring during an interrupt,
we need to stop the command timer with del_timer(). Since del_timer()
can fail if the timer is running and waiting on the xHCI lock, then
it maybe get the wrong timeout command in xhci_handle_command_timeout()
if host fetched a new command and updated the xhci->current_cmd in
handle_cmd_completion(). For this situation, we need a way to signal
to the command timer that everything is fine and it should exit.

We should check if the command timer is pending in xhci_handle_command_timeout()
function, if the command timer is pending, which means current timeout
command has been handled by host and host has fetched new command and
re-added the command timer, then just return and wait for new current
command. If not, it means current command is timeout and need to be
handled.

Signed-off-by: Baolin Wang 
---
Changes since v1:
 - Remove the counter and just check if the command timer is pending.
---
 drivers/usb/host/xhci-ring.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9965a4c..3947344 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1269,7 +1269,7 @@ void xhci_handle_command_timeout(unsigned long data)
xhci = (struct xhci_hcd *) data;
 
spin_lock_irqsave(&xhci->lock, flags);
-   if (!xhci->current_cmd) {
+   if (!xhci->current_cmd || timer_pending(&xhci->cmd_timer)) {
spin_unlock_irqrestore(&xhci->lock, flags);
return;
}
-- 
1.7.9.5



Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()

2016-12-19 Thread Boqun Feng
On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > On Thu, Dec 15, 2016 at 02:51:36PM +, Colin Ian King wrote:
> > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > On Thu, Dec 15, 2016 at 12:04:59PM +, Mark Rutland wrote:
> > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > >>> corresponding cpu_possible_mask. So replace the
> > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > >>> for_each_leaf_node_cpu() to save several checks.
> > > >>>
> > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate 
> > > >>> the
> > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > >>> spotted by Colin Ian King  and CoverityScan 
> > > >>> in
> > > >>> a previous version of this patch.]
> > > >>
> > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > >>
> > > > 
> > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > from Colin, but yes, it's no longer needed here for the current code.
> > > 
> > > Yep, remove it.
> > > 
> > 
> > Paul, here is a modified version of this patch, what I only did is
> > removing this note.
> > 
> > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > tree, on this very commit:
> > 
> > 8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > 
> > And I put the latest version at
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > 
> > If you thought it's better, I could send a v3 ;-)
> 
> I would feel better about this patchset if it reduced the number of lines
> of code rather than increasing them.  That said, part of the increase
> is a commment.  Still, I am not convinced that the extra level of macro
> is carrying its weight.
> 
> dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> 
>   The commit log needs a bit of wordsmithing.
> 
>   The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
>   What is its purpose, really?  What does its triggering tell you?
>   What other checks did you consider as an alternative?
> 

The check is an over-case one, it's introduced because I'm worried about
some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
an "impossible" CPU. I will explanation later in more details.

>   And if you are going to add checks of this type, should you
>   also check for this being a leaf rcu_node structure?
> 

I don't think I want to check that, and I don't think check
cpu_possible(cpu) in the macro is similar to that.

> 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> 
>   This does look a bit nicer, but why the added blank lines?
>   Are they really helping?
> 
>   The commit log seems a bit misplaced.  This code is almost never
>   executed (once per 21 seconds at the most), so performance really
>   isn't a consideration.  The simpler-looking code might be.
> 
> fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> 
>   Ditto on blank lines.
> 
>   Again, this code is executed per expedited grace period, so
>   performance really isn't a big deal.  More of a big deal than
>   the stall-warning code, but we still are way off of any fastpath.
> 
> 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> 
>   Ditto again on blank lines.
> 
>   And on the commit log.  This code is executed about once
>   per several jiffies, and on larger machines, per 20 jiffies
>   or so.  Performance really isn't a consideration.
> 
> 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> 
>   And another ditto on blank lines.
> 
>   This code executes once per CPU-hotplug operation, so again isn't
>   at all performance critical.
> 
> In short, if you are trying to sell this to me as a significant performance
> boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,

Yep, it won't help the performance a lot, but it 

1)  helps the performance in theory, because it iterates less CPUs

2)  makes code cleaner. By "cleaner", I mean we can a) affort more
blank lines to make loops separated from other code and b)
descrease the indent levels for those loops. But, yes I should
add those points in the commit log, because those are more
visible effects.

> though perhaps I am misunderstanding its purpose.  My assumption is
> that you want to detect missing UL suffixes on bitmask constants, in
> which case I bet there is a better way.
> 

The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
constatns, and we don't need to check that, because we use
leaf_node_cpu_id() now. As I said, this is an over-case check, and we
can drop if we guarante that CPUs masked in ->

Re: [PATCH v2] fscrypt: Factor out bio specific functions

2016-12-19 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


[PATCH 4/4] mmc: sdhci-msm: provide enhanced_strobe mode feature support

2016-12-19 Thread Ritesh Harjani
This provide support for enhanced_strobe feature to sdhci-msm.

Signed-off-by: Ritesh Harjani 
---
 drivers/mmc/host/sdhci-msm.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
index 32879b8..d092a16 100644
--- a/drivers/mmc/host/sdhci-msm.c
+++ b/drivers/mmc/host/sdhci-msm.c
@@ -102,6 +102,7 @@
 
 #define CORE_DDR_200_CFG   0x184
 #define CORE_CDC_T4_DLY_SELBIT(0)
+#define CORE_CMDIN_RCLK_EN BIT(1)
 #define CORE_START_CDC_TRAFFIC BIT(6)
 #define CORE_VENDOR_SPEC3  0x1b0
 #define CORE_PWRSAVE_DLL   BIT(3)
@@ -579,6 +580,7 @@ static int sdhci_msm_cdclp533_calibration(struct sdhci_host 
*host)
 
 static int sdhci_msm_cm_dll_sdc4_calibration(struct sdhci_host *host)
 {
+   struct mmc_host *mmc = host->mmc;
u32 dll_status, config;
int ret;
 
@@ -593,6 +595,12 @@ static int sdhci_msm_cm_dll_sdc4_calibration(struct 
sdhci_host *host)
 */
writel_relaxed(DDR_CONFIG_POR_VAL, host->ioaddr + CORE_DDR_CONFIG);
 
+   if (mmc->ios.enhanced_strobe) {
+   config = readl_relaxed(host->ioaddr + CORE_DDR_200_CFG);
+   config |= CORE_CMDIN_RCLK_EN;
+   writel_relaxed(config, host->ioaddr + CORE_DDR_200_CFG);
+   }
+
config = readl_relaxed(host->ioaddr + CORE_DLL_CONFIG_2);
config |= CORE_DDR_CAL_EN;
writel_relaxed(config, host->ioaddr + CORE_DLL_CONFIG_2);
@@ -627,6 +635,7 @@ static int sdhci_msm_hs400_dll_calibration(struct 
sdhci_host *host)
 {
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
struct sdhci_msm_host *msm_host = sdhci_pltfm_priv(pltfm_host);
+   struct mmc_host *mmc = host->mmc;
int ret;
u32 config;
 
@@ -640,14 +649,17 @@ static int sdhci_msm_hs400_dll_calibration(struct 
sdhci_host *host)
if (ret)
goto out;
 
-   /* Set the selected phase in delay line hw block */
-   ret = msm_config_cm_dll_phase(host, msm_host->saved_tuning_phase);
-   if (ret)
-   goto out;
+   if (!mmc->ios.enhanced_strobe) {
+   /* Set the selected phase in delay line hw block */
+   ret = msm_config_cm_dll_phase(host,
+ msm_host->saved_tuning_phase);
+   if (ret)
+   goto out;
+   config = readl_relaxed(host->ioaddr + CORE_DLL_CONFIG);
+   config |= CORE_CMD_DAT_TRACK_SEL;
+   writel_relaxed(config, host->ioaddr + CORE_DLL_CONFIG);
+   }
 
-   config = readl_relaxed(host->ioaddr + CORE_DLL_CONFIG);
-   config |= CORE_CMD_DAT_TRACK_SEL;
-   writel_relaxed(config, host->ioaddr + CORE_DLL_CONFIG);
if (msm_host->use_cdclp533)
ret = sdhci_msm_cdclp533_calibration(host);
else
@@ -802,7 +814,8 @@ static void sdhci_msm_set_uhs_signaling(struct sdhci_host 
*host,
spin_unlock_irq(&host->lock);
/* CDCLP533 HW calibration is only required for HS400 mode*/
if (host->clock > CORE_FREQ_100MHZ &&
-   msm_host->tuning_done && !msm_host->calibration_done &&
+   (msm_host->tuning_done || mmc->ios.enhanced_strobe) &&
+   !msm_host->calibration_done &&
mmc->ios.timing == MMC_TIMING_MMC_HS400)
if (!sdhci_msm_hs400_dll_calibration(host))
msm_host->calibration_done = true;
@@ -941,7 +954,8 @@ static void sdhci_msm_set_clock(struct sdhci_host *host, 
unsigned int clock)
 * Select HS400 mode using the HC_SELECT_IN from VENDOR SPEC
 * register
 */
-   if (msm_host->tuning_done && !msm_host->calibration_done) {
+   if ((msm_host->tuning_done || curr_ios.enhanced_strobe) &&
+   !msm_host->calibration_done) {
/*
 * Write 0x6 to HC_SELECT_IN and 1 to HC_SELECT_IN_EN
 * field in VENDOR_SPEC_FUNC
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
a Linux Foundation Collaborative Project.



[PATCH 2/4] mmc: mmc: Add change to set controller to HS400 frequency in enhanced strobe

2016-12-19 Thread Ritesh Harjani
Currently mmc_select_hs400es is not setting the desired frequency
before sending switch command. This makes CMD13 to timeout on
some controllers.
Thus add a change to set the desired HS400 frequency
in mmc_select_hs400es when the timing is switched to HS400.

Signed-off-by: Ritesh Harjani 
---
 drivers/mmc/core/mmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index b61b52f9..eb69497 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1329,6 +1329,7 @@ static int mmc_select_hs400es(struct mmc_card *card)
 
/* Set host controller to HS400 timing and frequency */
mmc_set_timing(host, MMC_TIMING_MMC_HS400);
+   mmc_set_bus_speed(card);
 
/* Controller enable enhanced strobe function */
host->ios.enhanced_strobe = true;
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
a Linux Foundation Collaborative Project.



[PATCH 3/4] mmc: mmc: enable ios.enhanced_strobe before mmc_set_timing

2016-12-19 Thread Ritesh Harjani
Some controllers may need to configure few registers based on enhanced
strobe mode while configuring to HS400 timing, thus make
ios.enhanced_strobe to true before mmc_set_timing in mmc_select_hs400es.

Signed-off-by: Ritesh Harjani 
---
 drivers/mmc/core/mmc.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index eb69497..052368e 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1327,12 +1327,18 @@ static int mmc_select_hs400es(struct mmc_card *card)
goto out_err;
}
 
+   /*
+* Enable enhanced_strobe in ios, as some controllers
+* may need to configure few registers based on enhanced
+* strobe while changing HS400 timing.
+*/
+   host->ios.enhanced_strobe = true;
+
/* Set host controller to HS400 timing and frequency */
mmc_set_timing(host, MMC_TIMING_MMC_HS400);
mmc_set_bus_speed(card);
 
/* Controller enable enhanced strobe function */
-   host->ios.enhanced_strobe = true;
if (host->ops->hs400_enhanced_strobe)
host->ops->hs400_enhanced_strobe(host, &host->ios);
 
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
a Linux Foundation Collaborative Project.



[PATCH 1/4] mmc: core: Return from mmc_set_clock if hz is already set to ios.clock

2016-12-19 Thread Ritesh Harjani
Signed-off-by: Ritesh Harjani 
---
 drivers/mmc/core/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 543eadd..125f8a9 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1139,6 +1139,9 @@ void mmc_set_clock(struct mmc_host *host, unsigned int hz)
if (hz > host->f_max)
hz = host->f_max;
 
+   if (hz == host->ios.clock)
+   return;
+
host->ios.clock = hz;
mmc_set_ios(host);
 }
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
a Linux Foundation Collaborative Project.



[PATCH 0/4] mmc: Provide enhanced strobe support for sdhci-msm

2016-12-19 Thread Ritesh Harjani
Hi, 

Patch series is to enable enhanced strobe feature for sdhci-msm driver.
But there are also some changes done in core layer for this.

This was tested on msm8996 based interal target with emmc-5.1 card.

Ritesh Harjani (4):
  mmc: core: Return from mmc_set_clock if hz is already set to ios.clock
  mmc: mmc: Add change to set controller to HS400 frequency in enhanced
strobe
  mmc: mmc: enable ios.enhanced_strobe before mmc_set_timing
  mmc: sdhci-msm: provide enhanced_strobe mode feature support

 drivers/mmc/core/core.c  |  3 +++
 drivers/mmc/core/mmc.c   |  9 -
 drivers/mmc/host/sdhci-msm.c | 32 +++-
 3 files changed, 34 insertions(+), 10 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
a Linux Foundation Collaborative Project.



[PATCHv2 11/11] powerpc/kvm: Advertise availablity of HPT resizing on KVM HV

2016-12-19 Thread David Gibson
This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the
presence of in-kernel HPT resizing on KVM HV.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index efd1183..1f32235 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SPAPR_MULTITCE:
r = 1;
break;
+   case KVM_CAP_SPAPR_RESIZE_HPT:
+   r = !!hv_enabled;
+   break;
 #endif
case KVM_CAP_PPC_HTM:
r = cpu_has_feature(CPU_FTR_TM_COMP) &&
-- 
2.9.3



[PATCHv2 06/11] powerpc/kvm: Split HPT allocation from activation

2016-12-19 Thread David Gibson
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM.  For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function.  Likewise we split
kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
which unsets it as an active HPT, then frees it.

We also move the logic to fall back to smaller HPT sizes if the first try
fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
that previously if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator.  Now, we
try first CMA, then the page allocator at each size.  As far as I can tell
this change should be harmless.

To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
call to kvmppc_free_lpid() that was there, we move to the single caller.

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  4 ++
 arch/powerpc/include/asm/kvm_ppc.h   |  5 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 96 +++-
 arch/powerpc/kvm/book3s_hv.c | 18 +-
 4 files changed, 68 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8810327..bcb3e0a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -22,6 +22,10 @@
 
 #include 
 
+/* Power architecture requires HPT is at least 256kiB, at most 64TiB */
+#define PPC_MIN_HPT_ORDER  18
+#define PPC_MAX_HPT_ORDER  46
+
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu 
*vcpu)
 {
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 3db6be9..41575b8 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
 extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
 extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
 
-extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
+extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
 extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
-extern void kvmppc_free_hpt(struct kvm *kvm);
+extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
 extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fe88132..4683d87 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,74 +40,66 @@
 
 #include "trace_hv.h"
 
-/* Power architecture requires HPT is at least 256kB */
-#define PPC_MIN_HPT_ORDER  18
-
 static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh,
unsigned long ptel, unsigned long *pte_idx_ret);
 static void kvmppc_rmap_reset(struct kvm *kvm);
 
-long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
+int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
 {
unsigned long hpt = 0;
-   struct revmap_entry *rev;
+   int cma = 0;
struct page *page = NULL;
-   long order = KVM_DEFAULT_HPT_ORDER;
+   struct revmap_entry *rev;
+   unsigned long npte;
 
-   if (htab_orderp) {
-   order = *htab_orderp;
-   if (order < PPC_MIN_HPT_ORDER)
-   order = PPC_MIN_HPT_ORDER;
-   }
+   if ((order < PPC_MIN_HPT_ORDER) || (order > PPC_MAX_HPT_ORDER))
+   return -EINVAL;
 
-   kvm->arch.hpt.cma = 0;
page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
-   kvm->arch.hpt.cma = 1;
+   cma = 1;
}
 
-   /* Lastly try successively smaller sizes from the page allocator */
-   /* Only do this if userspace didn't specify a size via ioctl */
-   while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
-   hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
-  __GFP_NOWARN, order - PAGE_SHIFT);
-   if (!hpt)
-   --order;
-   }
+   if (!hpt)
+   hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT

[PATCHv2 02/11] powerpc/kvm: HPT resizing documentation and reserved numbers

2016-12-19 Thread David Gibson
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime.  It also adds
definitions for two new VM ioctl()s to implement this extension, and
documentation of the same.

Note that, HPT resizing is already possible with KVM PR without kernel
modification, since the HPT is managed within userspace (qemu).  The
capability defined here will only be set where an in-kernel implementation
of resizing is necessary, i.e. for KVM HV.  To determine if the userspace
resize implementation can be used, it's necessary to check
KVM_CAP_PPC_ALLOC_HTAB.  Unfortunately older kernels incorrectly set
KVM_CAP_PPC_ALLOC_HTAB even with KVM PR.  If userspace it want to support
resizing with KVM PR on such kernels, it will need a workaround.

Signed-off-by: David Gibson 
---
 Documentation/virtual/kvm/api.txt | 95 +++
 include/uapi/linux/kvm.h  | 11 +
 2 files changed, 106 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 42be7b1..f452c72 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3201,6 +3201,101 @@ struct kvm_reinject_control {
 pit_reinject = 0 (!reinject mode) is recommended, unless running an old
 operating system that uses the PIT for timing (e.g. Linux 2.4.x).
 
+4.100 KVM_PPC_RESIZE_HPT_PREPARE
+
+Capability: KVM_CAP_SPAPR_RESIZE_HPT
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_ppc_resize_hpt (in)
+Returns: 0 on successful completion,
+>0 if a new HPT is being prepared, the value is an estimated
+ number of milliseconds until preparation is complete
+ -EFAULT if struct kvm_reinject_control cannot be read,
+-EINVAL if the supplied shift or flags are invalid
+-ENOMEM if unable to allocate the new HPT
+-ENOSPC if there was a hash collision when moving existing
+  HPT entries to the new HPT
+-EIO on other error conditions
+
+Used to implement the PAPR extension for runtime resizing of a guest's
+Hashed Page Table (HPT).  Specifically this starts, stops or monitors
+the preparation of a new potential HPT for the guest, essentially
+implementing the H_RESIZE_HPT_PREPARE hypercall.
+
+If called with shift > 0 when there is no pending HPT for the guest,
+this begins preparation of a new pending HPT of size 2^(shift) bytes.
+It then returns a positive integer with the estimated number of
+milliseconds until preparation is complete.
+
+If called when there is a pending HPT whose size does not match that
+requested in the parameters, discards the existing pending HPT and
+creates a new one as above.
+
+If called when there is a pending HPT of the size requested, will:
+  * If preparation of the pending HPT is already complete, return 0
+  * If preparation of the pending HPT has failed, return an error
+code, then discard the pending HPT.
+  * If preparation of the pending HPT is still in progress, return an
+estimated number of milliseconds until preparation is complete.
+
+If called with shift == 0, discards any currently pending HPT and
+returns 0 (i.e. cancels any in-progress preparation).
+
+flags is reserved for future expansion, currently setting any bits in
+flags will result in an -EINVAL.
+
+Normally this will be called repeatedly with the same parameters until
+it returns <= 0.  The first call will initiate preparation, subsequent
+ones will monitor preparation until it completes or fails.
+
+struct kvm_ppc_resize_hpt {
+   __u64 flags;
+   __u32 shift;
+   __u32 pad;
+};
+
+4.100 KVM_PPC_RESIZE_HPT_COMMIT
+
+Capability: KVM_CAP_SPAPR_RESIZE_HPT
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_ppc_resize_hpt (in)
+Returns: 0 on successful completion,
+ -EFAULT if struct kvm_reinject_control cannot be read,
+-EINVAL if the supplied shift or flags are invalid
+-ENXIO is there is no pending HPT, or the pending HPT doesn't
+ have the requested size
+-EBUSY if the pending HPT is not fully prepared
+-ENOSPC if there was a hash collision when moving existing
+  HPT entries to the new HPT
+-EIO on other error conditions
+
+Used to implement the PAPR extension for runtime resizing of a guest's
+Hashed Page Table (HPT).  Specifically this requests that the guest be
+transferred to working with the new HPT, essentially implementing the
+H_RESIZE_HPT_COMMIT hypercall.
+
+This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
+returned 0 with the same parameters.  In other cases
+KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
+-EBUSY, though others may be possible if the preparation was started,
+but failed).
+
+This will have undefined effects on the guest if it has not already
+placed itself in a quiescent state where no 

[PATCHv2 10/11] powerpc/kvm: KVM-HV HPT resizing implementation

2016-12-19 Thread David Gibson
This adds the "guts" of the implementation for the HPT resizing PAPR
extension.  It has the code to allocate and clear a new HPT, rehash an
existing HPT's entries into it, and accomplish the switchover for a
KVM guest from the old HPT to the new one.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 188 +++-
 1 file changed, 187 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 86a1bcb..e22776e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -66,6 +66,10 @@ struct kvm_resize_hpt {
/* These fields protected by kvm->lock */
int error;
bool prepare_done;
+
+   /* Private to the work thread, until prepare_done is true,
+* then protected by kvm->resize_hpt_sem */
+   struct kvm_hpt_info hpt;
 };
 
 static void kvmppc_rmap_reset(struct kvm *kvm);
@@ -1187,21 +1191,203 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void 
*va, unsigned long gpa,
  */
 static int resize_hpt_allocate(struct kvm_resize_hpt *resize)
 {
+   int rc;
+
+   rc = kvmppc_allocate_hpt(&resize->hpt, resize->order);
+   if (rc < 0)
+   return rc;
+
+   resize_hpt_debug(resize, "resize_hpt_allocate(): HPT @ 0x%lx\n",
+resize->hpt.virt);
+
return 0;
 }
 
+static unsigned long resize_hpt_rehash_hpte(struct kvm_resize_hpt *resize,
+   unsigned long idx)
+{
+   struct kvm *kvm = resize->kvm;
+   struct kvm_hpt_info *old = &kvm->arch.hpt;
+   struct kvm_hpt_info *new = &resize->hpt;
+   unsigned long old_hash_mask = (1ULL << (old->order - 7)) - 1;
+   unsigned long new_hash_mask = (1ULL << (new->order - 7)) - 1;
+   __be64 *hptep, *new_hptep;
+   unsigned long vpte, rpte, guest_rpte;
+   int ret;
+   struct revmap_entry *rev;
+   unsigned long apsize, psize, avpn, pteg, hash;
+   unsigned long new_idx, new_pteg, replace_vpte;
+
+   hptep = (__be64 *)(old->virt + (idx << 4));
+
+   /* Guest is stopped, so new HPTEs can't be added or faulted
+* in, only unmapped or altered by host actions.  So, it's
+* safe to check this before we take the HPTE lock */
+   vpte = be64_to_cpu(hptep[0]);
+   if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT))
+   return 0; /* nothing to do */
+
+   while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+   cpu_relax();
+
+   vpte = be64_to_cpu(hptep[0]);
+
+   ret = 0;
+   if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT))
+   /* Nothing to do */
+   goto out;
+
+   /* Unmap */
+   rev = &old->rev[idx];
+   guest_rpte = rev->guest_rpte;
+
+   ret = -EIO;
+   apsize = hpte_page_size(vpte, guest_rpte);
+   if (!apsize)
+   goto out;
+
+   if (vpte & HPTE_V_VALID) {
+   unsigned long gfn = hpte_rpn(guest_rpte, apsize);
+   int srcu_idx = srcu_read_lock(&kvm->srcu);
+   struct kvm_memory_slot *memslot =
+   __gfn_to_memslot(kvm_memslots(kvm), gfn);
+
+   if (memslot) {
+   unsigned long *rmapp;
+   rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
+
+   lock_rmap(rmapp);
+   kvmppc_unmap_hpte(kvm, idx, rmapp, gfn);
+   unlock_rmap(rmapp);
+   }
+
+   srcu_read_unlock(&kvm->srcu, srcu_idx);
+   }
+
+   /* Reload PTE after unmap */
+   vpte = be64_to_cpu(hptep[0]);
+
+   BUG_ON(vpte & HPTE_V_VALID);
+   BUG_ON(!(vpte & HPTE_V_ABSENT));
+
+   ret = 0;
+   if (!(vpte & HPTE_V_BOLTED))
+   goto out;
+
+   rpte = be64_to_cpu(hptep[1]);
+   psize = hpte_base_page_size(vpte, rpte);
+   avpn = HPTE_V_AVPN_VAL(vpte) & ~((psize - 1) >> 23);
+   pteg = idx / HPTES_PER_GROUP;
+   if (vpte & HPTE_V_SECONDARY)
+   pteg = ~pteg;
+
+   if (!(vpte & HPTE_V_1TB_SEG)) {
+   unsigned long offset, vsid;
+
+   /* We only have 28 - 23 bits of offset in avpn */
+   offset = (avpn & 0x1f) << 23;
+   vsid = avpn >> 5;
+   /* We can find more bits from the pteg value */
+   if (psize < (1ULL << 23))
+   offset |= ((vsid ^ pteg) & old_hash_mask) * psize;
+
+   hash = vsid ^ (offset / psize);
+   } else {
+   unsigned long offset, vsid;
+
+   /* We only have 40 - 23 bits of seg_off in avpn */
+   offset = (avpn & 0x1) << 23;
+   vsid = avpn >> 17;
+   if (psize < (1ULL << 23))
+   offset |= ((vsid ^ (vsid << 25) ^ pteg) & 
old_hash_mask) * psize;
+
+   hash = vsid ^ (vsid << 25) ^ (offset / psize);
+  

[PATCHv2 05/11] powerpc/kvm: Don't store values derivable from HPT order

2016-12-19 Thread David Gibson
Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size.  The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 12 
 arch/powerpc/include/asm/kvm_host.h  |  2 --
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 28 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 18 +-
 4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8482921..8810327 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -350,6 +350,18 @@ extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
 extern void kvmhv_rm_send_ipi(int cpu);
 
+static inline unsigned long kvmppc_hpt_npte(struct kvm_hpt_info *hpt)
+{
+   /* HPTEs are 2**4 bytes long */
+   return 1UL << (hpt->order - 4);
+}
+
+static inline unsigned long kvmppc_hpt_mask(struct kvm_hpt_info *hpt)
+{
+   /* 128 (2**7) bytes in each HPTEG */
+   return (1UL << (hpt->order - 7)) - 1;
+}
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 074b17f..7576e0e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -246,8 +246,6 @@ struct kvm_hpt_info {
unsigned long virt;
/* Array of reverse mapping entries for each guest HPTE */
struct revmap_entry *rev;
-   unsigned long npte;
-   unsigned long mask;
/* Guest HPT size is 2**(order) bytes */
u32 order;
/* 1 if HPT allocated with CMA, 0 otherwise */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b5799d1..fe88132 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -83,15 +83,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
kvm->arch.hpt.virt = hpt;
kvm->arch.hpt.order = order;
-   /* HPTEs are 2**4 bytes long */
-   kvm->arch.hpt.npte = 1ul << (order - 4);
-   /* 128 (2**7) bytes in each HPTEG */
-   kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
 
atomic64_set(&kvm->arch.mmio_update, 0);
 
/* Allocate reverse map array */
-   rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
+   rev = vmalloc(sizeof(struct revmap_entry) * 
kvmppc_hpt_npte(&kvm->arch.hpt));
if (!rev) {
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
goto out_freehpt;
@@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
if (npages > 1ul << (40 - porder))
npages = 1ul << (40 - porder);
/* Can't use more than 1 HPTE per HPTEG */
-   if (npages > kvm->arch.hpt.mask + 1)
-   npages = kvm->arch.hpt.mask + 1;
+   if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1)
+   npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1;
 
hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
@@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
for (i = 0; i < npages; ++i) {
addr = i << porder;
/* can't use hpt_hash since va > 64 bits */
-   hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & 
kvm->arch.hpt.mask;
+   hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
+   & kvmppc_hpt_mask(&kvm->arch.hpt);
/*
 * We assume that the hash table is empty and no
 * vcpus are using it at this stage.  Since we create
@@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
__user *buf,
 
/* Skip uninteresting entries, i.e. clean on not-first pass */
if (!first_pass) {
-   while (i < kvm->arch.hpt.npte &&
+   while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
   !hpte_dirty(revp, hptp)) {
++i;
hptp += 2;
@@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
__user *buf,
hdr.index = i;
 
/* Grab a series of valid entries */
-   while (i < kvm->arch.hpt.npte &&
+   while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
   hdr.n_valid < 0x &&
   nb + HPTE_SIZE < count &&
   record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
@@ -1332,7 +1329,7 @@ st

[PATCHv2 03/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity

2016-12-19 Thread David Gibson
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson 
Reviewed-by: Thomas Huth 
---
 arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 8 
 arch/powerpc/kvm/book3s_hv_builtin.c | 8 
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2da67bf..3db6be9 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -186,8 +186,8 @@ extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba);
-extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
-extern void kvm_release_hpt(struct page *page, unsigned long nr_pages);
+extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages);
+extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
 extern int kvmppc_core_init_vm(struct kvm *kvm);
 extern void kvmppc_core_destroy_vm(struct kvm *kvm);
 extern void kvmppc_core_free_memslot(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b795dd1..ae17cdd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
 
kvm->arch.hpt_cma_alloc = 0;
-   page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT));
+   page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
@@ -108,7 +108,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
  out_freehpt:
if (kvm->arch.hpt_cma_alloc)
-   kvm_release_hpt(page, 1 << (order - PAGE_SHIFT));
+   kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
else
free_pages(hpt, order - PAGE_SHIFT);
return -ENOMEM;
@@ -155,8 +155,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
kvmppc_free_lpid(kvm->arch.lpid);
vfree(kvm->arch.revmap);
if (kvm->arch.hpt_cma_alloc)
-   kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
-   1 << (kvm->arch.hpt_order - PAGE_SHIFT));
+   kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
+1 << (kvm->arch.hpt_order - PAGE_SHIFT));
else
free_pages(kvm->arch.hpt_virt,
   kvm->arch.hpt_order - PAGE_SHIFT);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 5bb24be..4c4aa47 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -52,19 +52,19 @@ static int __init early_parse_kvm_cma_resv(char *p)
 }
 early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv);
 
-struct page *kvm_alloc_hpt(unsigned long nr_pages)
+struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
 {
VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
 }
-EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
+EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
 
-void kvm_release_hpt(struct page *page, unsigned long nr_pages)
+void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
 {
cma_release(kvm_cma, page, nr_pages);
 }
-EXPORT_SYMBOL_GPL(kvm_release_hpt);
+EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
 
 /**
  * kvm_cma_reserve() - reserve area for kvm hash pagetable
-- 
2.9.3



[PATCHv2 04/11] powerpc/kvm: Gather HPT related variables into sub-structure

2016-12-19 Thread David Gibson
Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_host.h | 20 ++---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 ++---
 arch/powerpc/kvm/book3s_hv.c|  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 -
 4 files changed, 91 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index e59b172..074b17f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -241,12 +241,24 @@ struct kvm_arch_memory_slot {
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 };
 
+struct kvm_hpt_info {
+   /* Host virtual (linear mapping) address of guest HPT */
+   unsigned long virt;
+   /* Array of reverse mapping entries for each guest HPTE */
+   struct revmap_entry *rev;
+   unsigned long npte;
+   unsigned long mask;
+   /* Guest HPT size is 2**(order) bytes */
+   u32 order;
+   /* 1 if HPT allocated with CMA, 0 otherwise */
+   int cma;
+};
+
 struct kvm_arch {
unsigned int lpid;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
unsigned int tlb_sets;
-   unsigned long hpt_virt;
-   struct revmap_entry *revmap;
+   struct kvm_hpt_info hpt;
atomic64_t mmio_update;
unsigned int host_lpid;
unsigned long host_lpcr;
@@ -256,14 +268,10 @@ struct kvm_arch {
unsigned long lpcr;
unsigned long vrma_slb_v;
int hpte_setup_done;
-   u32 hpt_order;
atomic_t vcpus_running;
u32 online_vcores;
-   unsigned long hpt_npte;
-   unsigned long hpt_mask;
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
-   int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ae17cdd..b5799d1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -61,12 +61,12 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
order = PPC_MIN_HPT_ORDER;
}
 
-   kvm->arch.hpt_cma_alloc = 0;
+   kvm->arch.hpt.cma = 0;
page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
-   kvm->arch.hpt_cma_alloc = 1;
+   kvm->arch.hpt.cma = 1;
}
 
/* Lastly try successively smaller sizes from the page allocator */
@@ -81,22 +81,22 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
if (!hpt)
return -ENOMEM;
 
-   kvm->arch.hpt_virt = hpt;
-   kvm->arch.hpt_order = order;
+   kvm->arch.hpt.virt = hpt;
+   kvm->arch.hpt.order = order;
/* HPTEs are 2**4 bytes long */
-   kvm->arch.hpt_npte = 1ul << (order - 4);
+   kvm->arch.hpt.npte = 1ul << (order - 4);
/* 128 (2**7) bytes in each HPTEG */
-   kvm->arch.hpt_mask = (1ul << (order - 7)) - 1;
+   kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
 
atomic64_set(&kvm->arch.mmio_update, 0);
 
/* Allocate reverse map array */
-   rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte);
+   rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
if (!rev) {
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
goto out_freehpt;
}
-   kvm->arch.revmap = rev;
+   kvm->arch.hpt.rev = rev;
kvm->arch.sdr1 = __pa(hpt) | (order - 18);
 
pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
@@ -107,7 +107,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
return 0;
 
  out_freehpt:
-   if (kvm->arch.hpt_cma_alloc)
+   if (kvm->arch.hpt.cma)
kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
else
free_pages(hpt, order - PAGE_SHIFT);
@@ -129,10 +129,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
goto out;
}
}
-   if (kvm->arch.hpt_virt) {
-   order = kvm->arch.hpt_order;
+   if (kvm->arch.hpt.virt) {
+   order = kvm->arch.hpt.order;
/* Set the entire HPT to 0, i.e. invalid HPTEs */
-   memset((void *)kvm->arch.hpt_virt, 0, 1ul << order);
+   memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
/*
 * Reset all the reverse-mapping chains for all memslots
 

[PATCHv2 08/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper()

2016-12-19 Thread David Gibson
The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used
to force all guest mappings of a particular host page to be set ABSENT, and
removed from the reverse mappings.

For HPT resizing, we will have some cases where we want to set just a
single guest HPTE ABSENT and remove its reverse mappings.  To prepare with
this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE,
moving it to a new helper function.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 77 +
 1 file changed, 44 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2765d04..74e909a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -738,13 +738,53 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long 
hva,
return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
 }
 
+/* Must be called with both HPTE and rmap locked */
+static void kvmppc_unmap_hpte(struct kvm *kvm, unsigned long i,
+ unsigned long *rmapp, unsigned long gfn)
+{
+   __be64 *hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
+   struct revmap_entry *rev = kvm->arch.hpt.rev;
+   unsigned long j, h;
+   unsigned long ptel, psize, rcbits;
+
+   j = rev[i].forw;
+   if (j == i) {
+   /* chain is now empty */
+   *rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
+   } else {
+   /* remove i from chain */
+   h = rev[i].back;
+   rev[h].forw = j;
+   rev[j].back = h;
+   rev[i].forw = rev[i].back = i;
+   *rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
+   }
+
+   /* Now check and modify the HPTE */
+   ptel = rev[i].guest_rpte;
+   psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
+   if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+   hpte_rpn(ptel, psize) == gfn) {
+   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+   kvmppc_invalidate_hpte(kvm, hptep, i);
+   hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+   /* Harvest R and C */
+   rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
+   *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+   if (rcbits & HPTE_R_C)
+   kvmppc_update_rmap_change(rmapp, psize);
+   if (rcbits & ~rev[i].guest_rpte) {
+   rev[i].guest_rpte = ptel | rcbits;
+   note_hpte_modification(kvm, &rev[i]);
+   }
+   }
+}
+
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
   unsigned long gfn)
 {
-   struct revmap_entry *rev = kvm->arch.hpt.rev;
-   unsigned long h, i, j;
+   unsigned long i;
__be64 *hptep;
-   unsigned long ptel, psize, rcbits;
 
for (;;) {
lock_rmap(rmapp);
@@ -767,37 +807,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
cpu_relax();
continue;
}
-   j = rev[i].forw;
-   if (j == i) {
-   /* chain is now empty */
-   *rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
-   } else {
-   /* remove i from chain */
-   h = rev[i].back;
-   rev[h].forw = j;
-   rev[j].back = h;
-   rev[i].forw = rev[i].back = i;
-   *rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
-   }
 
-   /* Now check and modify the HPTE */
-   ptel = rev[i].guest_rpte;
-   psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-   if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
-   hpte_rpn(ptel, psize) == gfn) {
-   hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
-   kvmppc_invalidate_hpte(kvm, hptep, i);
-   hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
-   /* Harvest R and C */
-   rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
-   *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
-   if (rcbits & HPTE_R_C)
-   kvmppc_update_rmap_change(rmapp, psize);
-   if (rcbits & ~rev[i].guest_rpte) {
-   rev[i].guest_rpte = ptel | rcbits;
-   note_hpte_modification(kvm, &rev[i]);
-   }
-   }
+   kvmppc_unmap_hpte(kvm, i, rmapp, gfn);
unlock_rmap(rmapp);
__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
-- 
2.9.3



[PATCHv2 09/11] powerpc/kvm: Outline of KVM-HV HPT resizing implementation

2016-12-19 Thread David Gibson
This adds a not yet working outline of the HPT resizing PAPR
extension.  Specifically it adds the necessary ioctl() functions,
their basic steps, the work function which will handle preparation for
the resize, and synchronization between these, the guest page fault
path and guest HPT update path.

The actual guts of the implementation isn't here yet, so for now the
calls will always fail.

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_host.h |   3 +
 arch/powerpc/include/asm/kvm_ppc.h  |   4 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 191 
 arch/powerpc/kvm/book3s_hv.c|  25 +
 4 files changed, 223 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 7576e0e..fb8e508 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -252,6 +252,8 @@ struct kvm_hpt_info {
int cma;
 };
 
+struct kvm_resize_hpt;
+
 struct kvm_arch {
unsigned int lpid;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -272,6 +274,7 @@ struct kvm_arch {
cpumask_t need_tlb_flush;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 3b837bc..f8eaed0 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -215,6 +215,10 @@ extern void kvmppc_bookehv_exit(void);
 extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);
 
 extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
+extern long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
+   struct kvm_ppc_resize_hpt *rhpt);
+extern long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
+  struct kvm_ppc_resize_hpt *rhpt);
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 74e909a..86a1bcb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,9 +40,34 @@
 
 #include "trace_hv.h"
 
+//#define DEBUG_RESIZE_HPT 1
+
+#ifdef DEBUG_RESIZE_HPT
+#define resize_hpt_debug(resize, ...)  \
+   do {\
+   printk(KERN_DEBUG "RESIZE HPT %p: ", resize);   \
+   printk(__VA_ARGS__);\
+   } while (0)
+#else
+#define resize_hpt_debug(resize, ...)  \
+   do { } while (0)
+#endif
+
 static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh,
unsigned long ptel, unsigned long *pte_idx_ret);
+
+struct kvm_resize_hpt {
+   /* These fields read-only after init */
+   struct kvm *kvm;
+   struct work_struct work;
+   u32 order;
+
+   /* These fields protected by kvm->lock */
+   int error;
+   bool prepare_done;
+};
+
 static void kvmppc_rmap_reset(struct kvm *kvm);
 
 int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
@@ -1158,6 +1183,172 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, 
unsigned long gpa,
 }
 
 /*
+ * HPT resizing
+ */
+static int resize_hpt_allocate(struct kvm_resize_hpt *resize)
+{
+   return 0;
+}
+
+static int resize_hpt_rehash(struct kvm_resize_hpt *resize)
+{
+   return -EIO;
+}
+
+static void resize_hpt_pivot(struct kvm_resize_hpt *resize)
+{
+}
+
+static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt *resize)
+{
+   BUG_ON(kvm->arch.resize_hpt != resize);
+   kvm->arch.resize_hpt = NULL;
+   kfree(resize);
+}
+
+static void resize_hpt_prepare_work(struct work_struct *work)
+{
+   struct kvm_resize_hpt *resize = container_of(work,
+struct kvm_resize_hpt,
+work);
+   struct kvm *kvm = resize->kvm;
+   int err;
+
+   resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
+resize->order);
+
+   err = resize_hpt_allocate(resize);
+
+   mutex_lock(&kvm->lock);
+
+   resize->error = err;
+   resize->prepare_done = true;
+
+   mutex_unlock(&kvm->lock);
+}
+
+long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
+struct kvm_ppc_resize_hpt *rhpt)
+{
+   unsigned long flags = rhpt->flags;
+   unsigned long shift = rhpt->shift;
+   struct kvm_resize_hpt *resize;
+   int ret;
+
+   if (flags != 0)
+   return -EINVAL;
+
+   if (shift &

[PATCHv2 01/11] Documentation: Correct duplicate section number in kvm/api.txt

2016-12-19 Thread David Gibson
Both KVM_CREATE_SPAPR_TCE_64 and KVM_REINJECT_CONTROL have section number
4.98 in Documentation/virtual/kvm/api.txt, presumably due to a naive merge.
This corrects the duplication.

Signed-off-by: David Gibson 
---
 Documentation/virtual/kvm/api.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 03145b7..42be7b1 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3177,7 +3177,7 @@ of IOMMU pages.
 
 The rest of functionality is identical to KVM_CREATE_SPAPR_TCE.
 
-4.98 KVM_REINJECT_CONTROL
+4.99 KVM_REINJECT_CONTROL
 
 Capability: KVM_CAP_REINJECT_CONTROL
 Architectures: x86
-- 
2.9.3



[PATCHv2 00/11] KVM implementation of PAPR HPT resizing extension

2016-12-19 Thread David Gibson
Here is the KVM implementation for the proposed PAPR extension which
allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT).

Using this requires a guest kernel with support for the extension.
Patches for guest side support in Linux were posted earlier:
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html

It also requires userspace (i.e. qemu) to intercept the HPT resizing
hypercalls and invoke the KVM ioctl()s to implement them.  This is
done instead of having KVM direclty intercept the hypercalls, so that
userspace can, if useful, impose additional restrictions on resizes:
for example it could refuse them entirely if policy for the VM
precludes resizing, or it could limit the size of HPT the guest can
request to meet resource limits.

Patches to implement the userspace part of HPT resizing are proposed
for qemu-2.9, and can be found at:
  https://github.com/dgibson/qemu/tree/hpt-resize

I'm posting these now, in the hopes that both these and the
corresponding guest side patches can be staged and merged for the 4.11
window.

Changes in v2:
  * Use a more normal and straightforward encoding of the capability
flags
  * Add information to Documentation/virtual/kvm/api.txt
  * Assorted minor cleanups based on review comments

David Gibson (11):
  Documentation: Correct duplicate section number in kvm/api.txt
  powerpc/kvm: HPT resizing documentation and reserved numbers
  powerpc/kvm: Rename kvm_alloc_hpt() for clarity
  powerpc/kvm: Gather HPT related variables into sub-structure
  powerpc/kvm: Don't store values derivable from HPT order
  powerpc/kvm: Split HPT allocation from activation
  powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  powerpc/kvm: Create kvmppc_unmap_hpte_helper()
  powerpc/kvm: Outline of KVM-HV HPT resizing implementation
  powerpc/kvm: KVM-HV HPT resizing implementation
  powerpc/kvm: Advertise availablity of HPT resizing on KVM HV

 Documentation/virtual/kvm/api.txt| 111 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |  16 +
 arch/powerpc/include/asm/kvm_host.h  |  21 +-
 arch/powerpc/include/asm/kvm_ppc.h   |  15 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 625 +--
 arch/powerpc/kvm/book3s_hv.c |  50 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c |   8 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  62 +--
 arch/powerpc/kvm/powerpc.c   |   3 +
 include/uapi/linux/kvm.h |  11 +
 10 files changed, 741 insertions(+), 181 deletions(-)

-- 
2.9.3



[PATCHv2 07/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size

2016-12-19 Thread David Gibson
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
table (HPT) that userspace expects a guest VM to have, and is also used to
clear that HPT when necessary (e.g. guest reboot).

At present, once the ioctl() is called for the first time, the HPT size can
never be changed thereafter - it will be cleared but always sized as from
the first call.

With upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default size
if the guest changed it).

So, we need to allow this ioctl() to change the HPT size.

This patch also updates Documentation/virtual/kvm/api.txt to reflect
the new behaviour.  In fact the documentation was already slightly
incorrect since 572abd5 "KVM: PPC: Book3S HV: Don't fall back to
smaller HPT size in allocation ioctl"

Signed-off-by: David Gibson 
---
 Documentation/virtual/kvm/api.txt   | 14 --
 arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 29 -
 arch/powerpc/kvm/book3s_hv.c|  5 +
 4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index f452c72..c12da23 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2443,18 +2443,20 @@ are, it will do nothing and return an EBUSY error.
 The parameter is a pointer to a 32-bit unsigned integer variable
 containing the order (log base 2) of the desired size of the hash
 table, which must be between 18 and 46.  On successful return from the
-ioctl, it will have been updated with the order of the hash table that
-was allocated.
+ioctl, the value will not be changed by the kernel.
 
 If no hash table has been allocated when any vcpu is asked to run
 (with the KVM_RUN ioctl), the host kernel will allocate a
 default-sized hash table (16 MB).
 
 If this ioctl is called when a hash table has already been allocated,
-the kernel will clear out the existing hash table (zero all HPTEs) and
-return the hash table order in the parameter.  (If the guest is using
-the virtualized real-mode area (VRMA) facility, the kernel will
-re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
+with a different order from the existing hash table, the existing hash
+table will be freed and a new one allocated.  If this is ioctl is
+called when a hash table has already been allocated of the same order
+as specified, the kernel will clear out the existing hash table (zero
+all HPTEs).  In either case, if the guest is using the virtualized
+real-mode area (VRMA) facility, the kernel will re-create the VMRA
+HPTEs on the next KVM_RUN of any vcpu.
 
 4.77 KVM_S390_INTERRUPT
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 41575b8..3b837bc 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
 extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
-extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
 extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 4683d87..2765d04 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -102,10 +102,10 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info 
*info)
info->virt, (long)info->order, kvm->arch.lpid);
 }
 
-long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
+long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
 {
long err = -EBUSY;
-   long order;
+   struct kvm_hpt_info info;
 
mutex_lock(&kvm->lock);
if (kvm->arch.hpte_setup_done) {
@@ -117,8 +117,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
goto out;
}
}
-   if (kvm->arch.hpt.virt) {
-   order = kvm->arch.hpt.order;
+   if (kvm->arch.hpt.order == order) {
+   /* We already have a suitable HPT */
+
/* Set the entire HPT to 0, i.e. invalid HPTEs */
memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
/*
@@ -127,17 +128,19 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
kvmppc_rmap_reset(kvm);
/* Ensure that each vcpu will flush its TLB on next entry. */
cpumask_setall(&kvm->arch.need_tlb_flush);
-   *htab_orderp = order;
err = 0;
-   } else {
-   struct kvm_hpt_info info;
-
-

Re: [RFC] Question about freeing of resource in 'atlas7_pinmux_probe()', in file 'drivers/pinctrl/sirf/pinctrl-atlas7.c'

2016-12-19 Thread Christophe JAILLET

Le 19/12/2016 à 14:54, Vladimir Zapolskiy a écrit :

Functionally it looks good, I have two comments though.
1) you don't need to revert 151b8c5ba1eb, the commit is a proper fix
per se but incomplete, please add your change on top of it,

2) minimizing the lines of code by removing duplicates is always good,
so here a better and complete fix will be like the following one:


Hi,

thanks for the feedback and comments.
Patch submitted.

CJ


Re: [PATCH v2 4/4] phy: qcom-qmp: new qmp phy driver for qcom-chipsets

2016-12-19 Thread Vivek Gautam
Hi,

On Tue, Nov 29, 2016 at 6:05 AM, Stephen Boyd  wrote:

Thanks for a thorough review. Please find my comments inline.

> On 11/22, Vivek Gautam wrote:
>> diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
>> index f1dcec1..8970d9e 100644
>> --- a/drivers/phy/Kconfig
>> +++ b/drivers/phy/Kconfig
>> @@ -430,6 +430,15 @@ config PHY_STIH407_USB
>> Enable this support to enable the picoPHY device used by USB2
>> and USB3 controllers on STMicroelectronics STiH407 SoC families.
>>
>> +config PHY_QCOM_QMP
>> + tristate "Qualcomm QMP PHY Driver"
>> + depends on OF && (ARCH_QCOM || COMPILE_TEST)
>> + select GENERIC_PHY
>> + select RESET_CONTROLLER
>
> Again, probably should be dropped as we're not providing resets
> here, just using them.

Sure, will drop in v3.

>
>> diff --git a/drivers/phy/phy-qcom-qmp.c b/drivers/phy/phy-qcom-qmp.c
>> new file mode 100644
>> index 000..f85289e
>> --- /dev/null
>> +++ b/drivers/phy/phy-qcom-qmp.c
>> @@ -0,0 +1,1141 @@
>> +/*
>> + * Copyright (c) 2016, The Linux Foundation. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 and
>> + * only version 2 as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
> [...]
>> +
>> +/* set of registers with offsets different per-PHY */
>> +enum qphy_reg_layout {
>> + /* Common block control registers */
>> + QPHY_COM_SW_RESET,
>> + QPHY_COM_POWER_DOWN_CONTROL,
>> + QPHY_COM_START_CONTROL,
>> + QPHY_COM_PCS_READY_STATUS,
>> + /* PCS registers */
>> + QPHY_PLL_LOCK_CHK_DLY_TIME,
>> + QPHY_FLL_CNTRL1,
>> + QPHY_FLL_CNTRL2,
>> + QPHY_FLL_CNT_VAL_L,
>> + QPHY_FLL_CNT_VAL_H_TOL,
>> + QPHY_FLL_MAN_CODE,
>> + QPHY_PCS_READY_STATUS,
>> +};
>> +
>> +unsigned int pciephy_regs_layout[] = {
>
> static? const?

Will take care of these in v3.
[...]

>> +
>> +/**
>> + * struct qmp_phy_init_cfg:- per-PHY init config.
>
> The colon and dash can't be right. One is kernel-doc, but of
> course there aren't any other member descriptions.

Right, will convert this to a one-line comment.

>
>> + */
>> +struct qmp_phy_init_cfg {
>> + /* phy-type - PCIE/UFS/USB */
>> + unsigned int type;
>> + /* number of lanes provided by phy */
>> + int nlanes;
>> +
>> + /* Initialization sequence for PHY blocks - Serdes, tx, rx, pcs */
>> + struct qmp_phy_init_tbl *phy_init_serdes_tbl;
>> + int phy_init_serdes_tbl_sz;
>> + struct qmp_phy_init_tbl *phy_init_tx_tbl;
>> + int phy_init_tx_tbl_sz;
>> + struct qmp_phy_init_tbl *phy_init_rx_tbl;
>> + int phy_init_rx_tbl_sz;
>> + struct qmp_phy_init_tbl *phy_init_pcs_tbl;
>> + int phy_init_pcs_tbl_sz;
>
> _sz makes it sound like bytes, perhaps _num instead.

sure.

>
>> +
>> + /* array of registers with different offsets */
>> + unsigned int *regs;
>
> const?

taking care of these in v3.

>
>> +
>> + unsigned int mask_start_ctrl;
>> + unsigned int mask_pwr_dn_ctrl;
>> + /* true, if PHY has a separate PHY_COM_CNTRL block */
>> + bool has_phy_com_ctrl;
>> + /* true, if PHY has a reset for individual lanes */
>> + bool has_lane_rst;
>> +};
>> +
>> +/**
>> + * struct qmp_phy_desc:- per-lane phy-descriptor.
>
> Also kernel-doc requests full stop is left out on one line
> descriptions.

sure, will remove it from here and from other places where a full-stop
is not required.

>
>> + *
>> + * @phy: pointer to generic phy
>> + * @tx: pointer to iomapped memory space for PHY's tx
>> + * @rx: pointer to iomapped memory space for PHY's rx
>> + * @pcs: pointer to iomapped memory space for PHY's pcs
>> + * @pipe_clk: pointer to pipe lock
>> + * @index: lane index
>> + * @qphy: pointer to QMP phy to which this lane belongs
>> + * @lane_rst: pointer to lane's reset controller
>
> Not sure we care about "pointer to" as types can be figured out
> other ways.

will drop "pointer to" string.

>
>> + */
>> +struct qmp_phy_desc {
>> + struct phy *phy;
>> + void __iomem *tx;
>> + void __iomem *rx;
>> + void __iomem *pcs;
>> + struct clk *pipe_clk;
>> + unsigned int index;
>> + struct qcom_qmp_phy *qphy;
>> + struct reset_control *lane_rst;
>> +};
>> +
>> +/**
>> + * struct qcom_qmp_phy:- structure holding QMP PHY attributes.
>> + *
>> + * @dev: pointer to device
>> + * @serdes: pointer to iomapped memory space for phy's serdes
>> + *
>> + * @aux_clk: pointer 

[PATCH 2/2] pinctrl: sirf: atlas7: Improve code layout

2016-12-19 Thread Christophe JAILLET
Add some tab in order to improve indentation.

Signed-off-by: Christophe JAILLET 
---
 drivers/pinctrl/sirf/pinctrl-atlas7.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pinctrl/sirf/pinctrl-atlas7.c 
b/drivers/pinctrl/sirf/pinctrl-atlas7.c
index f714f67c4b64..76df538fa814 100644
--- a/drivers/pinctrl/sirf/pinctrl-atlas7.c
+++ b/drivers/pinctrl/sirf/pinctrl-atlas7.c
@@ -5444,7 +5444,7 @@ static int atlas7_pinmux_probe(struct platform_device 
*pdev)
pmx->regs[idx] = of_iomap(np, idx);
if (!pmx->regs[idx]) {
dev_err(&pdev->dev,
-   "can't map ioc bank#%d registers\n", idx);
+   "can't map ioc bank#%d registers\n", idx);
ret = -ENOMEM;
goto unmap_io;
}
@@ -6057,8 +6057,8 @@ static int atlas7_gpio_probe(struct platform_device *pdev)
ret = gpiochip_add_data(chip, a7gc);
if (ret) {
dev_err(&pdev->dev,
-   "%s: error in probe function with status %d\n",
-   np->name, ret);
+   "%s: error in probe function with status %d\n",
+   np->name, ret);
goto failed;
}
 
-- 
2.9.3



[PATCH 1/2] pinctrl: sirf: atlas7: Add missing 'of_node_put()'

2016-12-19 Thread Christophe JAILLET
Reference to 'sys2pci_np' should be dropped in all cases here, not only in
error handling path.

Signed-off-by: Christophe JAILLET 
---
 drivers/pinctrl/sirf/pinctrl-atlas7.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pinctrl/sirf/pinctrl-atlas7.c 
b/drivers/pinctrl/sirf/pinctrl-atlas7.c
index 7f3041697813..f714f67c4b64 100644
--- a/drivers/pinctrl/sirf/pinctrl-atlas7.c
+++ b/drivers/pinctrl/sirf/pinctrl-atlas7.c
@@ -5420,14 +5420,15 @@ static int atlas7_pinmux_probe(struct platform_device 
*pdev)
sys2pci_np = of_find_node_by_name(NULL, "sys2pci");
if (!sys2pci_np)
return -EINVAL;
+
ret = of_address_to_resource(sys2pci_np, 0, &res);
+   of_node_put(sys2pci_np);
if (ret)
return ret;
+
pmx->sys2pci_base = devm_ioremap_resource(&pdev->dev, &res);
-   if (IS_ERR(pmx->sys2pci_base)) {
-   of_node_put(sys2pci_np);
+   if (IS_ERR(pmx->sys2pci_base))
return -ENOMEM;
-   }
 
pmx->dev = &pdev->dev;
 
-- 
2.9.3



Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Alexei Starovoitov
On Mon, Dec 19, 2016 at 09:27:18PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 19, 2016 at 8:44 PM, Alexei Starovoitov
>  wrote:
> > On Mon, Dec 19, 2016 at 07:12:48PM -0800, Andy Lutomirski wrote:
> >>
> >> struct cgroup_bpf {
> >> /*
> >>  * Store two sets of bpf_prog pointers, one for programs that are
> >>  * pinned directly to this cgroup, and one for those that are 
> >> effective
> >>  * when this cgroup is accessed.
> >>  */
> >> struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
> >> struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
> >> };
> >>
> >> in struct cgroup, there's a 'struct cgroup_bpf bpf;'.
> >>
> >> This would change to something like:
> >>
> >> struct cgroup_filter_slot {
> >>   struct bpf_prog *effective;
> >>   struct cgroup_filter_slot *next;
> >>   struct bpf_prog *local;
> >> }
> >>
> >> local is NULL unless *this* cgroup has a filter.  effective points to
> >> the bpf_prog that's active in this cgroup or the nearest ancestor that
> >> has a filter.  next is NULL if there are no filters higher in the
> >> chain or points to the next slot that has a filter.  struct cgroup
> >> has:
> >>
> >> struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE];
> >>
> >> To evaluate it, you do:
> >>
> >> struct cgroup_filter_slot *slot = &cgroup->slot[the index];
> >>
> >> if (!slot->effective)
> >>   return;
> >>
> >> do {
> >>   evaluate(slot->effective);
> >>   slot = slot->next;
> >> } while (unlikely(slot));
> >
> > yes. something like this can work as a future extension
> > to support multiple programs for security use case.
> > Please propose a patch.
> > Again, it's not needed today and there is no rush to implement it.
> >
> 
> If this happens after 4.10 and 4.10 is released as is, then this
> change would be an ABI break.

it won't break existing apps.
please study how bpf syscall was extended in the past without
breaking anything.
Same thing here. The default is one program per hook per cgroup.
Everything else is future extensions.



Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Andy Lutomirski
On Mon, Dec 19, 2016 at 8:44 PM, Alexei Starovoitov
 wrote:
> On Mon, Dec 19, 2016 at 07:12:48PM -0800, Andy Lutomirski wrote:
>>
>> struct cgroup_bpf {
>> /*
>>  * Store two sets of bpf_prog pointers, one for programs that are
>>  * pinned directly to this cgroup, and one for those that are 
>> effective
>>  * when this cgroup is accessed.
>>  */
>> struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
>> struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
>> };
>>
>> in struct cgroup, there's a 'struct cgroup_bpf bpf;'.
>>
>> This would change to something like:
>>
>> struct cgroup_filter_slot {
>>   struct bpf_prog *effective;
>>   struct cgroup_filter_slot *next;
>>   struct bpf_prog *local;
>> }
>>
>> local is NULL unless *this* cgroup has a filter.  effective points to
>> the bpf_prog that's active in this cgroup or the nearest ancestor that
>> has a filter.  next is NULL if there are no filters higher in the
>> chain or points to the next slot that has a filter.  struct cgroup
>> has:
>>
>> struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE];
>>
>> To evaluate it, you do:
>>
>> struct cgroup_filter_slot *slot = &cgroup->slot[the index];
>>
>> if (!slot->effective)
>>   return;
>>
>> do {
>>   evaluate(slot->effective);
>>   slot = slot->next;
>> } while (unlikely(slot));
>
> yes. something like this can work as a future extension
> to support multiple programs for security use case.
> Please propose a patch.
> Again, it's not needed today and there is no rush to implement it.
>

If this happens after 4.10 and 4.10 is released as is, then this
change would be an ABI break.

--Andy


Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Andy Lutomirski
On Mon, Dec 19, 2016 at 8:51 PM, Alexei Starovoitov
 wrote:
> On Mon, Dec 19, 2016 at 05:40:53PM -0800, Andy Lutomirski wrote:
>>
>> By the way, even if Alexei is right, the BPF_PROG_DETACH API doesn't
>> even take a reference to a BPF program as an argument.  What is it
>> supposed to do if this mechanism ever gets extended?
>
> we just add another field to that anonymous union just like
> we did for other commands and everything is backwards compatible.
> It's the basics of bpf syscall that we've been relying on for some
> time now and it worked just fine.

And what happens if you don't specify that member and two programs are attached?

--Andy


Re: [PATCH v4 0/2] FPGA: TS-7300 FPGA manager

2016-12-19 Thread Alan Tull
On Mon, 19 Dec 2016, Florian Fainelli wrote:

> On 12/18/2016 12:21 PM, Florian Fainelli wrote:
> > Hi all,
> > 
> > This patch series adds support for loading bitstreams into the Altera 
> > Cyclone II
> > connected to an EP9302 on a TS-7300 board.
> > 
> > Changes in v4:
> > 
> > - fixed ops->write not to do the final configuration release
> > - reordered patches
> > 
> > Changes in v3:
> > 
> > - fix write_init and write_complete signatures
> > 
> > Changes in v2:
> > 
> > - rebased against fpga/for-next
> > - added defines for configuration bits and delays
> > - added error mesage if ioremap() fails
> > - detailed how the configuration through CPLD is done
> 
> Alan, Moritz, thanks for providing Acked-by, I was under the impression
> these patches would be taken by you through the FPGA tree,

Hi Florain,

That's right, I'll handle it.  The FPGA tree goes in through Greg.
We're in the merge window where stuff Greg already has is going to
Linus so it's too late to get this into 4.10.  I'll be doing a pull
after that.

Alan

> since Hartley
> acked the EP93xx part and since that was the tree used as a baseline.
> 
> Let me know how you want to proceed, since EP93xx is not as active as
> other ARM SoCs, there may not be a specific tree where to stage these
> patches (unless you take them).
> 
> Thanks!
> -- 
> Florian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fpga" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()

2016-12-19 Thread Paul E. McKenney
On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> On Thu, Dec 15, 2016 at 02:51:36PM +, Colin Ian King wrote:
> > On 15/12/16 14:42, Boqun Feng wrote:
> > > On Thu, Dec 15, 2016 at 12:04:59PM +, Mark Rutland wrote:
> > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > >>> corresponding cpu_possible_mask. So replace the
> > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > >>> for_each_leaf_node_cpu() to save several checks.
> > >>>
> > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > >>> spotted by Colin Ian King  and CoverityScan in
> > >>> a previous version of this patch.]
> > >>
> > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > >>
> > > 
> > > I kinda keep this here for honoring the effort of finding out this bug
> > > from Colin, but yes, it's no longer needed here for the current code.
> > 
> > Yep, remove it.
> > 
> 
> Paul, here is a modified version of this patch, what I only did is
> removing this note.
> 
> Besides I rebased the whole series on the current rcu/dev branch of -rcu
> tree, on this very commit:
> 
>   8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> 
> And I put the latest version at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> 
> If you thought it's better, I could send a v3 ;-)

I would feel better about this patchset if it reduced the number of lines
of code rather than increasing them.  That said, part of the increase
is a commment.  Still, I am not convinced that the extra level of macro
is carrying its weight.

dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")

The commit log needs a bit of wordsmithing.

The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
What is its purpose, really?  What does its triggering tell you?
What other checks did you consider as an alternative?

And if you are going to add checks of this type, should you
also check for this being a leaf rcu_node structure?

3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")

This does look a bit nicer, but why the added blank lines?
Are they really helping?

The commit log seems a bit misplaced.  This code is almost never
executed (once per 21 seconds at the most), so performance really
isn't a consideration.  The simpler-looking code might be.

fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")

Ditto on blank lines.

Again, this code is executed per expedited grace period, so
performance really isn't a big deal.  More of a big deal than
the stall-warning code, but we still are way off of any fastpath.

69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")

Ditto again on blank lines.

And on the commit log.  This code is executed about once
per several jiffies, and on larger machines, per 20 jiffies
or so.  Performance really isn't a consideration.

7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")

And another ditto on blank lines.

This code executes once per CPU-hotplug operation, so again isn't
at all performance critical.

In short, if you are trying to sell this to me as a significant performance
boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
though perhaps I am misunderstanding its purpose.  My assumption is
that you want to detect missing UL suffixes on bitmask constants, in
which case I bet there is a better way.

Speaking of which, how do we know that this is free of bugs?

Thanx, Paul

> Regards,
> Boqun
> 
> >8
> From: Boqun Feng 
> Date: Thu, 8 Dec 2016 23:21:11 +0800
> Subject: [PATCH v2.1 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
> 
> ->qsmask of an RCU leaf node is usually more sparse than the
> corresponding cpu_possible_mask. So replace the
> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> for_each_leaf_node_cpu() to save several checks.
> 
> Signed-off-by: Boqun Feng 
> ---
>  kernel/rcu/tree.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 4ea4496f4ecc..c2b753fb7f09 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
>   continue;
>   }
>   }
> - for_each_leaf_node_possible_cpu(rnp, cpu) {
> - unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
> - if ((rnp->qsmask & bit) != 0) {
> - if (f(per_cpu_p

Re: [PATCH] Xen: ARM: Zero reserved fields of xatp before making hypervisor call

2016-12-19 Thread Jiandi An
On 12/19/16 12:49, Stefano Stabellini wrote:
> On Mon, 19 Dec 2016, Juergen Gross wrote:
>> On 19/12/16 03:56, Jiandi An wrote:
>>> Ensure all reserved fields of xatp are zero before making hypervisor
>>> call to XEN in xen_map_device_mmio().  xenmem_add_to_physmap_one() in
>>> XEN fails the mapping request if extra.res reserved field in xatp is
>>> not zero for XENMAPSPACE_dev_mmio request.
>>>
>>> Signed-off-by: Jiandi An 
>>> ---
>>>  drivers/xen/arm-device.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/xen/arm-device.c b/drivers/xen/arm-device.c
>>> index 778acf8..208273b 100644
>>> --- a/drivers/xen/arm-device.c
>>> +++ b/drivers/xen/arm-device.c
>>> @@ -87,6 +87,9 @@ static int xen_map_device_mmio(const struct resource 
>>> *resources,
>>> idxs[j] = XEN_PFN_DOWN(r->start) + j;
>>> }
>>>  
>>> +   /* Ensure reserved fields are set to zero */
>>> +   memset(&xatp, 0, sizeof(xatp));
>>> +
>>> xatp.domid = DOMID_SELF;
>>> xatp.size = nr;
>>> xatp.space = XENMAPSPACE_dev_mmio;
>>
>> Instead of setting xatp to 0 in each iteration I'd prefer a static
>> initialization of .domid and .space:
>>
>>  struct xen_add_to_physmap_range xatp = {
>>  .domid = DOMID_SELF,
>>  .space = XENMAPSPACE_dev_mmio
>>  };
> 
> +1
> 

Hi Juergen,

Thanks for you comments.  xatp is passed to XEN via the hypervisor call in each 
loop.
XEN touches xatp and returns it back.  For example XEN returns error of 
underlying mapping call in the err[] array in xatp. (The err[] is not checked 
after the hypervisor call returns and it's a bug to be addressed in a separate 
patch)  XEN could theoretically corrupt xatp when it's returned.  And the loop 
would go on to the next iteration passing in whatever that's in xatp returned 
by the previous hypervisor call.  Harder to debug in my opinion if xatp get 
corrupted by XEN somehow when a bug is introduced in XEN.  At first I put the 
memset of xatp at the beginning outside of the loop.  But I thought it's better 
to initialize xatp that's passed in each time a hypervisor call is made so we 
know exactly we set going into the hypervisor call.

-- 
Jiandi An
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Alexei Starovoitov
On Mon, Dec 19, 2016 at 05:40:53PM -0800, Andy Lutomirski wrote:
> 
> By the way, even if Alexei is right, the BPF_PROG_DETACH API doesn't
> even take a reference to a BPF program as an argument.  What is it
> supposed to do if this mechanism ever gets extended?

we just add another field to that anonymous union just like
we did for other commands and everything is backwards compatible.
It's the basics of bpf syscall that we've been relying on for some
time now and it worked just fine.



Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Alexei Starovoitov
On Mon, Dec 19, 2016 at 07:50:01PM -0800, Andy Lutomirski wrote:
> >>
> >> net.socket_create_filter = "none": no filter
> >> net.socket_create_filter = "bpf:baadf00d": bpf filter
> >
> > i'm assuming 'baadf00d' is bpf program fd expressed a text string?
> > and kernel needs to parse above? will you allow capital and lower
> > case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not?
> > can program fd expressed as decimal or hex or both?
> > how do you return the error? as a text string for user space
> > to parse?
> 
> No.  The kernel does not parse it because you cannot write this to the
> file.  You set a bpf filter with ioctl and pass an fd.  If you *read*
> the file, you get the same bpf program hash that fdinfo on the bpf
> object would show -- this is for debugging and (eventually) CRIU.

my understanding that cgroup is based on kernfs and both don't support
ioctl, so you'd need quite some hacking to introduce such concepts
and buy-in from a bunch of people first.

> >> net.socket_create_filter = "iptables:foobar": some iptables thingy
> >> net.socket_create_filter = "nft:blahblahblah": some nft thingy
> >
> > iptables/nft are not applicable to 'bpf_socket_create' which
> > looks into:
> > struct bpf_sock {
> > __u32 bound_dev_if;
> > __u32 family;
> > __u32 type;
> > __u32 protocol;
> > };
> > so don't fit as an example.
> 
> The code that takes a 'struct sock' and sets up bpf_sock is
> bpf-specific and would obviously not be used for non-bpf filter.  But
> if you had a filter that was just a like of address families, that
> filter would look at struct sock and do its own thing.  iptables
> wouldn't make sense for a socket creation filter, but it would make
> perfect sense for an ingress filter.  Obviously the bpf-specific state
> object wouldn't be used, but it would still be a hook, invoked from
> the same network code, guarded by the same static key, looking at the
> same skb.

I strongly suggest to go back and read my first reply where
I think I explained well enough that something like iptables
will not able to reuse the ioctl mechanism you're proposing here,
hook ids will be different, attachment mechanism will be different too.
So your proposed cgroup ioctl is already dead as a reusable interface.

> >> net.socket_create_filter = "disallow": no sockets created period
> >> net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and 
> >> 3
> >
> > so you're proposing to add a bunch of hard coded logic to the kernel.
> > First to parse such text into some sort of syntax tree or list/set
> > and then have hard coded logic specifically for these two use cases?
> > While above two can be implemented as trivial bpf programs already?!
> > That goes 180% degree vs bpf philosophy. bpf is about moving
> > the specific code out of the kernel and keeping kernel generic that
> > it can solve as many use cases as possible by being programmable.
> 
> I'm not seriously proposing implementing these.  My point is that
> *bpf*, while wonderful, is not the be-all-and-end-all of kernel
> configurability, and other types of hooks might want to be hooked in
> here.

Then please let's talk about real use cases.
This daydreaming of some future llvm in the kernel that you were
bringing up during LPC doesn't help the discussion.
Just like these artificial examples.

> > ...
> >> What exactly isn't sensible about using cgroup_bpf for containers?
> >
> > my use case is close to zero overhead application network monitoring.
> 
> So if I set up a cgroup that's monitored and call it /cgroup/a and
> enable delegation and if the program running there wants to do its own
> monitoring in /cgroup/a/b (via delegation), then you really want the
> outer monitor to silently drop events coming from /cgroup/a/b?

yes. both are root and must talk to each other if they want
to co-exist. When root process is asking kernel to do X, this X has
to happen.

> Then disallow nesting.  You're welcome to not rush the decision, but
> that's not what you've done.  If 4.10 is released as is, you've made
> the decision and you're going to have a hard time changing it.

Nothing needs to be changed.

> > No. As was pointed out before only root can load 
> > BPF_PROG_TYPE_CGROUP_[SKB|SOCK]
> > type programs and only root can attach them.
> 
> Why?  It really seems to me that you expect that future namespaceable
> bpf hooks will use a totally different API.  At KS, I sat in a room
> full of people arguing about cgroup v2 and a lot of them pointed out
> that there are valid, paying use cases that want to stick cgroup v1 in
> a container, because the code that uses cgroup v1 in the container is
> called systemd and the contained OS is called RHEL (or SuSE or Ubuntu
> or Gentoo or whatever), and that code is *already written* and needs
> to be contained.

bpf in general is not namespace aware. It's global and this cgroup scoping
of bpf programs is the first of this kind. Namespacing of bpf is comp

Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Alexei Starovoitov
On Mon, Dec 19, 2016 at 07:12:48PM -0800, Andy Lutomirski wrote:
> 
> struct cgroup_bpf {
> /*
>  * Store two sets of bpf_prog pointers, one for programs that are
>  * pinned directly to this cgroup, and one for those that are 
> effective
>  * when this cgroup is accessed.
>  */
> struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
> struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
> };
> 
> in struct cgroup, there's a 'struct cgroup_bpf bpf;'.
> 
> This would change to something like:
> 
> struct cgroup_filter_slot {
>   struct bpf_prog *effective;
>   struct cgroup_filter_slot *next;
>   struct bpf_prog *local;
> }
> 
> local is NULL unless *this* cgroup has a filter.  effective points to
> the bpf_prog that's active in this cgroup or the nearest ancestor that
> has a filter.  next is NULL if there are no filters higher in the
> chain or points to the next slot that has a filter.  struct cgroup
> has:
> 
> struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE];
> 
> To evaluate it, you do:
> 
> struct cgroup_filter_slot *slot = &cgroup->slot[the index];
> 
> if (!slot->effective)
>   return;
> 
> do {
>   evaluate(slot->effective);
>   slot = slot->next;
> } while (unlikely(slot));

yes. something like this can work as a future extension
to support multiple programs for security use case.
Please propose a patch.
Again, it's not needed today and there is no rush to implement it.



Re: [PATCH v10 0/8] power: add power sequence library

2016-12-19 Thread Peter Chen
On Mon, Dec 19, 2016 at 09:15:04PM +0200, Krzysztof Kozlowski wrote:
> On Mon, Nov 14, 2016 at 09:35:51AM +0800, Peter Chen wrote:
> > Hi all,
> > 
> > This is a follow-up for my last power sequence framework patch set [1].
> > According to Rob Herring and Ulf Hansson's comments[2]. The kinds of
> > power sequence instances will be added at postcore_initcall, the match
> > criteria is compatible string first, if the compatible string is not
> > matched between dts and library, it will try to use generic power sequence.
> >  
> > The host driver just needs to call of_pwrseq_on/of_pwrseq_off
> > if only one power sequence instance is needed, for more power sequences
> > are used, using of_pwrseq_on_list/of_pwrseq_off_list instead (eg, USB hub 
> > driver).
> > 
> > In future, if there are special power sequence requirements, the special
> > power sequence library can be created.
> > 
> > This patch set is tested on i.mx6 sabresx evk using a dts change, I use
> > two hot-plug devices to simulate this use case, the related binding
> > change is updated at patch [1/6], The udoo board changes were tested
> > using my last power sequence patch set.[3]
> > 
> > Except for hard-wired MMC and USB devices, I find the USB ULPI PHY also
> > need to power on itself before it can be found by ULPI bus.
> > 
> > [1] http://www.spinics.net/lists/linux-usb/msg142755.html
> > [2] http://www.spinics.net/lists/linux-usb/msg143106.html
> > [3] http://www.spinics.net/lists/linux-usb/msg142815.html
> > 
> > Changes for v10:
> > - Improve the kernel-doc for power sequence core, including exported APIs 
> > and
> >   main structure. [Patch 2/8]
> > - Change Kconfig, and let the user choose power sequence. [Patch 2/8]
> > - Delete EXPORT_SYMBOL and change related APIs as local, these APIs do not
> >   be intended to export currently. [Patch 2/8]
> > - Selete POWER_SEQUENCE at USB core's Kconfig. [Patch 4/8]
> 
> Hi Peter,
> 
> It is great that you continued the work on this.
> 
> I saw (in some previous mails) your repo mentioned:
> https://git.kernel.org/cgit/linux/kernel/git/peter.chen/usb.git/
> Does it contain the recent version of this patchset?
> 
> I want to build on top of it fixes for usb3503 on Odroid U3 board.

Krzysztof, I put v10 patch set at branch: pwrseq-lib. 
It seems there are no more comments I will send my v11 patch set
after new year holiday.

-- 

Best Regards,
Peter Chen


Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Lu Baolu
Hi Mathias,

On 12/19/2016 08:13 PM, Mathias Nyman wrote:
> On 19.12.2016 13:34, Baolin Wang wrote:
>> Hi Mathias,
>>
>> On 19 December 2016 at 18:33, Mathias Nyman
>>  wrote:
>>> On 13.12.2016 05:21, Baolin Wang wrote:

 Hi Mathias,

 On 12 December 2016 at 23:52, Mathias Nyman
  wrote:
>
> On 05.12.2016 09:51, Baolin Wang wrote:
>>
>>
>> If a command event is found on the event ring during an interrupt,
>> we need to stop the command timer with del_timer(). Since del_timer()
>> can fail if the timer is running and waiting on the xHCI lock, then
>> it maybe get the wrong timeout command in xhci_handle_command_timeout()
>> if host fetched a new command and updated the xhci->current_cmd in
>> handle_cmd_completion(). For this situation, we need a way to signal
>> to the command timer that everything is fine and it should exit.
>
>
>
> Ah, right, this could actually happen.
>
>>
>>
>> We should introduce a counter (xhci->current_cmd_pending) for the number
>> of pending commands. If we need to cancel the command timer and
>> del_timer()
>> succeeds, we decrement the number of pending commands. If del_timer()
>> fails,
>> we leave the number of pending commands alone.
>>
>> For handling timeout command, in xhci_handle_command_timeout() we will
>> check
>> the counter after decrementing it, if the counter
>> (xhci->current_cmd_pending)
>> is 0, which means xhci->current_cmd is the right timeout command. If the
>> counter (xhci->current_cmd_pending) is greater than 0, which means
>> current
>> timeout command has been handled by host and host has fetched new
>> command
>> as
>> xhci->current_cmd, then just return and wait for new current command.
>
>
>
> A counter like this could work.
>
> Writing the abort bit can generate either ABORT+STOP, or just STOP
> event, this seems to cover both.
>
> quick check, case 1: timeout and cmd completion at the same time.
>
> cpu1cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
> --completion irq fires---- timer times out at same time--
> handle_cmd_completion() handle_cmd_timeout(),)
> lock(xhci_lock  )   spin_on(xhci_lock)
> del_timer() fail, p (=1, nochange)
> cur_cmd = list_next(), p++ (=2)
> unlock(xhci_lock)
>   lock(xhci_lock)
>   p-- (=1)
>   if (p > 0), exit
> OK works
>
> case 2: normal timeout case with ABORT+STOP, no race.
>
> cpu1cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
>   handle_cmd_timeout()
>   p-- (P=0), don't exit
>   mod_timer(), p++ (P=1)
>   write_abort_bit()
> handle_cmd_comletion(ABORT)
> del_timer(), ok, p-- (p = 0)
> handle_cmd_completion(STOP)
> del_timer(), fail, (P=0)
> handle_stopped_cmd_ring()
> cur_cmd = list_next(), p++ (=1)
> mod_timer()
>
> OK, works, and same for just STOP case, with the only difference that
> during handle_cmd_completion(STOP) p is decremented (p--)


 Yes, that's the cases what I want to handle, thanks for your explicit
 explanation.

>>>
>>> Gave this some more thought over the weekend, and this implementation
>>> doesn't solve the case when the last command times out and races with the
>>> completion handler:
>>>
>>> cpu1cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> --completion irq fires---- timer times out at same time--
>>> handle_cmd_completion() handle_cmd_timeout(),)
>>> lock(xhci_lock )spin_on(xhci_lock)
>>> del_timer() fail, p (=1, nochange)
>>> no more commands, P (=1, nochange)
>>> unlock(xhci_lock)
>>>  lock(xhci_lock)
>>>  p-- (=0)
>>>  p == 0, continue, even if we should
>>> not.
>>>For this we still need to rely on
>>> checking cur_cmd == NULL in the timeout function.
>>> (Baolus patch sets it to NULL if there are no more commands pending)
>>
>> As I pointed out in patch 1 of this patchset, this patchset is based
>> on Lu Baolu's new fix patch:
>> usb: xhci: fix possible wild pointer
>> https://www.spinics.net/lists/linux-usb/msg150219.html
>>
>> After applying Baolu's patch, after decrement the counter, we will
>> check the xhci->cur_command if is NULL. So in thi

Re: [PATCH] thermal: add thermal_zone_remove_device_groups()

2016-12-19 Thread Eduardo Valentin
Hey Yasuaki San,

On Thu, Dec 15, 2016 at 04:47:08PM -0500, Yasuaki Ishimatsu wrote:
> When offlining all cores on a CPU, the following system panic
> occurs:
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: strlen+0x0/0x20
> 
> Call Trace:
>  ? kernfs_name_hash+0x17/0x80
>  kernfs_find_ns+0x3f/0xd0
>  kernfs_remove_by_name_ns+0x36/0xa0
>  remove_files.isra.1+0x36/0x70
>  sysfs_remove_group+0x44/0x90
>  sysfs_remove_groups+0x2e/0x50
>  device_remove_attrs+0x5e/0x90
>  device_del+0x1ea/0x350
>  device_unregister+0x1a/0x60
>  thermal_zone_device_unregister+0x1f2/0x210
>  pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
>  ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
>  cpuhp_invoke_callback+0x8d/0x3f0
>  cpuhp_down_callbacks+0x42/0x80
>  cpuhp_thread_fun+0x8b/0xf0
>  smpboot_thread_fn+0x110/0x160
>  kthread+0x101/0x140
>  ? sort_range+0x30/0x30
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x25/0x30
> 
> thermal_zone_create_device_group() sets attribute_groups in
> thermal_zone_attribute_groups[] to tz->device.groups. But these
> attributes_groups do not have name argument.
> 
> So when offlining all cores on CPU and executing
> thermal_zone_device_unregister(), the panic occurs in strlen()
> called from kernfs_name_hash() because name argument is NULL.
> 
> The patch adds thermal_zone_remove_device_groups() to free
> tz->device.groups and set NULL pointer.
> 

Thanks for finding this and sending a fix. Overall I am OK with the
patch, but I have some questions, as follows.

> Signed-off-by: Yasuaki Ishimatsu 
> CC: Zhang Rui 
> CC: Eduardo Valentin 
> ---
>  drivers/thermal/thermal_core.c  | 3 ++-
>  drivers/thermal/thermal_core.h  | 1 +
>  drivers/thermal/thermal_sysfs.c | 6 ++

First question is, are you seeing same problem with cooling devices ?
They follow same strategy, of setting the groups property, and the
cooling device groups also have no name.

>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 641faab..926e385 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
> 
>  unregister:
>   release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
> + thermal_zone_remove_device_groups(tz);
>   device_unregister(&tz->device);
>   return ERR_PTR(result);
>  }
> @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct 
> thermal_zone_device *tz)
>   release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>   idr_destroy(&tz->idr);
>   mutex_destroy(&tz->lock);
> + thermal_zone_remove_device_groups(tz);
>   device_unregister(&tz->device);
> - kfree(tz->device.groups);
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
> 
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 2412b37..e3a60db 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct 
> thermal_zone_device *,
>  int thermal_build_list_of_policies(char *buf);
> 
>  /* sysfs I/F */
> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz);
>  int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
>  void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
>  /* used only at binding time */
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index a694de9..3dfd29b 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -605,6 +605,12 @@ static int create_trip_attrs(struct thermal_zone_device 
> *tz, int mask)
>   return 0;
>  }
> 
> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz)
> +{
> + kfree(tz->device.groups);

OK. This part needs to be done regardless

> + tz->device.groups = NULL;

This almost defeats the purpose of the effort to put all properties to
device.groups, as we need to care about releasing them. 

I guess the problem is we cannot keep the same sysfs entries if we set a
name on the thermal device groups. That would break userspace. And if we
do not set, we need to care about releasing them.

Almost feel like we might consider patching sysfs core.
Or maybe find another way to setup the device groups for thermal
devices.

> +}
> +
>  int thermal_zone_create_device_groups(struct thermal_zone_device *tz,
> int mask)
>  {
> -- 
> 1.8.3.1
> 


Re: [PATCH v4 0/2] FPGA: TS-7300 FPGA manager

2016-12-19 Thread Florian Fainelli
On 12/18/2016 12:21 PM, Florian Fainelli wrote:
> Hi all,
> 
> This patch series adds support for loading bitstreams into the Altera Cyclone 
> II
> connected to an EP9302 on a TS-7300 board.
> 
> Changes in v4:
> 
> - fixed ops->write not to do the final configuration release
> - reordered patches
> 
> Changes in v3:
> 
> - fix write_init and write_complete signatures
> 
> Changes in v2:
> 
> - rebased against fpga/for-next
> - added defines for configuration bits and delays
> - added error mesage if ioremap() fails
> - detailed how the configuration through CPLD is done

Alan, Moritz, thanks for providing Acked-by, I was under the impression
these patches would be taken by you through the FPGA tree, since Hartley
acked the EP93xx part and since that was the tree used as a baseline.

Let me know how you want to proceed, since EP93xx is not as active as
other ARM SoCs, there may not be a specific tree where to stage these
patches (unless you take them).

Thanks!
-- 
Florian


Re: [PATCH v3 4/5] nvme: Implement resume_from_suspend and SED Allocation code.

2016-12-19 Thread kbuild test robot
Hi Scott,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.9 next-20161219]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Scott-Bauer/include-Add-definitions-for-sed/20161220-110214
config: x86_64-randconfig-i0-201651 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/nvme/host/core.c: In function 'nvme_sec_submit':
>> drivers/nvme/host/core.c:770:9: warning: missing braces around initializer 
>> [-Wmissing-braces]
 struct nvme_command cmd = { 0 };
^
   drivers/nvme/host/core.c:770:9: warning: (near initialization for 
'cmd.') [-Wmissing-braces]
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:__set_bit
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:test_and_set_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:variable_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
   Cyclomatic Complexity 1 
include/uapi/linux/byteorder/little_endian.h:__le64_to_cpup
   Cyclomatic Complexity 1 
include/uapi/linux/byteorder/little_endian.h:__le16_to_cpup
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
   Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
   Cyclomatic Complexity 1 include/linux/list.h:__list_del
   Cyclomatic Complexity 1 include/linux/list.h:list_empty
   Cyclomatic Complexity 1 arch/x86/include/asm/current.h:get_current
   Cyclomatic Complexity 2 arch/x86/include/asm/page_64.h:__phys_addr_nodebug
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_read
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_set
   Cyclomatic Complexity 2 arch/x86/include/asm/atomic.h:atomic_sub_and_test
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_add_return
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_cmpxchg
   Cyclomatic Complexity 5 arch/x86/include/asm/atomic.h:__atomic_add_unless
   Cyclomatic Complexity 1 include/linux/atomic.h:atomic_add_unless
   Cyclomatic Complexity 1 include/linux/err.h:PTR_ERR
   Cyclomatic Complexity 1 include/linux/thread_info.h:check_object_size
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock_irq
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irq
   Cyclomatic Complexity 1 include/linux/jiffies.h:_msecs_to_jiffies
   Cyclomatic Complexity 5 include/linux/jiffies.h:msecs_to_jiffies
   Cyclomatic Complexity 1 include/linux/workqueue.h:to_delayed_work
   Cyclomatic Complexity 1 include/linux/signal.h:sigismember
   Cyclomatic Complexity 1 include/linux/sched.h:task_thread_info
   Cyclomatic Complexity 1 include/linux/slab.h:__kmalloc_node
   Cyclomatic Complexity 68 include/linux/slab.h:kmalloc_large
   Cyclomatic Complexity 5 include/linux/slab.h:kmalloc
   Cyclomatic Complexity 1 include/linux/slab.h:kmalloc_node
   Cyclomatic Complexity 1 include/linux/slab.h:kzalloc
   Cyclomatic Complexity 1 include/linux/slab.h:kzalloc_node
   Cyclomatic Complexity 1 include/linux/kref.h:kref_init
   Cyclomatic Complexity 1 include/linux/kref.h:kref_get_unless_zero
   Cyclomatic Complexity 1 include/linux/device.h:dev_to_node
   Cyclomatic Complexity 1 include/linux/device.h:dev_get_drvdata
   Cyclomatic Complexity 1 include/linux/fs.h:iminor
   Cyclomatic Complexity 1 include/linux/genhd.h:get_capacity
   Cyclomatic Complexity 1 include/linux/genhd.h:set_capacity
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_read
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_write
   Cyclomatic Complexity 11 arch/x86/include/asm/uaccess.h:copy_from_user
   Cyclomatic Complexity 11 arch/x86/include/asm/uaccess.h:copy_to_user
   Cyclomatic Complexity 1 include/linux/blk_types.h:op_is_write
   Cyclomatic Complexity 1 include/linux/blkdev.h:queue_flag_set_unlocked
   Cyclomatic Complexity 1 include/linux/blkdev.h:blk_rq_pos
   Cyclomatic Complexity 1 include/linux/blkdev.h:blk_rq_bytes
   Cyclomatic Complexity 1 include/linux/blkdev.h:blk_integrity_rq
   Cyclomatic Complexity 1 
include/linux/blkdev.h:blk_queue_max_integrity_segments
   Cyclomatic Complexity 1 include/linux/blk-mq.h:blk_mq_rq_to_pdu
   Cyclomatic Complexity 1 include/linux/module.h:try_module_get
   Cyclomatic Complexity 1 include/linux/module.h

Re: [PATCH] nvmem: core: Allow ignoring length when reading a cell

2016-12-19 Thread Vivek Gautam
On Tue, Dec 20, 2016 at 3:17 AM, Stephen Boyd  wrote:
> On 12/19, Vivek Gautam wrote:
>> nvmem_cell_read() API fills in the argument 'len' with
>> the number of bytes read from the cell. Many users don't
>> care about this length value. So allow users to pass a
>> NULL pointer to this len field.
>>
>> Signed-off-by: Vivek Gautam 
>> ---
>
> Reviewed-by: Stephen Boyd 

Thanks Stephen.


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: thermal zones break with patch "Reimplement IDR and IDA using the radix tree" (mainline+next)

2016-12-19 Thread Eduardo Valentin
Hey,

On Sat, Dec 17, 2016 at 01:48:33AM +0100, Heiko Stuebner wrote:
> Hi Matthew,
> 
> Am Freitag, 16. Dezember 2016, 21:19:37 CET schrieb Matthew Wilcox:
> > From: Heiko Stuebner [mailto:he...@sntech.de]
> > 
> > > commit b05bbe3ea2db ("Reimplement IDR and IDA using the radix tree")
> > > seems to
> > > break thermal zone allocation. This happens both on todays mainline and
> > > linux-next-20161216 and produces errors like:
> > 
> > 
> > 
> > > While I haven't looked to deeply into what idr exactly does, some
> > > findings:
>  - thermal_zone0 and thermal_zone1 are allocated correctly
> > > - every further thermal_zone always gets allocated the number "1"
> > > - thermal core calls idr_alloc with 0 for both start and end
> > > - the rewrite-patch seems to change the semantics of idr_alloc
> > > 
> > >   where it orignally said "@end: the maximum id (exclusive, <= 0 for
> > >   max)"
> > >   the "<= 0" part is gone now, but I checked, simply setting INT_MAX
> > >   as end in the thermal_core does not help
> > 
> > 
> > Hi Heiko,
> > 
> > Thanks for the report!  The problem is because the thermal subsystem calls
> > idr_alloc() passing a NULL pointer for the data.  I have fixed this problem
> > in my git tree but haven't sent the patch to Andrew yet.  This patch should
> > fix the problem for you:
>  
> > http://git.infradead.org/users/willy/linux-dax.git/commitdiff/c52eeed7b759c3
> > fefe9b7f1b0a17a438df6950f3
> 
> yay, this fixes the issue. This should definitly go as fix into 4.10-rc and 
> when 
> you send it to Andrew you can add my
> 
> Tested-by: Heiko Stuebner 
> 
>  
> > Now ... thermal is actually using an IDR when it could save memory by using
> > an IDA.  Are you interested in doing that conversion?
> 
> That would more be a question for Eduardo and Rui :-) . I'm just in the 
> process of tracking down the smallish issues on my Rockchip board that are 
> poping up during the merge-window.

I would prefer to get the right implementation, and fixing thermal core by 
using IDA.

But given that this touches thermal core, we need an agreement with Rui
too.

> 
> 
> Anyway, thanks for pointing to the fix
> Heiko


RE: [PATCH] block: loose check on sg gap

2016-12-19 Thread Dexuan Cui
> From: Jens Axboe [mailto:ax...@fb.com]
> Sent: Tuesday, December 20, 2016 10:31
> To: Ming Lei 
> Cc: Linux Kernel Mailing List ; linux-block 
>  bl...@vger.kernel.org>; Christoph Hellwig ; Dexuan Cui
> ; Vitaly Kuznetsov ; Keith Busch
> ; Hannes Reinecke ; Mike Christie
> ; Martin K. Petersen ;
> Toshi Kani ; Dan Williams ;
> Damien Le Moal 
> Subject: Re: [PATCH] block: loose check on sg gap
> 
> On 12/19/2016 07:07 PM, Ming Lei wrote:
> > On Sun, Dec 18, 2016 at 12:49 AM, Jens Axboe  wrote:
> >> On 12/17/2016 03:49 AM, Ming Lei wrote:
> >>> If the last bvec of the 1st bio and the 1st bvec of the next
> >>> bio are contineous physically, and the latter can be merged
> >>> to last segment of the 1st bio, we should think they don't
> >>> violate sg gap(or virt boundary) limit.
> >>>
> >>> Both Vitaly and Dexuan reported lots of unmergeable small bios
> >>> are observed when running mkfs on Hyper-V virtual storage, and
> >>> performance becomes quite low, so this patch is figured out for
> >>> fixing the performance issue.
> >>>
> >>> The same issue should exist on NVMe too sine it sets virt boundary too.
> >>
> >> It looks pretty reasonable to me. I'll queue it up for some testing,
> >> changes like this always make me a little nervous.
> >
> > Understood.
> >
> > But given it is still in early stage of 4.10 cycle, seems fine to expose
> > it now, and we should have enough time to fix it if there might be
> > regressions.
> >
> > BTW, it passes my xfstest(ext4) over sata/NVMe.
> 
> It's been fine here in testing, too. I'm not worried about performance
> regressions, those we can always fix. Merging makes me worried about
> corruption, and those regressions are much worse.
> 
> Any reason we need to rush this? I'd be more comfortable pushing this to
> 4.11, unless there are strong reasons this should make 4.10.
> 
> --
> Jens Axboe

Hi Jens,

As far as I know, the patch is important to popular Linux distros,
e.g. at least Ubuntu 14.04.5, 16.x and RHEL 7.3, when they run on 
Hyper-V/Azure, because they can suffer from a pretty bad throughput/latency
in some cases, e.g. mkfs.ext4 for a 100GB partition can take 8 minutes, but
with the patch, it only takes 1 second.

Thanks,
-- Dexuan


Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Andy Lutomirski
On Mon, Dec 19, 2016 at 7:18 PM, Alexei Starovoitov
 wrote:
> On Mon, Dec 19, 2016 at 04:25:32PM -0800, Andy Lutomirski wrote:
>> I think we're still talking past each other.  A big part of the point
>> of changing it is that none of this is specific to bpf.  You could (in
>
> the hooks and context passed into the program is very much bpf specific.
> That's what I've been trying to convey all along.

You mean BPF_CGROUP_RUN_PROG_INET_SOCK(sk)?  There is nothing bpf
specfic about the hook except that the name of this macro has "BPF" in
it.  There is nothing whatsoever that's bpf-specific about the context
-- sk is not bpf-specific at all.

The only thing bpf-specific about it is that it currently only invokes
bpf programs.  That could easily change.

>
>> theory -- I'm not proposing implementing these until there's demand)
>> have:
>>
>> net.socket_create_filter = "none": no filter
>> net.socket_create_filter = "bpf:baadf00d": bpf filter
>
> i'm assuming 'baadf00d' is bpf program fd expressed a text string?
> and kernel needs to parse above? will you allow capital and lower
> case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not?
> can program fd expressed as decimal or hex or both?
> how do you return the error? as a text string for user space
> to parse?

No.  The kernel does not parse it because you cannot write this to the
file.  You set a bpf filter with ioctl and pass an fd.  If you *read*
the file, you get the same bpf program hash that fdinfo on the bpf
object would show -- this is for debugging and (eventually) CRIU.

>
>> net.socket_create_filter = "iptables:foobar": some iptables thingy
>> net.socket_create_filter = "nft:blahblahblah": some nft thingy
>
> iptables/nft are not applicable to 'bpf_socket_create' which
> looks into:
> struct bpf_sock {
> __u32 bound_dev_if;
> __u32 family;
> __u32 type;
> __u32 protocol;
> };
> so don't fit as an example.

The code that takes a 'struct sock' and sets up bpf_sock is
bpf-specific and would obviously not be used for non-bpf filter.  But
if you had a filter that was just a like of address families, that
filter would look at struct sock and do its own thing.  iptables
wouldn't make sense for a socket creation filter, but it would make
perfect sense for an ingress filter.  Obviously the bpf-specific state
object wouldn't be used, but it would still be a hook, invoked from
the same network code, guarded by the same static key, looking at the
same skb.

>
>> net.socket_create_filter = "disallow": no sockets created period
>> net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and 3
>
> so you're proposing to add a bunch of hard coded logic to the kernel.
> First to parse such text into some sort of syntax tree or list/set
> and then have hard coded logic specifically for these two use cases?
> While above two can be implemented as trivial bpf programs already?!
> That goes 180% degree vs bpf philosophy. bpf is about moving
> the specific code out of the kernel and keeping kernel generic that
> it can solve as many use cases as possible by being programmable.

I'm not seriously proposing implementing these.  My point is that
*bpf*, while wonderful, is not the be-all-and-end-all of kernel
configurability, and other types of hooks might want to be hooked in
here.

> ...
>> What exactly isn't sensible about using cgroup_bpf for containers?
>
> my use case is close to zero overhead application network monitoring.

So if I set up a cgroup that's monitored and call it /cgroup/a and
enable delegation and if the program running there wants to do its own
monitoring in /cgroup/a/b (via delegation), then you really want the
outer monitor to silently drop events coming from /cgroup/a/b?

>
>> >> > As you're pointing out, in case of security, we probably
>> >> > want to preserve original bpf program that should always be
>> >> > run first and only after it returned 'ok' (we'd need to define
>> >> > what 'ok' means in secruity context) run program attached to 
>> >> > sub-hierarchy.
>> >>
>> >> It's already defined AFAICT.  1 means okay.  0 means not okay.
>> >
>> > sorry that doesn't make any sense. For seccomp we have a set of
>> > ranges that mean different things. Here you're proposing to
>> > hastily assign 1 and 0 ? How is that extensible?
>> > We need to carefully think through what should be the semantics
>> > of attaching multiple programs, consider performance implications,
>> > return codes and so on.
>>
>> You already assigned it.  The return value of the bpf program, loaded
>> in Linus' tree today, tells the kernel whether to accept or reject.
>
> yes. that's what the program tells the hook.
> I'm saying that whenever we have a link list of the programs
> interaction between them may or may not be expressible with 1/0
> and I don't want to rush such decision.

Then disallow nesting.  You're welcome to not rush the decision, but
that's not what you've done.  If 4.10 is released as is, you've made

Re: [PATCH v3 2/5] lib: Add Sed-opal library

2016-12-19 Thread kbuild test robot
Hi Scott,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.9 next-20161219]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Scott-Bauer/include-Add-definitions-for-sed/20161220-110214
config: i386-randconfig-r0-201651 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from lib/sed.c:20:0:
>> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared 
>> inside parameter list
struct opal_dev *alloc_opal_dev(struct request_queue *q);
   ^
>> include/linux/sed-opal.h:37:40: warning: its scope is only this definition 
>> or declaration, which is probably not what you want
   lib/sed.c: In function 'fdev_sed_ioctl':
   lib/sed.c:161:12: error: dereferencing pointer to incomplete type 'struct 
file'
 if (!filep->f_sedctx || !filep->f_sedctx->ops || !filep->f_sedctx->dev)
   ^
--
   In file included from lib/sed-opal.c:29:0:
>> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared 
>> inside parameter list
struct opal_dev *alloc_opal_dev(struct request_queue *q);
   ^
>> include/linux/sed-opal.h:37:40: warning: its scope is only this definition 
>> or declaration, which is probably not what you want
   lib/sed-opal.c: In function 'opal_discovery0_end':
   lib/sed-opal.c:276:6: warning: unused variable 'error' [-Wunused-variable]
 int error = 0;
 ^
   lib/sed-opal.c: In function 'response_parse':
   lib/sed-opal.c:793:15: error: invalid storage class for function 
'response_get_string'
static size_t response_get_string(const struct parsed_resp *resp, int n,
  ^
   lib/sed-opal.c:793:1: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
static size_t response_get_string(const struct parsed_resp *resp, int n,
^
   lib/sed-opal.c:817:12: error: invalid storage class for function 
'response_get_u64'
static u64 response_get_u64(const struct parsed_resp *resp, int n)
   ^
   lib/sed-opal.c:846:11: error: invalid storage class for function 
'response_status'
static u8 response_status(const struct parsed_resp *resp)
  ^
   lib/sed-opal.c:866:12: error: invalid storage class for function 
'parse_and_check_status'
static int parse_and_check_status(struct opal_dev *dev)
   ^
   lib/sed-opal.c:883:13: error: invalid storage class for function 
'clear_opal_cmd'
static void clear_opal_cmd(struct opal_cmd *cmd)
^
   lib/sed-opal.c:891:12: error: invalid storage class for function 
'start_opal_session_cont'
static int start_opal_session_cont(struct opal_dev *dev)
   ^
   lib/sed-opal.c:913:20: error: invalid storage class for function 
'opal_dev_get'
static inline void opal_dev_get(struct opal_dev *dev)
   ^
   lib/sed-opal.c:918:20: error: invalid storage class for function 
'opal_dev_put'
static inline void opal_dev_put(struct opal_dev *dev)
   ^
   lib/sed-opal.c:923:12: error: invalid storage class for function 
'add_suspend_info'
static int add_suspend_info(struct opal_dev *dev, struct opal_suspend_data 
*sus)
   ^
   lib/sed-opal.c:949:12: error: invalid storage class for function 
'end_session_cont'
static int end_session_cont(struct opal_dev *dev)
   ^
   lib/sed-opal.c:956:12: error: invalid storage class for function 
'finalize_and_send'
static int finalize_and_send(struct opal_dev *dev, struct opal_cmd *cmd,
   ^
   lib/sed-opal.c:972:12: error: invalid storage class for function 'gen_key'
static int gen_key(struct opal_dev *dev)
   ^
   lib/sed-opal.c:1002:12: error: invalid storage class for function 
'get_active_key_cont'
static int get_active_key_cont(struct opal_dev *dev)
   ^
   lib/sed-opal.c:1027:12: error: invalid storage class for function 
'get_active_key'
static int get_active_key(struct opal_dev *dev)
   ^
   lib/sed-opal.c:1067:12: error: invalid storage class for function 
'generic_lr_enable_disable'
static int generic_lr_enable_disable(struct opal_cmd *cmd,
   ^
   lib/sed-opal.c:1107:19: error: invalid storage class for function 
'enable_global_lr'
static inline int enable_global_lr(struct opal_cmd *cmd, u8 *uid,
  ^
   lib/sed-opal.c:1118:

Re: [PATCH v2 2/3] Bluetooth: btusb: Add out-of-band wakeup support

2016-12-19 Thread Brian Norris
On Mon, Dec 19, 2016 at 05:46:19PM -0800, Rajat Jain wrote:
> On Mon, Dec 19, 2016 at 3:10 PM, Brian Norris  
> wrote:
> > On Fri, Dec 16, 2016 at 11:30:03AM -0800, Rajat Jain wrote:
> >> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> >> index ce22cef..beca4e9 100644
> >> --- a/drivers/bluetooth/btusb.c
> >> +++ b/drivers/bluetooth/btusb.c

> >> +static const struct of_device_id btusb_match_table[] = {
> >> + { .compatible = "usb1286,204e" },
> >> + { }
> >> +};
> >> +MODULE_DEVICE_TABLE(of, btusb_match_table);
> >
> > You define a match table here, but you also define essentially same
> > table for Marvell-specific additions in patch 3. It looks like maybe
> > it's legal to have more than one OF table in a module? But it seems like
> > it would get confusing, besides being somewhat strange to maintain. It
> > might also produce duplicate 'modinfo' output.
> >
> > If you really want to independently opt into device-tree-specified
> > interrupts vs. Marvell-specific interrrupt configuration, then you
> > should probably just merge the latter into the former table, and
> > implement a Marvell/GPIO flag to stick in the .data field of this table.
> 
> The data we want to stick (The pin number on the chip) is board
> specific rather than being chip specific, and hence .data may not be
> useful here.

I just meant that if you conceptually wanted Marvell devices to "opt in"
to this, then you could turn the .data field into some kind of flag or
struct for describing capabilities (e.g., MARVELL_GPIO_WAKE or similar),
effectively merging the two tables instead of having two
mostly-overlapping tables.

But you could do the same by putting such flag(s) in the
{btusb,blacklist}_table[], or add such flag(s) later whenever there's a
Marvell device that doesn't support this feature.

> > Or it might be fine to drop one or both "match" checks. Particularly for
> > the Marvell-specific stuff, it's probably fair just to check if it has
> > an ->of_node and if 'id->driver_info & BTUSB_MARVELL'. Any other Marvell
> > device-specific quirks could probably be keyed off of the
> > (weirdly-named?) blacklist_table[], which already matches PID/VID.
> 
> Yup, I think I can remove the compatible string checks.
> 
> I'll be sending a V3.

Great!

Brian


Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-19 Thread Baolin Wang
On 19 December 2016 at 20:13, Mathias Nyman  wrote:
> On 19.12.2016 13:34, Baolin Wang wrote:
>>
>> Hi Mathias,
>>
>> On 19 December 2016 at 18:33, Mathias Nyman
>>  wrote:
>>>
>>> On 13.12.2016 05:21, Baolin Wang wrote:


 Hi Mathias,

 On 12 December 2016 at 23:52, Mathias Nyman
  wrote:
>
>
> On 05.12.2016 09:51, Baolin Wang wrote:
>>
>>
>>
>> If a command event is found on the event ring during an interrupt,
>> we need to stop the command timer with del_timer(). Since del_timer()
>> can fail if the timer is running and waiting on the xHCI lock, then
>> it maybe get the wrong timeout command in
>> xhci_handle_command_timeout()
>> if host fetched a new command and updated the xhci->current_cmd in
>> handle_cmd_completion(). For this situation, we need a way to signal
>> to the command timer that everything is fine and it should exit.
>
>
>
>
> Ah, right, this could actually happen.
>
>>
>>
>> We should introduce a counter (xhci->current_cmd_pending) for the
>> number
>> of pending commands. If we need to cancel the command timer and
>> del_timer()
>> succeeds, we decrement the number of pending commands. If del_timer()
>> fails,
>> we leave the number of pending commands alone.
>>
>> For handling timeout command, in xhci_handle_command_timeout() we will
>> check
>> the counter after decrementing it, if the counter
>> (xhci->current_cmd_pending)
>> is 0, which means xhci->current_cmd is the right timeout command. If
>> the
>> counter (xhci->current_cmd_pending) is greater than 0, which means
>> current
>> timeout command has been handled by host and host has fetched new
>> command
>> as
>> xhci->current_cmd, then just return and wait for new current command.
>
>
>
>
> A counter like this could work.
>
> Writing the abort bit can generate either ABORT+STOP, or just STOP
> event, this seems to cover both.
>
> quick check, case 1: timeout and cmd completion at the same time.
>
> cpu1cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
> --completion irq fires---- timer times out at same
> time--
> handle_cmd_completion() handle_cmd_timeout(),)
> lock(xhci_lock  )   spin_on(xhci_lock)
> del_timer() fail, p (=1, nochange)
> cur_cmd = list_next(), p++ (=2)
> unlock(xhci_lock)
>   lock(xhci_lock)
>   p-- (=1)
>   if (p > 0), exit
> OK works
>
> case 2: normal timeout case with ABORT+STOP, no race.
>
> cpu1cpu2
>
> queue_command(first), p++ (=1)
> queue_command(more),
>   handle_cmd_timeout()
>   p-- (P=0), don't exit
>   mod_timer(), p++ (P=1)
>   write_abort_bit()
> handle_cmd_comletion(ABORT)
> del_timer(), ok, p-- (p = 0)
> handle_cmd_completion(STOP)
> del_timer(), fail, (P=0)
> handle_stopped_cmd_ring()
> cur_cmd = list_next(), p++ (=1)
> mod_timer()
>
> OK, works, and same for just STOP case, with the only difference that
> during handle_cmd_completion(STOP) p is decremented (p--)



 Yes, that's the cases what I want to handle, thanks for your explicit
 explanation.

>>>
>>> Gave this some more thought over the weekend, and this implementation
>>> doesn't solve the case when the last command times out and races with the
>>> completion handler:
>>>
>>> cpu1cpu2
>>>
>>> queue_command(first), p++ (=1)
>>> --completion irq fires---- timer times out at same time--
>>> handle_cmd_completion() handle_cmd_timeout(),)
>>> lock(xhci_lock )spin_on(xhci_lock)
>>> del_timer() fail, p (=1, nochange)
>>> no more commands, P (=1, nochange)
>>> unlock(xhci_lock)
>>>  lock(xhci_lock)
>>>  p-- (=0)
>>>  p == 0, continue, even if we
>>> should
>>> not.
>>>For this we still need to rely
>>> on
>>> checking cur_cmd == NULL in the timeout function.
>>> (Baolus patch sets it to NULL if there are no more commands pending)
>>
>>
>> As I pointed out in patch 1 of this patchset, this patchset is based
>> on Lu Baolu's new fix patch:
>> usb: xhci: fix possible wild pointer
>> https://www.spinics.net/lists/linux-usb/msg150219.html
>>
>> After applying Baolu's patch, after decrement

Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Alexei Starovoitov
On Mon, Dec 19, 2016 at 04:25:32PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 19, 2016 at 4:02 PM, Alexei Starovoitov
>  wrote:
> > On Mon, Dec 19, 2016 at 01:23:50PM -0800, Andy Lutomirski wrote:
> >> On Mon, Dec 19, 2016 at 12:56 PM, Alexei Starovoitov
> >>  wrote:
> >> > On Sat, Dec 17, 2016 at 10:18:44AM -0800, Andy Lutomirski wrote:
> >> >> Hi all-
> >> >>
> >> >> I apologize for being rather late with this.  I didn't realize that
> >> >> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
> >> >> it on the linux-api list, so I missed the discussion.
> >> >>
> >> >> I think that the inet ingress, egress etc filters are a neat feature,
> >> >> but I think the API has some issues that will bite us down the road
> >> >> if it becomes stable in its current form.
> >> >>
> >> >> Most of the problems I see are summarized in this transcript:
> >> >>
> >> >> # mkdir cg2
> >> >> # mount -t cgroup2 none cg2
> >> >> # mkdir cg2/nosockets
> >> >> # strace cgrp_socket_rule cg2/nosockets/ 0
> >> >> ...
> >> >> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
> >> >>
> >> >>  You can modify a cgroup after opening it O_RDONLY?
> >> >>
> >> >> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
> >> >> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
> >> >> log_buf=0x6020c0, kern_version=0}, 48) = 4
> >> >>
> >> >>  This is fine.  The bpf() syscall manipulates bpf objects.
> >> >>
> >> >> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
> >> >>
> >> >>  This is not so good:
> >> >> 
> >> >>  a) The bpf() syscall is supposed to manipulate bpf objects.  This
> >> >> is manipulating a cgroup.  There's no reason that a socket 
> >> >> creation
> >> >> filter couldn't be written in a different language (new iptables
> >> >> table?  Simple list of address families?), but if that happened,
> >> >> then using bpf() to install it would be entirely nonsensical.
> >> >
> >> > I don't see why it's _modifing_ the cgroup. I'm looking at it as
> >> > network stack is using cgroup as an application group that should
> >> > invoke bpf program at the certain point in the stack.
> >> > imo cgroup management is orthogonal.
> >>
> >> It is literally modifying the struct cgroup, and, as a practical
> >> matter, it's causing membership in the cgroup to have a certain
> >> effect.  But rather than pointless arguing, let me propose an
> >> alternative API that I think solves most of the problems here.
> >>
> >> In my model, BPF_PROG_ATTACH and BPF_PROG_DETACH go away completely.
> >> Instead, the cgroup gets three new control files:
> >> "net.ingress_filter", "net.egress_filter", and
> >> "net.socket_create_filter".  Initially, if you read these files, you
> >> see "none\n".
> >>
> >> To attach a bpf filter, you open the file for write and do an ioctl on
> >> it.  After doing the ioctl, if you read the file, you'll see
> >> "bpf:[hash]\n" where "[hash]" is exactly what you'd see in fdinfo for
> >> the bpf program.
> >>
> >> To detach any type of filter, bpf or otherwise, you open the file for
> >> write and write "none\n" (or just "none").
> >>
> >> If you write anything else to the file, you get -EINVAL.  But, if
> >> someone writes a new type of filter (perhaps a simple list of address
> >> families), maybe you can enable the filter by writing something
> >> appropriate to the file.
> >
> > I see no difference in what you're proposing vs what is already implemented
> > from feature set point of view, but the file approach is very ugly, since
> > it's a mismatch to FD style access that bpf is using everywhere.
> > In your proposal you'd also need to add bpf prefix everywhere.
> > So the control file names should be bpf_inet_ingress, bpf_inet_egress
> > and bpf_socket_create.
> 
> I think we're still talking past each other.  A big part of the point
> of changing it is that none of this is specific to bpf.  You could (in

the hooks and context passed into the program is very much bpf specific.
That's what I've been trying to convey all along.

> theory -- I'm not proposing implementing these until there's demand)
> have:
> 
> net.socket_create_filter = "none": no filter
> net.socket_create_filter = "bpf:baadf00d": bpf filter

i'm assuming 'baadf00d' is bpf program fd expressed a text string?
and kernel needs to parse above? will you allow capital and lower
case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not?
can program fd expressed as decimal or hex or both?
how do you return the error? as a text string for user space
to parse?

> net.socket_create_filter = "iptables:foobar": some iptables thingy
> net.socket_create_filter = "nft:blahblahblah": some nft thingy

iptables/nft are not applicable to 'bpf_socket_create' which
looks into:
struct bpf_sock {
__u32 bound_dev_if;
__u32 family;
__u32 type;
__u32 protocol;
};
so don't fit as an example.

> net.socket_create_filter = "disallow": no so

[PATCH] sparc: use symbolic names for tsb indexing

2016-12-19 Thread Mike Kravetz
Use symbolic names MM_TSB_BASE and MM_TSB_HUGE instead of numeric values
0 and 1 in __tsb_context_switch.  Code cleanup only, no functional change.

Suggested-by: Sam Ravnborg 
Signed-off-by: Mike Kravetz 
---
 arch/sparc/include/asm/mmu_context_64.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_context_64.h 
b/arch/sparc/include/asm/mmu_context_64.h
index b84be67..d031799 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -35,15 +35,15 @@ void __tsb_context_switch(unsigned long pgd_pa,
 static inline void tsb_context_switch(struct mm_struct *mm)
 {
__tsb_context_switch(__pa(mm->pgd),
-&mm->context.tsb_block[0],
+&mm->context.tsb_block[MM_TSB_BASE],
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-(mm->context.tsb_block[1].tsb ?
- &mm->context.tsb_block[1] :
+(mm->context.tsb_block[MM_TSB_HUGE].tsb ?
+ &mm->context.tsb_block[MM_TSB_HUGE] :
  NULL)
 #else
 NULL
 #endif
-, __pa(&mm->context.tsb_descr[0]));
+, __pa(&mm->context.tsb_descr[MM_TSB_BASE]));
 }
 
 void tsb_grow(struct mm_struct *mm,
-- 
2.7.4



Re: [PATCH v3 2/5] lib: Add Sed-opal library

2016-12-19 Thread kbuild test robot
Hi Scott,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.9 next-20161219]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Scott-Bauer/include-Add-definitions-for-sed/20161220-110214
config: i386-tinyconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from lib/sed.c:20:0:
>> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared 
>> inside parameter list will not be visible outside of this definition or 
>> declaration
struct opal_dev *alloc_opal_dev(struct request_queue *q);
   ^
   lib/sed.c: In function 'fdev_sed_ioctl':
>> lib/sed.c:161:12: error: dereferencing pointer to incomplete type 'struct 
>> file'
 if (!filep->f_sedctx || !filep->f_sedctx->ops || !filep->f_sedctx->dev)
   ^~
--
   In file included from lib/sed-opal.c:29:0:
>> include/linux/sed-opal.h:37:40: warning: 'struct request_queue' declared 
>> inside parameter list will not be visible outside of this definition or 
>> declaration
struct opal_dev *alloc_opal_dev(struct request_queue *q);
   ^
   lib/sed-opal.c: In function 'opal_discovery0_end':
   lib/sed-opal.c:276:6: warning: unused variable 'error' [-Wunused-variable]
 int error = 0;
 ^
   lib/sed-opal.c: In function 'response_parse':
>> lib/sed-opal.c:793:15: error: invalid storage class for function 
>> 'response_get_string'
static size_t response_get_string(const struct parsed_resp *resp, int n,
  ^~~
>> lib/sed-opal.c:793:1: warning: ISO C90 forbids mixed declarations and code 
>> [-Wdeclaration-after-statement]
static size_t response_get_string(const struct parsed_resp *resp, int n,
^~
>> lib/sed-opal.c:817:12: error: invalid storage class for function 
>> 'response_get_u64'
static u64 response_get_u64(const struct parsed_resp *resp, int n)
   ^~~~
>> lib/sed-opal.c:846:11: error: invalid storage class for function 
>> 'response_status'
static u8 response_status(const struct parsed_resp *resp)
  ^~~
>> lib/sed-opal.c:866:12: error: invalid storage class for function 
>> 'parse_and_check_status'
static int parse_and_check_status(struct opal_dev *dev)
   ^~
>> lib/sed-opal.c:883:13: error: invalid storage class for function 
>> 'clear_opal_cmd'
static void clear_opal_cmd(struct opal_cmd *cmd)
^~
>> lib/sed-opal.c:891:12: error: invalid storage class for function 
>> 'start_opal_session_cont'
static int start_opal_session_cont(struct opal_dev *dev)
   ^~~
>> lib/sed-opal.c:913:20: error: invalid storage class for function 
>> 'opal_dev_get'
static inline void opal_dev_get(struct opal_dev *dev)
   ^~~~
>> lib/sed-opal.c:918:20: error: invalid storage class for function 
>> 'opal_dev_put'
static inline void opal_dev_put(struct opal_dev *dev)
   ^~~~
>> lib/sed-opal.c:923:12: error: invalid storage class for function 
>> 'add_suspend_info'
static int add_suspend_info(struct opal_dev *dev, struct opal_suspend_data 
*sus)
   ^~~~
>> lib/sed-opal.c:949:12: error: invalid storage class for function 
>> 'end_session_cont'
static int end_session_cont(struct opal_dev *dev)
   ^~~~
>> lib/sed-opal.c:956:12: error: invalid storage class for function 
>> 'finalize_and_send'
static int finalize_and_send(struct opal_dev *dev, struct opal_cmd *cmd,
   ^
>> lib/sed-opal.c:972:12: error: invalid storage class for function 'gen_key'
static int gen_key(struct opal_dev *dev)
   ^~~
>> lib/sed-opal.c:1002:12: error: invalid storage class for function 
>> 'get_active_key_cont'
static int get_active_key_cont(struct opal_dev *dev)
   ^~~
>> lib/sed-opal.c:1027:12: error: invalid storage class for function 
>> 'get_active_key'
static int get_active_key(struct opal_dev *dev)
   ^~
>> lib/sed-opal.c:1067:12: error: invalid storage class for function 
>> 

Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2016-12-19 Thread Andy Lutomirski
On Mon, Dec 19, 2016 at 6:52 PM, David Ahern  wrote:
> On 12/19/16 6:56 PM, Andy Lutomirski wrote:
>> On Mon, Dec 19, 2016 at 5:44 PM, David Ahern  wrote:
>>> On 12/19/16 5:25 PM, Andy Lutomirski wrote:
 net.socket_create_filter = "none": no filter
 net.socket_create_filter = "bpf:baadf00d": bpf filter
 net.socket_create_filter = "disallow": no sockets created period
 net.socket_create_filter = "iptables:foobar": some iptables thingy
 net.socket_create_filter = "nft:blahblahblah": some nft thingy
 net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and 
 3
>>>
>>> Such a scheme works for the socket create filter b/c it is a very simple 
>>> use case. It does not work for the ingress and egress which allow generic 
>>> bpf filters.
>>
>> Can you elaborate on what goes wrong?  (Obviously the
>> "address_family_list" example makes no sense in that context.)
>
> Being able to dump a filter or see that one exists would be a great add-on, 
> but I don't see how 'net.socket_create_filter = "bpf:baadf00d"' is a viable 
> API for loading generic BPF filters. Simple cases like "disallow" are easy -- 
> just return 0 in the filter, no complicated BPF code needed. The rest are 
> specific cases of the moment which goes against the intent of ebpf and 
> generic programmability.

Oh -- I'm not proposing that at all.  Let me clarify.  For the bpf
case, if you *read* the file, you'd see "bpf:baadf00d".  But writing
"bpf:baadf00d" is nonsense and would give you -EINVAL.  Instead you
install a bpf filter by opening the file for write (O_RDWR or
O_WRONLY) and doing ioctl(cgroup_socket_create_file_fd,
CGROUP_IOCTL_SET_BPF_FILTER, bpf_fd);  It's kind of like
BPF_PROG_ATTACH except that it respects filesystem permissions, it
isn't in the bpf() multiplexer, the filter being set is implied (by
the fd in use), and everything is nicely seccompable.

To remove the filter, you write "none" or "none\n" to the file.

As a future extension, if someone wanted more than one filter to be
able to coexist in the cgroup socket_create_filter slot, you could
plausibly do 'echo disallow >>net.socket_create_filter' or use a new
ioctl CGROUP_IOCTL_APPEND_BPF_FILTER, and then you'd read the file and
see more than one line.  But this would be a future extension and may
never be needed.

>>
>> a) sub-cgroups cannot have a filter at all of the parent has a filter.
>> (This is the "punt" approach -- it lets different semantics be
>> assigned later without breaking userspace.)
>>
>> b) sub-cgroups can have a filter if a parent does, too.  The semantics
>> are that the sub-cgroup filter runs first and all side-effects occur.
>> If that filter says "reject" then ancestor filters are skipped.  If
>> that filter says "accept", then the ancestor filter is run and its
>> side-effects happen as well.  (And so on, all the way up to the root.)
>
> That comes with a big performance hit for skb / data path cases.
>
> I'm riding my use case on Daniel's work, and as I understand it the nesting 
> case has been discussed. I'll defer to Daniel and Alexei on this part.

I'm not sure I buy the performance hit.  If you do it naively, then
performance will indeed suck.  But there's already a bunch of code in
there to pre-populate a filter list for faster use.  Currently, we
have:

struct cgroup_bpf {
/*
 * Store two sets of bpf_prog pointers, one for programs that are
 * pinned directly to this cgroup, and one for those that are effective
 * when this cgroup is accessed.
 */
struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
};

in struct cgroup, there's a 'struct cgroup_bpf bpf;'.

This would change to something like:

struct cgroup_filter_slot {
  struct bpf_prog *effective;
  struct cgroup_filter_slot *next;
  struct bpf_prog *local;
}

local is NULL unless *this* cgroup has a filter.  effective points to
the bpf_prog that's active in this cgroup or the nearest ancestor that
has a filter.  next is NULL if there are no filters higher in the
chain or points to the next slot that has a filter.  struct cgroup
has:

struct cgroup_filter_slot filters[MAX_BPF_ATTACH_TYPE];

To evaluate it, you do:

struct cgroup_filter_slot *slot = &cgroup->slot[the index];

if (!slot->effective)
  return;

do {
  evaluate(slot->effective);
  slot = slot->next;
} while (unlikely(slot));

The old code was one branch per evaluation.  The new code is n+1
branches where n is the number of filters, so it's one extra branch in
the worst case.  But the extra branch is cache-hot (the variable is
right next to slot->effective, which is needed regardless) and highly
predictable (in the case where nesting isn't used, the branch is not
taken, and it's a backward branch which most CPUs will predict as not
taken).

I expect that, on x86, this adds at most a cycle or two and quite
possibly adds no measurable overhead at all.


Re: [RFC 10/10] kmod: add a sanity check on module loading

2016-12-19 Thread Rusty Russell
"Luis R. Rodriguez"  writes:
> On Dec 16, 2016 9:54 PM, "Rusty Russell"  wrote:
> > AFAICT the mistake here is that kmod is returning "done, OK" when the
> > module it is trying to load is already loading (but not finished
> > loading).  That's the root problem; it's an attempt at optimization by
> > kmod which goes awry.
>
> This is true! To be precise though the truth of the matter is that kmod'd
> respective usermode helper: modprobe can be buggy and may lie to us. It may
> allow request_module() to return 0 but since we don't validate it, any
> assumption we make can be deadly. In the case of get_fs_type() its a null
> dereference.

Wait, what??  I can't see that in get_fs_type, which hasn't changed
since 2013.  If a caller is assuming get_fs_type() doesn't return NULL,
they're broken and need fixing of course:

struct file_system_type *get_fs_type(const char *name)
{
struct file_system_type *fs;
const char *dot = strchr(name, '.');
int len = dot ? dot - name : strlen(name);

fs = __get_fs_type(name, len);
if (!fs && (request_module("fs-%.*s", len, name) == 0))
fs = __get_fs_type(name, len);

if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) {
put_filesystem(fs);
fs = NULL;
}
return fs;
}

Where does this NULL-deref is the module isn't correctly loaded?

> *Iff* we want a sanity check to verify kmod's umh is not lying to us we
> need to verify after 0 was returned that it was not lying to us. Since kmod
> accepts aliases but find_modules_all() only works on the real module name a
> validation check cannot happen when all you have are aliases.

request_module() should block until resolution, but that's fundamentally
a userspace problem.  Let's not paper over it in kernelspace.

> *Iff* we are sure we don't want a validation (or another earlier
> optimization to avoid calling out to modrobe if the alias requested is
> already present, which does the time shaving I mentioned on the tests) then
> naturally no request_module() calls returning 0 can assert information
> about the requested module. I think we might need to change more code if we
> accept we cannot trust request_module() calls, or we accept userspace
> telling the kernel something may mean we sometimes crash. This later
> predicament seems rather odd so hence the patch.
>
> Perhaps in some cases validation of work from a umh is not critical in
> kernel but for request_module() I can tell you that today get_fs_type code
> currently asserts the module found can never be NULL.

OK, what am I missing in the code above?  

> > Looking at the code in the kernel, we *already* get this right: block if
> > a module is still loading anyway.  Once it succeeds we return -EBUSY if
> >
> > it fails we'll proceed to try to load it again.
> >
> > I don't understand what you're trying to fix with adding aliases
> > in-kernel?
>
> Two fold now:
>
> a) validation on request_module() work when an alias is used

But why?

> b) since kmod accepts aliaes, if we get aliases support, it means we could
> *also* preemptively avoid calling out to userspace for modules already
> present.

No, because once we have a module we don't request it: requesting is the
fallback case.

> >> FWIW a few things did occur to me:
> >>
> >> a) list_add_rcu() is used so new modules get added first
> >
> > Only after we're sure that there are no duplicates.
> >
> >
> OK! This is a very critical assertion. I should be able to add a debug
> WARN_ON() should two modules be on the modules list for the same module
> then ?

Yes, names must be unique.

>> b) find_module_all() returns the last module which was added as it
> traverses
>>the module list
>
>> BTW should find_module_all() use rcu to traverse?
>
> Yes; the kallsyms code does this on Oops.  Not really a big issue in
> practice, but a nice fix.
>
> Ok, will bundle into my queue.

Please submit to Jessica for her module queue, as it's orthogonal
AFAICT.

> I will note though that I still think there's a bug in this code --
> upon a failure other "spinning" requests can fail, I believe this may
> be due to not having another state or informing pending modules too
> early of a failure but I haven't been able to prove this conjecture
> yet.

That's possible, but I can't see it from quickly re-checking the code.

The module should be fully usable at this point; the module's init has
been called successfully, so in the case of __get_fs_type() it should
now succeed.  The module cleans up its init section, but that should be
independent.

If there is a race, it's likely to be when some other caller wakes the
queue.  Moving the wakeup as soon as possible should make it easier to
trigger:

diff --git a/kernel/module.c b/kernel/module.c
index f57dd63186e6..78bd89d41a22 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ 

答复: Re: [kbuild-all] 答复: Re: [PATCH 1/2] ocfs2: add kobject for online file check

2016-12-19 Thread Gang He
Hi Fengguang and all,
I will reconstruct my patch set, to make sure each patch can follow this rule.

Thanks
Gang 

>>> Fengguang Wu  2016-12-20 上午 9:47 >>>
Hi Gang,

On Mon, Dec 19, 2016 at 06:43:48PM -0700, Gang He wrote:
>Hello Kbuild,
>Could you build my whole patch set (2 patch)? I think that the code is OK.

We test your whole patch as well as first-N patches, and noticed that
the first-1 patch breaks bisectibility:

>Note: the 
>linux-review/Gang-He/ocfs2-add-kobject-for-online-file-check/20161219-181858 
>HEAD 6ef9256cd25ef72a5e69490cc3dacde95b8e2ac4 builds fine.
>  It only hurts bisectibility.

Thanks,
Fengguang





[PATCH v3 3/3] Bluetooth: btusb: Configure Marvell to use one of the pins for oob wakeup

2016-12-19 Thread Rajat Jain
The Marvell devices may have many gpio pins, and hence for wakeup
on these out-of-band pins, the chip needs to be told which pin is
to be used for wakeup, using an hci command.

Thus, we read the pin number etc from the device tree node and send
a command to the chip.

Signed-off-by: Rajat Jain 
Reviewed-by: Brian Norris 
Acked-by: Rob Herring 
---
v3: * remove the Marvell specific id table and check
* Add reference to marvell-bt-8xxx.txt in btusb.txt
* Add "Reviewed-by" and "Acked-by"
v2: Fix the binding document to specify to use "wakeup" interrupt-name

 Documentation/devicetree/bindings/net/btusb.txt|  3 ++
 .../{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} | 46 +++
 drivers/bluetooth/btusb.c  | 51 ++
 3 files changed, 92 insertions(+), 8 deletions(-)
 rename Documentation/devicetree/bindings/net/{marvell-bt-sd8xxx.txt => 
marvell-bt-8xxx.txt} (50%)

diff --git a/Documentation/devicetree/bindings/net/btusb.txt 
b/Documentation/devicetree/bindings/net/btusb.txt
index 2c0355c..01fa2d4 100644
--- a/Documentation/devicetree/bindings/net/btusb.txt
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -10,6 +10,9 @@ Required properties:
 
  "usb1286,204e" (Marvell 8997)
 
+Also, vendors that use btusb may have device additional properties, e.g:
+Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
+
 Optional properties:
 
   - interrupt-parent: phandle of the parent interrupt controller
diff --git a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt 
b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
similarity index 50%
rename from Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt
rename to Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
index 6a9a63c..9be1059 100644
--- a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt
+++ b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
@@ -1,16 +1,21 @@
-Marvell 8897/8997 (sd8897/sd8997) bluetooth SDIO devices
+Marvell 8897/8997 (sd8897/sd8997) bluetooth devices (SDIO or USB based)
 --
+The 8997 devices supports multiple interfaces. When used on SDIO interfaces,
+the btmrvl driver is used and when used on USB interface, the btusb driver is
+used.
 
 Required properties:
 
   - compatible : should be one of the following:
-   * "marvell,sd8897-bt"
-   * "marvell,sd8997-bt"
+   * "marvell,sd8897-bt" (for SDIO)
+   * "marvell,sd8997-bt" (for SDIO)
+   * "usb1286,204e"  (for USB)
 
 Optional properties:
 
   - marvell,cal-data: Calibration data downloaded to the device during
  initialization. This is an array of 28 values(u8).
+ This is only applicable to SDIO devices.
 
   - marvell,wakeup-pin: It represents wakeup pin number of the bluetooth chip.
firmware will use the pin to wakeup host system (u16).
@@ -18,10 +23,15 @@ Optional properties:
  platform. The value will be configured to firmware. This
  is needed to work chip's sleep feature as expected (u16).
   - interrupt-parent: phandle of the parent interrupt controller
-  - interrupts : interrupt pin number to the cpu. Driver will request an irq 
based
-on this interrupt number. During system suspend, the irq will 
be
-enabled so that the bluetooth chip can wakeup host platform 
under
-certain condition. During system resume, the irq will be 
disabled
+  - interrupt-names: Used only for USB based devices (See below)
+  - interrupts : specifies the interrupt pin number to the cpu. For SDIO, the
+driver will use the first interrupt specified in the interrupt
+array. For USB based devices, the driver will use the interrupt
+named "wakeup" from the interrupt-names and interrupt arrays.
+The driver will request an irq based on this interrupt number.
+During system suspend, the irq will be enabled so that the
+bluetooth chip can wakeup host platform under certain
+conditions. During system resume, the irq will be disabled
 to make sure unnecessary interrupt is not received.
 
 Example:
@@ -29,7 +39,9 @@ Example:
 IRQ pin 119 is used as system wakeup source interrupt.
 wakeup pin 13 and gap 100ms are configured so that firmware can wakeup host
 using this device side pin and wakeup latency.
-calibration data is also available in below example.
+
+Example for SDIO device follows (calibration data is also available in
+below example).
 
 &mmc3 {
status = "okay";
@@ -54,3 +66,21 @@ calibration data is also available in below example.
marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
};
 };
+
+Example for USB device:
+
+&usb_host1_ohci {
+status = "okay";
+#address-cells = <1>;
+#size-cells = <0>;
+
+mvl_bt1: bt@1 {
+   com

[PATCH v3 1/3] Bluetooth: btusb: Use an error label for error paths

2016-12-19 Thread Rajat Jain
Use a label to remove the repetetive cleanup, for error cases.

Signed-off-by: Rajat Jain 
Reviewed-by: Brian Norris 
---
v3: Added Brian's "Reviewed-by"
v2: same as v1

 drivers/bluetooth/btusb.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 2f633df..ce22cef 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -2991,18 +2991,15 @@ static int btusb_probe(struct usb_interface *intf,
err = usb_set_interface(data->udev, 0, 0);
if (err < 0) {
BT_ERR("failed to set interface 0, alt 0 %d", err);
-   hci_free_dev(hdev);
-   return err;
+   goto out_free_dev;
}
}
 
if (data->isoc) {
err = usb_driver_claim_interface(&btusb_driver,
 data->isoc, data);
-   if (err < 0) {
-   hci_free_dev(hdev);
-   return err;
-   }
+   if (err < 0)
+   goto out_free_dev;
}
 
 #ifdef CONFIG_BT_HCIBTUSB_BCM
@@ -3016,14 +3013,16 @@ static int btusb_probe(struct usb_interface *intf,
 #endif
 
err = hci_register_dev(hdev);
-   if (err < 0) {
-   hci_free_dev(hdev);
-   return err;
-   }
+   if (err < 0)
+   goto out_free_dev;
 
usb_set_intfdata(intf, data);
 
return 0;
+
+out_free_dev:
+   hci_free_dev(hdev);
+   return err;
 }
 
 static void btusb_disconnect(struct usb_interface *intf)
-- 
2.8.0.rc3.226.g39d4020



[PATCH v3 2/3] Bluetooth: btusb: Add out-of-band wakeup support

2016-12-19 Thread Rajat Jain
Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that
can be connected to a gpio on the CPU side, and can be used to wakeup
the host out-of-band. This can be useful in situations where the
in-band wakeup is not possible or not preferable (e.g. the in-band
wakeup may require the USB host controller to remain active, and
hence consuming more system power during system sleep).

The oob gpio interrupt to be used for wakeup on the CPU side, is
read from the device tree node, (using standard interrupt descriptors).
A devcie tree binding document is also added for the driver. The
compatible string is in compliance with
Documentation/devicetree/bindings/usb/usb-device.txt

Signed-off-by: Rajat Jain 
Reviewed-by: Brian Norris 
---
v3: Add Brian's "Reviewed-by"
v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
* Leave it on device tree to specify IRQ flags (level /edge triggered)
* Mark the device as non wakeable on exit.

 Documentation/devicetree/bindings/net/btusb.txt | 40 
 drivers/bluetooth/btusb.c   | 84 +
 2 files changed, 124 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/btusb.txt

diff --git a/Documentation/devicetree/bindings/net/btusb.txt 
b/Documentation/devicetree/bindings/net/btusb.txt
new file mode 100644
index 000..2c0355c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -0,0 +1,40 @@
+Generic Bluetooth controller over USB (btusb driver)
+---
+
+Required properties:
+
+  - compatible : should comply with the format "usbVID,PID" specified in
+Documentation/devicetree/bindings/usb/usb-device.txt
+At the time of writing, the only OF supported devices
+(more may be added later) are:
+
+ "usb1286,204e" (Marvell 8997)
+
+Optional properties:
+
+  - interrupt-parent: phandle of the parent interrupt controller
+  - interrupt-names: (see below)
+  - interrupts : The interrupt specified by the name "wakeup" is the interrupt
+that shall be used for out-of-band wake-on-bt. Driver will
+request this interrupt for wakeup. During system suspend, the
+irq will be enabled so that the bluetooth chip can wakeup host
+platform out of band. During system resume, the irq will be
+disabled to make sure unnecessary interrupt is not received.
+
+Example:
+
+Following example uses irq pin number 3 of gpio0 for out of band wake-on-bt:
+
+&usb_host1_ehci {
+status = "okay";
+#address-cells = <1>;
+#size-cells = <0>;
+
+mvl_bt1: bt@1 {
+   compatible = "usb1286,204e";
+   reg = <1>;
+   interrupt-parent = <&gpio0>;
+   interrupt-name = "wakeup";
+   interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
+};
+};
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index ce22cef..beca4e9 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include 
@@ -369,6 +371,7 @@ static const struct usb_device_id blacklist_table[] = {
 #define BTUSB_BOOTING  9
 #define BTUSB_RESET_RESUME 10
 #define BTUSB_DIAG_RUNNING 11
+#define BTUSB_OOB_WAKE_DISABLED12
 
 struct btusb_data {
struct hci_dev   *hdev;
@@ -416,6 +419,8 @@ struct btusb_data {
int (*recv_bulk)(struct btusb_data *data, void *buffer, int count);
 
int (*setup_on_usb)(struct hci_dev *hdev);
+
+   int oob_wake_irq;   /* irq for out-of-band wake-on-bt */
 };
 
 static inline void btusb_free_frags(struct btusb_data *data)
@@ -2728,6 +2733,65 @@ static int btusb_bcm_set_diag(struct hci_dev *hdev, bool 
enable)
 }
 #endif
 
+#ifdef CONFIG_PM
+static irqreturn_t btusb_oob_wake_handler(int irq, void *priv)
+{
+   struct btusb_data *data = priv;
+
+   /* Disable only if not already disabled (keep it balanced) */
+   if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
+   disable_irq_nosync(irq);
+   disable_irq_wake(irq);
+   }
+   pm_wakeup_event(&data->udev->dev, 0);
+   return IRQ_HANDLED;
+}
+
+static const struct of_device_id btusb_match_table[] = {
+   { .compatible = "usb1286,204e" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, btusb_match_table);
+
+/* Use an oob wakeup pin? */
+static int btusb_config_oob_wake(struct hci_dev *hdev)
+{
+   struct btusb_data *data = hci_get_drvdata(hdev);
+   struct device *dev = &data->udev->dev;
+   int irq, ret;
+
+   if (!of_match_device(btusb_match_table, dev))
+   return 0;
+
+   /* Move on if no IRQ specified */
+   irq = of_irq_get_byname(dev->of_node, "wakeup");
+   if (irq <= 0) {
+   bt_dev_dbg(hdev, "%s: no OOB Wakeup IRQ in DT", __func__);
+   return 0;
+   }
+
+   set_bit

  1   2   3   4   5   6   7   >