date:20160728

Re: [PATCH v3 0/6] power: add power sequence library

2016-07-28 Thread Peter Chen

On Thu, Jul 28, 2016 at 08:56:40AM -0700, Joshua Clayton wrote:
> Hi, Peter
> 
> On 07/20/2016 02:40 AM, Peter Chen wrote:
> > Hi all,
> >
> > This is a follow-up for my last power sequence framework patch set [1].
> > According to Rob Herring and Ulf Hansson's comments[2], I use a generic
> > power sequence library for parsing the power sequence elements on DT,
> > and implement generic power sequence on library. The host driver
> > can allocate power sequence instance, and calls pwrseq APIs accordingly.
> >
> > In future, if there are special power sequence requirements, the special
> > power sequence library can be created.
> >
> > This patch set is tested on i.mx6 sabresx evk using a dts change, I use
> > two hot-plug devices to simulate this use case, the related binding
> > change is updated at patch [1/6], The udoo board changes were tested
> > using my last power sequence patch set.[3]
> >
> > Except for hard-wired MMC and USB devices, I find the USB ULPI PHY also
> > need to power on itself before it can be found by ULPI bus.
> >
> > [1] http://www.spinics.net/lists/linux-usb/msg142755.html
> > [2] http://www.spinics.net/lists/linux-usb/msg143106.html
> > [3] http://www.spinics.net/lists/linux-usb/msg142815.html
> >
> > Changes for v3:
> > - Delete "power-sequence" property at binding-doc, and change related code
> >   at both library and user code.
> > - Change binding-doc example node name with Rob's comments
> > - of_get_named_gpio_flags only gets the gpio, but without setting gpio 
> > flags,
> >   add additional code request gpio with proper gpio flags
> > - Add Philipp Zabel's Ack and MAINTAINER's entry
> >
> > Changes for v2:
> > - Delete "pwrseq" prefix and clock-names for properties at dt binding
> > - Should use structure not but its pointer for kzalloc
> > - Since chipidea core has no of_node, let core's of_node equals glue
> >   layer's at core's probe
> >
> > Peter Chen (6):
> >   binding-doc: power: pwrseq-generic: add binding doc for generic power
> > sequence library
> >   power: add power sequence library
> >   binding-doc: usb: usb-device: add optional properties for power
> > sequencediff --git a/drivers/usb/core/Makefile 
> > b/drivers/usb/core/Makefile
> > index 9780877..da36b78 100644
> > --- a/drivers/usb/core/Makefile
> > +++ b/drivers/usb/core/Makefile
> > @@ -5,8 +5,9 @@
> >  usbcore-y := usb.o hub.o hcd.o urb.o message.o driver.o
> >  usbcore-y += config.o file.o buffer.o sysfs.o endpoint.o
> >  usbcore-y += devio.o notify.o generic.o quirks.o devices.o
> > -usbcore-y += port.o of.o
> > +usbcore-y += port.o
> >  
> > +usbcore-$(CONFIG_OF)   += of.o
> >  usbcore-$(CONFIG_PCI)  += hcd-pci.o
> >  usbcore-$(CONFIG_ACPI) += usb-acpi.o
> >  
> > diff --git a/drivers/usb/core/of.c b/drivers/usb/core/of.c
> > index 2289700..3de4f88 100644
> > --- a/drivers/usb/core/of.c
> > +++ b/drivers/usb/core/of.c
> > @@ -18,6 +18,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >   usb: core: add power sequence handling for USB devices
> >   usb: chipidea: let chipidea core device of_node equal's glue layer
> > device of_node
> >   ARM: dts: imx6qdl-udoo.dtsi: fix onboard USB HUB property
> >
> >  .../bindings/power/pwrseq/pwrseq-generic.txt   |  48 +++
> >  .../devicetree/bindings/usb/usb-device.txt |  10 +-
> >  MAINTAINERS|   9 ++
> >  arch/arm/boot/dts/imx6qdl-udoo.dtsi|  26 ++--
> >  drivers/power/Kconfig  |   1 +
> >  drivers/power/Makefile |   1 +
> >  drivers/power/pwrseq/Kconfig   |  20 +++
> >  drivers/power/pwrseq/Makefile  |   2 +
> >  drivers/power/pwrseq/core.c|  71 ++
> >  drivers/power/pwrseq/pwrseq_generic.c  | 151 
> > +
> >  drivers/usb/chipidea/core.c|  10 ++
> >  drivers/usb/core/Makefile  |   1 +
> >  drivers/usb/core/hub.c |  12 +-
> >  drivers/usb/core/hub.h |  12 ++
> >  drivers/usb/core/pwrseq.c  | 100 ++
> >  include/linux/power/pwrseq.h   |  50 +++
> >  16 files changed, 507 insertions(+), 17 deletions(-)
> >  create mode 100644 
> > Documentation/devicetree/bindings/power/pwrseq/pwrseq-generic.txt
> >  create mode 100644 drivers/power/pwrseq/Kconfig
> >  create mode 100644 drivers/power/pwrseq/Makefile
> >  create mode 100644 drivers/power/pwrseq/core.c
> >  create mode 100644 drivers/power/pwrseq/pwrseq_generic.c
> >  create mode 100644 drivers/usb/core/pwrseq.c
> >  create mode 100644 include/linux/power/pwrseq.h
> >
> 
> With these patches our on-board USB hub,
> a Microchip USB2513BI-AEZG,  works exactly as it does
> with the fake regulator kludge, so...
> Tested-by Joshua Clayton 
> 

Thanks.

[PATCH v4 1/2] tpm: devicetree: document properties for cr50

2016-07-28 Thread Andrey Pronin

Add TPM2.0 PTP FIFO compatible SPI interface for chips with Cr50
firmware.

Signed-off-by: Andrey Pronin 
---
 .../devicetree/bindings/security/tpm/cr50_spi.txt   | 21 +
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/security/tpm/cr50_spi.txt

diff --git a/Documentation/devicetree/bindings/security/tpm/cr50_spi.txt 
b/Documentation/devicetree/bindings/security/tpm/cr50_spi.txt
new file mode 100644
index 000..2fbebd3
--- /dev/null
+++ b/Documentation/devicetree/bindings/security/tpm/cr50_spi.txt
@@ -0,0 +1,21 @@
+* H1 Secure Microcontroller with Cr50 Firmware on SPI Bus.
+
+H1 Secure Microcontroller running Cr50 firmware provides several
+functions, including TPM-like functionality. It communicates over
+SPI using the FIFO protocol described in the PTP Spec, section 6.
+
+Required properties:
+- compatible: Should be "google,cr50".
+- spi-max-frequency: Maximum SPI frequency.
+
+Example:
+
+ {
+status = "okay";
+
+cr50@0 {
+compatible = "google,cr50";
+reg = <0>;
+spi-max-frequency = <80>;
+};
+};
-- 
2.6.6

Re: [PATCH v3 0/6] power: add power sequence library

2016-07-28 Thread Peter Chen

On Thu, Jul 28, 2016 at 08:56:40AM -0700, Joshua Clayton wrote:
> Hi, Peter
> 
> On 07/20/2016 02:40 AM, Peter Chen wrote:
> > Hi all,
> >
> > This is a follow-up for my last power sequence framework patch set [1].
> > According to Rob Herring and Ulf Hansson's comments[2], I use a generic
> > power sequence library for parsing the power sequence elements on DT,
> > and implement generic power sequence on library. The host driver
> > can allocate power sequence instance, and calls pwrseq APIs accordingly.
> >
> > In future, if there are special power sequence requirements, the special
> > power sequence library can be created.
> >
> > This patch set is tested on i.mx6 sabresx evk using a dts change, I use
> > two hot-plug devices to simulate this use case, the related binding
> > change is updated at patch [1/6], The udoo board changes were tested
> > using my last power sequence patch set.[3]
> >
> > Except for hard-wired MMC and USB devices, I find the USB ULPI PHY also
> > need to power on itself before it can be found by ULPI bus.
> >
> > [1] http://www.spinics.net/lists/linux-usb/msg142755.html
> > [2] http://www.spinics.net/lists/linux-usb/msg143106.html
> > [3] http://www.spinics.net/lists/linux-usb/msg142815.html
> >
> > Changes for v3:
> > - Delete "power-sequence" property at binding-doc, and change related code
> >   at both library and user code.
> > - Change binding-doc example node name with Rob's comments
> > - of_get_named_gpio_flags only gets the gpio, but without setting gpio 
> > flags,
> >   add additional code request gpio with proper gpio flags
> > - Add Philipp Zabel's Ack and MAINTAINER's entry
> >
> > Changes for v2:
> > - Delete "pwrseq" prefix and clock-names for properties at dt binding
> > - Should use structure not but its pointer for kzalloc
> > - Since chipidea core has no of_node, let core's of_node equals glue
> >   layer's at core's probe
> >
> > Peter Chen (6):
> >   binding-doc: power: pwrseq-generic: add binding doc for generic power
> > sequence library
> >   power: add power sequence library
> >   binding-doc: usb: usb-device: add optional properties for power
> > sequencediff --git a/drivers/usb/core/Makefile 
> > b/drivers/usb/core/Makefile
> > index 9780877..da36b78 100644
> > --- a/drivers/usb/core/Makefile
> > +++ b/drivers/usb/core/Makefile
> > @@ -5,8 +5,9 @@
> >  usbcore-y := usb.o hub.o hcd.o urb.o message.o driver.o
> >  usbcore-y += config.o file.o buffer.o sysfs.o endpoint.o
> >  usbcore-y += devio.o notify.o generic.o quirks.o devices.o
> > -usbcore-y += port.o of.o
> > +usbcore-y += port.o
> >  
> > +usbcore-$(CONFIG_OF)   += of.o
> >  usbcore-$(CONFIG_PCI)  += hcd-pci.o
> >  usbcore-$(CONFIG_ACPI) += usb-acpi.o
> >  
> > diff --git a/drivers/usb/core/of.c b/drivers/usb/core/of.c
> > index 2289700..3de4f88 100644
> > --- a/drivers/usb/core/of.c
> > +++ b/drivers/usb/core/of.c
> > @@ -18,6 +18,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >   usb: core: add power sequence handling for USB devices
> >   usb: chipidea: let chipidea core device of_node equal's glue layer
> > device of_node
> >   ARM: dts: imx6qdl-udoo.dtsi: fix onboard USB HUB property
> >
> >  .../bindings/power/pwrseq/pwrseq-generic.txt   |  48 +++
> >  .../devicetree/bindings/usb/usb-device.txt |  10 +-
> >  MAINTAINERS|   9 ++
> >  arch/arm/boot/dts/imx6qdl-udoo.dtsi|  26 ++--
> >  drivers/power/Kconfig  |   1 +
> >  drivers/power/Makefile |   1 +
> >  drivers/power/pwrseq/Kconfig   |  20 +++
> >  drivers/power/pwrseq/Makefile  |   2 +
> >  drivers/power/pwrseq/core.c|  71 ++
> >  drivers/power/pwrseq/pwrseq_generic.c  | 151 
> > +
> >  drivers/usb/chipidea/core.c|  10 ++
> >  drivers/usb/core/Makefile  |   1 +
> >  drivers/usb/core/hub.c |  12 +-
> >  drivers/usb/core/hub.h |  12 ++
> >  drivers/usb/core/pwrseq.c  | 100 ++
> >  include/linux/power/pwrseq.h   |  50 +++
> >  16 files changed, 507 insertions(+), 17 deletions(-)
> >  create mode 100644 
> > Documentation/devicetree/bindings/power/pwrseq/pwrseq-generic.txt
> >  create mode 100644 drivers/power/pwrseq/Kconfig
> >  create mode 100644 drivers/power/pwrseq/Makefile
> >  create mode 100644 drivers/power/pwrseq/core.c
> >  create mode 100644 drivers/power/pwrseq/pwrseq_generic.c
> >  create mode 100644 drivers/usb/core/pwrseq.c
> >  create mode 100644 include/linux/power/pwrseq.h
> >
> 
> With these patches our on-board USB hub,
> a Microchip USB2513BI-AEZG,  works exactly as it does
> with the fake regulator kludge, so...
> Tested-by Joshua Clayton 
> 

Thanks.

> I assume there is a v4

[PATCH v4 1/2] tpm: devicetree: document properties for cr50

2016-07-28 Thread Andrey Pronin

Add TPM2.0 PTP FIFO compatible SPI interface for chips with Cr50
firmware.

Signed-off-by: Andrey Pronin 
---
 .../devicetree/bindings/security/tpm/cr50_spi.txt   | 21 +
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/security/tpm/cr50_spi.txt

diff --git a/Documentation/devicetree/bindings/security/tpm/cr50_spi.txt 
b/Documentation/devicetree/bindings/security/tpm/cr50_spi.txt
new file mode 100644
index 000..2fbebd3
--- /dev/null
+++ b/Documentation/devicetree/bindings/security/tpm/cr50_spi.txt
@@ -0,0 +1,21 @@
+* H1 Secure Microcontroller with Cr50 Firmware on SPI Bus.
+
+H1 Secure Microcontroller running Cr50 firmware provides several
+functions, including TPM-like functionality. It communicates over
+SPI using the FIFO protocol described in the PTP Spec, section 6.
+
+Required properties:
+- compatible: Should be "google,cr50".
+- spi-max-frequency: Maximum SPI frequency.
+
+Example:
+
+ {
+status = "okay";
+
+cr50@0 {
+compatible = "google,cr50";
+reg = <0>;
+spi-max-frequency = <80>;
+};
+};
-- 
2.6.6

linux-next: manual merge of the drm tree with Linus' tree

2016-07-28 Thread Stephen Rothwell

Hi Dave,

Today's linux-next merge of the drm tree got a conflict in:

  drivers/gpu/drm/i915/i915_debugfs.c

between commit:

  194dc870a589 ("Add braces to avoid "ambiguous ‘else’" compiler warnings")

from Linus' tree and commit:

  24f1d3cc0997 ("drm/i915: Refactor execlists default context pinning")

from the drm tree.

I fixed it up (I just used the version from the drm tree) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

linux-next: manual merge of the drm tree with Linus' tree

2016-07-28 Thread Stephen Rothwell

Hi Dave,

Today's linux-next merge of the drm tree got a conflict in:

  drivers/gpu/drm/i915/i915_debugfs.c

between commit:

  194dc870a589 ("Add braces to avoid "ambiguous ‘else’" compiler warnings")

from Linus' tree and commit:

  24f1d3cc0997 ("drm/i915: Refactor execlists default context pinning")

from the drm tree.

I fixed it up (I just used the version from the drm tree) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

[PATCH v3 7/7] doc: bindings: act8945a-charger: Update properties

2016-07-28 Thread Wenyou Yang

Due the driver improvements, update the properties,
 - Remove "active-semi,check-battery-temperature" property.
 - Add the properties, "active-semi,irq_gpio"
   and "active-semi,lbo-gpios".

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 Documentation/devicetree/bindings/power/act8945a-charger.txt | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/power/act8945a-charger.txt 
b/Documentation/devicetree/bindings/power/act8945a-charger.txt
index bea254c..d7cf05b 100644
--- a/Documentation/devicetree/bindings/power/act8945a-charger.txt
+++ b/Documentation/devicetree/bindings/power/act8945a-charger.txt
@@ -4,10 +4,12 @@ Required properties:
  - compatible: "active-semi,act8945a", please refer to ../mfd/act8945a.txt.
  - active-semi,chglev-gpios: charge current level phandle with args
as described in ../gpio/gpio.txt.
+ - active-semi,irq_gpios: specify the charger interrupt input phanle
+   with args as as described in ../gpio/gpio.txt.
+ - active-semi,lbo-gpios: specify the low battery voltage detect phandle
+   with args as as described in ../gpio/gpio.txt.
 
 Optional properties:
- - active-semi,check-battery-temperature: boolean to check the battery
-   temperature or not.
  - active-semi,input-voltage-threshold-microvolt: unit: mV;
Specifies the charger's input over-voltage threshold value;
The value can be: 6600, 7000, 7500, 8000; default: 6600
-- 
2.7.4

[PATCH v3 7/7] doc: bindings: act8945a-charger: Update properties

2016-07-28 Thread Wenyou Yang

Due the driver improvements, update the properties,
 - Remove "active-semi,check-battery-temperature" property.
 - Add the properties, "active-semi,irq_gpio"
   and "active-semi,lbo-gpios".

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 Documentation/devicetree/bindings/power/act8945a-charger.txt | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/power/act8945a-charger.txt 
b/Documentation/devicetree/bindings/power/act8945a-charger.txt
index bea254c..d7cf05b 100644
--- a/Documentation/devicetree/bindings/power/act8945a-charger.txt
+++ b/Documentation/devicetree/bindings/power/act8945a-charger.txt
@@ -4,10 +4,12 @@ Required properties:
  - compatible: "active-semi,act8945a", please refer to ../mfd/act8945a.txt.
  - active-semi,chglev-gpios: charge current level phandle with args
as described in ../gpio/gpio.txt.
+ - active-semi,irq_gpios: specify the charger interrupt input phanle
+   with args as as described in ../gpio/gpio.txt.
+ - active-semi,lbo-gpios: specify the low battery voltage detect phandle
+   with args as as described in ../gpio/gpio.txt.
 
 Optional properties:
- - active-semi,check-battery-temperature: boolean to check the battery
-   temperature or not.
  - active-semi,input-voltage-threshold-microvolt: unit: mV;
Specifies the charger's input over-voltage threshold value;
The value can be: 6600, 7000, 7500, 8000; default: 6600
-- 
2.7.4

[PATCH v3 6/7] power: act8945a_charger: Add max current property

2016-07-28 Thread Wenyou Yang

Add the power supply's current max property,
POWER_SUPPLY_PROP_CURRENT_MAX.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 79 +++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 74fd547..2294e9d 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -84,6 +84,7 @@ struct act8945a_charger {
bool init_done;
int irq_pin;
int lbo_pin;
+   int chgin_level;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -267,12 +268,79 @@ static int act8945a_get_capacity_level(struct 
act8945a_charger *charger,
return 0;
 }
 
+#define MAX_CURRENT_USB_HIGH   45
+#define MAX_CURRENT_USB_LOW9
+#define MAX_CURRENT_USB_PRE45000
+/*
+ * Riset(K) = 2336 * (1V/Ichg(mA)) - 0.205
+ * Riset = 2.43K
+ */
+#define MAX_CURRENT_AC_HIGH886527
+#define MAX_CURRENT_AC_LOW 117305
+#define MAX_CURRENT_AC_HIGH_PRE88653
+#define MAX_CURRENT_AC_LOW_PRE 11731
+
+static int act8945a_get_current_max(struct act8945a_charger *charger,
+   struct regmap *regmap, int *val)
+{
+   int ret;
+   unsigned int status, state;
+   unsigned int acin_state;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   acin_state = (state & APCH_STATE_ACINSTAT) >> 1;
+
+   state &= APCH_STATE_CSTATE;
+   state >>= APCH_STATE_CSTATE_SHIFT;
+
+   switch (state) {
+   case APCH_STATE_CSTATE_PRE:
+   if (acin_state) {
+   if (charger->chgin_level)
+   *val = MAX_CURRENT_AC_HIGH_PRE;
+   else
+   *val = MAX_CURRENT_AC_LOW_PRE;
+   } else {
+   *val = MAX_CURRENT_USB_PRE;
+   }
+   break;
+   case APCH_STATE_CSTATE_FAST:
+   if (acin_state) {
+   if (charger->chgin_level)
+   *val = MAX_CURRENT_AC_HIGH;
+   else
+   *val = MAX_CURRENT_AC_LOW;
+   } else {
+   if (charger->chgin_level)
+   *val = MAX_CURRENT_USB_HIGH;
+   else
+   *val = MAX_CURRENT_USB_LOW;
+   }
+   break;
+   case APCH_STATE_CSTATE_EOC:
+   case APCH_STATE_CSTATE_DISABLED:
+   default:
+   *val = 0;
+   break;
+   }
+
+   return 0;
+}
+
 static enum power_supply_property act8945a_charger_props[] = {
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_CHARGE_TYPE,
POWER_SUPPLY_PROP_TECHNOLOGY,
POWER_SUPPLY_PROP_HEALTH,
POWER_SUPPLY_PROP_CAPACITY_LEVEL,
+   POWER_SUPPLY_PROP_CURRENT_MAX,
POWER_SUPPLY_PROP_MODEL_NAME,
POWER_SUPPLY_PROP_MANUFACTURER
 };
@@ -302,6 +370,10 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
ret = act8945a_get_capacity_level(charger,
  regmap, >intval);
break;
+   case POWER_SUPPLY_PROP_CURRENT_MAX:
+   ret = act8945a_get_current_max(charger,
+  regmap, >intval);
+   break;
case POWER_SUPPLY_PROP_MODEL_NAME:
val->strval = act8945a_charger_model;
break;
@@ -446,8 +518,11 @@ static int act8945a_charger_config(struct device *dev,
"active-semi,chglev-gpios", 0, );
 
if (gpio_is_valid(chglev_pin)) {
-   gpio_set_value(chglev_pin,
-  ((flags == OF_GPIO_ACTIVE_LOW) ? 0 : 1));
+   if (!devm_gpio_request(dev, charger->irq_pin, "chglev-pin")) {
+   charger->chgin_level =
+   (flags == OF_GPIO_ACTIVE_LOW) ? 0 : 1;
+   gpio_set_value(chglev_pin, charger->chgin_level);
+   }
}
 
if (of_property_read_u32(np,
-- 
2.7.4

[PATCH v3 6/7] power: act8945a_charger: Add max current property

2016-07-28 Thread Wenyou Yang

Add the power supply's current max property,
POWER_SUPPLY_PROP_CURRENT_MAX.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 79 +++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 74fd547..2294e9d 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -84,6 +84,7 @@ struct act8945a_charger {
bool init_done;
int irq_pin;
int lbo_pin;
+   int chgin_level;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -267,12 +268,79 @@ static int act8945a_get_capacity_level(struct 
act8945a_charger *charger,
return 0;
 }
 
+#define MAX_CURRENT_USB_HIGH   45
+#define MAX_CURRENT_USB_LOW9
+#define MAX_CURRENT_USB_PRE45000
+/*
+ * Riset(K) = 2336 * (1V/Ichg(mA)) - 0.205
+ * Riset = 2.43K
+ */
+#define MAX_CURRENT_AC_HIGH886527
+#define MAX_CURRENT_AC_LOW 117305
+#define MAX_CURRENT_AC_HIGH_PRE88653
+#define MAX_CURRENT_AC_LOW_PRE 11731
+
+static int act8945a_get_current_max(struct act8945a_charger *charger,
+   struct regmap *regmap, int *val)
+{
+   int ret;
+   unsigned int status, state;
+   unsigned int acin_state;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   acin_state = (state & APCH_STATE_ACINSTAT) >> 1;
+
+   state &= APCH_STATE_CSTATE;
+   state >>= APCH_STATE_CSTATE_SHIFT;
+
+   switch (state) {
+   case APCH_STATE_CSTATE_PRE:
+   if (acin_state) {
+   if (charger->chgin_level)
+   *val = MAX_CURRENT_AC_HIGH_PRE;
+   else
+   *val = MAX_CURRENT_AC_LOW_PRE;
+   } else {
+   *val = MAX_CURRENT_USB_PRE;
+   }
+   break;
+   case APCH_STATE_CSTATE_FAST:
+   if (acin_state) {
+   if (charger->chgin_level)
+   *val = MAX_CURRENT_AC_HIGH;
+   else
+   *val = MAX_CURRENT_AC_LOW;
+   } else {
+   if (charger->chgin_level)
+   *val = MAX_CURRENT_USB_HIGH;
+   else
+   *val = MAX_CURRENT_USB_LOW;
+   }
+   break;
+   case APCH_STATE_CSTATE_EOC:
+   case APCH_STATE_CSTATE_DISABLED:
+   default:
+   *val = 0;
+   break;
+   }
+
+   return 0;
+}
+
 static enum power_supply_property act8945a_charger_props[] = {
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_CHARGE_TYPE,
POWER_SUPPLY_PROP_TECHNOLOGY,
POWER_SUPPLY_PROP_HEALTH,
POWER_SUPPLY_PROP_CAPACITY_LEVEL,
+   POWER_SUPPLY_PROP_CURRENT_MAX,
POWER_SUPPLY_PROP_MODEL_NAME,
POWER_SUPPLY_PROP_MANUFACTURER
 };
@@ -302,6 +370,10 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
ret = act8945a_get_capacity_level(charger,
  regmap, >intval);
break;
+   case POWER_SUPPLY_PROP_CURRENT_MAX:
+   ret = act8945a_get_current_max(charger,
+  regmap, >intval);
+   break;
case POWER_SUPPLY_PROP_MODEL_NAME:
val->strval = act8945a_charger_model;
break;
@@ -446,8 +518,11 @@ static int act8945a_charger_config(struct device *dev,
"active-semi,chglev-gpios", 0, );
 
if (gpio_is_valid(chglev_pin)) {
-   gpio_set_value(chglev_pin,
-  ((flags == OF_GPIO_ACTIVE_LOW) ? 0 : 1));
+   if (!devm_gpio_request(dev, charger->irq_pin, "chglev-pin")) {
+   charger->chgin_level =
+   (flags == OF_GPIO_ACTIVE_LOW) ? 0 : 1;
+   gpio_set_value(chglev_pin, charger->chgin_level);
+   }
}
 
if (of_property_read_u32(np,
-- 
2.7.4

[PATCH v3 5/7] power: act8945a_charger: Add capacity level property

2016-07-28 Thread Wenyou Yang

Add the power supply capacity level property, it corresponds to
POWER_SUPPLY_CAPACITY_LEVEL_*.

It also utilizes the precision voltage detector function module
to catch the low battery voltage.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 78 
 1 file changed, 78 insertions(+)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 465e346..74fd547 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -83,6 +83,7 @@ struct act8945a_charger {
 
bool init_done;
int irq_pin;
+   int lbo_pin;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -208,11 +209,70 @@ static int act8945a_get_battery_health(struct regmap 
*regmap, int *val)
return 0;
 }
 
+static int act8945a_get_capacity_level(struct act8945a_charger *charger,
+  struct regmap *regmap, int *val)
+{
+   int ret;
+   unsigned int status, state, config;
+   int lbo_level = 1;
+
+   if (gpio_is_valid(charger->lbo_pin))
+   lbo_level = gpio_get_value(charger->lbo_pin);
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_CFG, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   state &= APCH_STATE_CSTATE;
+   state >>= APCH_STATE_CSTATE_SHIFT;
+
+   switch (state) {
+   case APCH_STATE_CSTATE_PRE:
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_LOW;
+   break;
+   case APCH_STATE_CSTATE_FAST:
+   if (lbo_level)
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_HIGH;
+   else
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_LOW;
+   break;
+   case APCH_STATE_CSTATE_EOC:
+   if (status & APCH_STATUS_CHGDAT)
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_FULL;
+   else
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_NORMAL;
+   break;
+   case APCH_STATE_CSTATE_DISABLED:
+   default:
+   if (config & APCH_CFG_SUSCHG) {
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN;
+   } else {
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_NORMAL;
+   if (!(status & APCH_STATUS_INDAT)) {
+   if (!lbo_level)
+   *val = 
POWER_SUPPLY_CAPACITY_LEVEL_CRITICAL;
+   }
+   }
+   break;
+   }
+
+   return 0;
+}
+
 static enum power_supply_property act8945a_charger_props[] = {
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_CHARGE_TYPE,
POWER_SUPPLY_PROP_TECHNOLOGY,
POWER_SUPPLY_PROP_HEALTH,
+   POWER_SUPPLY_PROP_CAPACITY_LEVEL,
POWER_SUPPLY_PROP_MODEL_NAME,
POWER_SUPPLY_PROP_MANUFACTURER
 };
@@ -238,6 +298,10 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
case POWER_SUPPLY_PROP_HEALTH:
ret = act8945a_get_battery_health(regmap, >intval);
break;
+   case POWER_SUPPLY_PROP_CAPACITY_LEVEL:
+   ret = act8945a_get_capacity_level(charger,
+ regmap, >intval);
+   break;
case POWER_SUPPLY_PROP_MODEL_NAME:
val->strval = act8945a_charger_model;
break;
@@ -364,6 +428,20 @@ static int act8945a_charger_config(struct device *dev,
}
}
 
+   charger->lbo_pin = of_get_named_gpio(np, "active-semi,lbo-gpios", 0);
+   if (gpio_is_valid(charger->lbo_pin)) {
+   if (!devm_gpio_request(dev, charger->lbo_pin, "lbo-detect")) {
+   ret = devm_request_irq(dev,
+  gpio_to_irq(charger->lbo_pin),
+  act8945a_status_changed,
+  IRQF_TRIGGER_FALLING |
+  IRQF_TRIGGER_RISING,
+  "lbo-detect", charger);
+   if (ret)
+   dev_dbg(dev, "failed to request LBO pin IRQ\n");
+   }
+   }
+
chglev_pin = of_get_named_gpio_flags(np,
"active-semi,chglev-gpios", 0, );
 
-- 
2.7.4

[PATCH v3 5/7] power: act8945a_charger: Add capacity level property

2016-07-28 Thread Wenyou Yang

Add the power supply capacity level property, it corresponds to
POWER_SUPPLY_CAPACITY_LEVEL_*.

It also utilizes the precision voltage detector function module
to catch the low battery voltage.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 78 
 1 file changed, 78 insertions(+)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 465e346..74fd547 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -83,6 +83,7 @@ struct act8945a_charger {
 
bool init_done;
int irq_pin;
+   int lbo_pin;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -208,11 +209,70 @@ static int act8945a_get_battery_health(struct regmap 
*regmap, int *val)
return 0;
 }
 
+static int act8945a_get_capacity_level(struct act8945a_charger *charger,
+  struct regmap *regmap, int *val)
+{
+   int ret;
+   unsigned int status, state, config;
+   int lbo_level = 1;
+
+   if (gpio_is_valid(charger->lbo_pin))
+   lbo_level = gpio_get_value(charger->lbo_pin);
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_CFG, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   state &= APCH_STATE_CSTATE;
+   state >>= APCH_STATE_CSTATE_SHIFT;
+
+   switch (state) {
+   case APCH_STATE_CSTATE_PRE:
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_LOW;
+   break;
+   case APCH_STATE_CSTATE_FAST:
+   if (lbo_level)
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_HIGH;
+   else
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_LOW;
+   break;
+   case APCH_STATE_CSTATE_EOC:
+   if (status & APCH_STATUS_CHGDAT)
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_FULL;
+   else
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_NORMAL;
+   break;
+   case APCH_STATE_CSTATE_DISABLED:
+   default:
+   if (config & APCH_CFG_SUSCHG) {
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN;
+   } else {
+   *val = POWER_SUPPLY_CAPACITY_LEVEL_NORMAL;
+   if (!(status & APCH_STATUS_INDAT)) {
+   if (!lbo_level)
+   *val = 
POWER_SUPPLY_CAPACITY_LEVEL_CRITICAL;
+   }
+   }
+   break;
+   }
+
+   return 0;
+}
+
 static enum power_supply_property act8945a_charger_props[] = {
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_CHARGE_TYPE,
POWER_SUPPLY_PROP_TECHNOLOGY,
POWER_SUPPLY_PROP_HEALTH,
+   POWER_SUPPLY_PROP_CAPACITY_LEVEL,
POWER_SUPPLY_PROP_MODEL_NAME,
POWER_SUPPLY_PROP_MANUFACTURER
 };
@@ -238,6 +298,10 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
case POWER_SUPPLY_PROP_HEALTH:
ret = act8945a_get_battery_health(regmap, >intval);
break;
+   case POWER_SUPPLY_PROP_CAPACITY_LEVEL:
+   ret = act8945a_get_capacity_level(charger,
+ regmap, >intval);
+   break;
case POWER_SUPPLY_PROP_MODEL_NAME:
val->strval = act8945a_charger_model;
break;
@@ -364,6 +428,20 @@ static int act8945a_charger_config(struct device *dev,
}
}
 
+   charger->lbo_pin = of_get_named_gpio(np, "active-semi,lbo-gpios", 0);
+   if (gpio_is_valid(charger->lbo_pin)) {
+   if (!devm_gpio_request(dev, charger->lbo_pin, "lbo-detect")) {
+   ret = devm_request_irq(dev,
+  gpio_to_irq(charger->lbo_pin),
+  act8945a_status_changed,
+  IRQF_TRIGGER_FALLING |
+  IRQF_TRIGGER_RISING,
+  "lbo-detect", charger);
+   if (ret)
+   dev_dbg(dev, "failed to request LBO pin IRQ\n");
+   }
+   }
+
chglev_pin = of_get_named_gpio_flags(np,
"active-semi,chglev-gpios", 0, );
 
-- 
2.7.4

[PATCH v3 4/7] power: act8945a_charger: Fix the power supply type

2016-07-28 Thread Wenyou Yang

The power supply type property is varying as the external power
supply changes. It is not a constant.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 48 
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index bbeb13cb..465e346 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -77,6 +77,7 @@ static const char *act8945a_charger_manufacturer = 
"Active-semi";
 
 struct act8945a_charger {
struct power_supply *psy;
+   struct power_supply_desc desc;
struct regmap *regmap;
struct work_struct work;
 
@@ -250,14 +251,6 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
return ret;
 }
 
-static const struct power_supply_desc act8945a_charger_desc = {
-   .name   = "act8945a-charger",
-   .type   = POWER_SUPPLY_TYPE_BATTERY,
-   .get_property   = act8945a_charger_get_property,
-   .properties = act8945a_charger_props,
-   .num_properties = ARRAY_SIZE(act8945a_charger_props),
-};
-
 static int act8945a_enable_interrupt(struct act8945a_charger *charger)
 {
struct regmap *regmap = charger->regmap;
@@ -281,11 +274,39 @@ static int act8945a_enable_interrupt(struct 
act8945a_charger *charger)
return 0;
 }
 
+static unsigned int act8945a_set_supply_type(struct act8945a_charger *charger,
+unsigned int *type)
+{
+   unsigned int status, state;
+   int ret;
+
+   ret = regmap_read(charger->regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(charger->regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   if (status & APCH_STATUS_INDAT) {
+   if (state & APCH_STATE_ACINSTAT)
+   *type = POWER_SUPPLY_TYPE_MAINS;
+   else
+   *type = POWER_SUPPLY_TYPE_USB;
+   } else {
+   *type = POWER_SUPPLY_TYPE_BATTERY;
+   }
+
+   return 0;
+}
+
 static void act8945a_work(struct work_struct *work)
 {
struct act8945a_charger *charger =
container_of(work, struct act8945a_charger, work);
 
+   act8945a_set_supply_type(charger, >desc.type);
+
power_supply_changed(charger->psy);
 }
 
@@ -436,11 +457,20 @@ static int act8945a_charger_probe(struct platform_device 
*pdev)
if (ret)
return ret;
 
+   charger->desc.name = "act8945a-charger";
+   charger->desc.get_property = act8945a_charger_get_property;
+   charger->desc.properties = act8945a_charger_props;
+   charger->desc.num_properties = ARRAY_SIZE(act8945a_charger_props);
+
+   ret = act8945a_set_supply_type(charger, >desc.type);
+   if (ret)
+   return -EINVAL;
+
psy_cfg.of_node = pdev->dev.parent->of_node;
psy_cfg.drv_data = charger;
 
charger->psy = devm_power_supply_register(>dev,
- _charger_desc,
+ >desc,
  _cfg);
if (IS_ERR(charger->psy)) {
dev_err(>dev, "failed to register power supply\n");
-- 
2.7.4

[PATCH v3 4/7] power: act8945a_charger: Fix the power supply type

2016-07-28 Thread Wenyou Yang

The power supply type property is varying as the external power
supply changes. It is not a constant.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 48 
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index bbeb13cb..465e346 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -77,6 +77,7 @@ static const char *act8945a_charger_manufacturer = 
"Active-semi";
 
 struct act8945a_charger {
struct power_supply *psy;
+   struct power_supply_desc desc;
struct regmap *regmap;
struct work_struct work;
 
@@ -250,14 +251,6 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
return ret;
 }
 
-static const struct power_supply_desc act8945a_charger_desc = {
-   .name   = "act8945a-charger",
-   .type   = POWER_SUPPLY_TYPE_BATTERY,
-   .get_property   = act8945a_charger_get_property,
-   .properties = act8945a_charger_props,
-   .num_properties = ARRAY_SIZE(act8945a_charger_props),
-};
-
 static int act8945a_enable_interrupt(struct act8945a_charger *charger)
 {
struct regmap *regmap = charger->regmap;
@@ -281,11 +274,39 @@ static int act8945a_enable_interrupt(struct 
act8945a_charger *charger)
return 0;
 }
 
+static unsigned int act8945a_set_supply_type(struct act8945a_charger *charger,
+unsigned int *type)
+{
+   unsigned int status, state;
+   int ret;
+
+   ret = regmap_read(charger->regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(charger->regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   if (status & APCH_STATUS_INDAT) {
+   if (state & APCH_STATE_ACINSTAT)
+   *type = POWER_SUPPLY_TYPE_MAINS;
+   else
+   *type = POWER_SUPPLY_TYPE_USB;
+   } else {
+   *type = POWER_SUPPLY_TYPE_BATTERY;
+   }
+
+   return 0;
+}
+
 static void act8945a_work(struct work_struct *work)
 {
struct act8945a_charger *charger =
container_of(work, struct act8945a_charger, work);
 
+   act8945a_set_supply_type(charger, >desc.type);
+
power_supply_changed(charger->psy);
 }
 
@@ -436,11 +457,20 @@ static int act8945a_charger_probe(struct platform_device 
*pdev)
if (ret)
return ret;
 
+   charger->desc.name = "act8945a-charger";
+   charger->desc.get_property = act8945a_charger_get_property;
+   charger->desc.properties = act8945a_charger_props;
+   charger->desc.num_properties = ARRAY_SIZE(act8945a_charger_props);
+
+   ret = act8945a_set_supply_type(charger, >desc.type);
+   if (ret)
+   return -EINVAL;
+
psy_cfg.of_node = pdev->dev.parent->of_node;
psy_cfg.drv_data = charger;
 
charger->psy = devm_power_supply_register(>dev,
- _charger_desc,
+ >desc,
  _cfg);
if (IS_ERR(charger->psy)) {
dev_err(>dev, "failed to register power supply\n");
-- 
2.7.4

[PATCH v3 3/7] power: act8945a_charger: Add status change update support

2016-07-28 Thread Wenyou Yang

Add the charger status change interrupt support, it will report
the power supply changed event.

This interrupt is generated by one of the conditions as below:
 - the state machine jumps out of or into the EOC state
 - the CHGIN input voltage goes out of or into the valid range.
 - the battery temperature goes out of or into the valid range.
 - the PRECHARGE time-out occurs.
 - the total charge time-out occurs.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 78 
 1 file changed, 72 insertions(+), 6 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 6ddfc1d..bbeb13cb 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -10,6 +10,7 @@
  * published by the Free Software Foundation.
  *
  */
+#include 
 #include 
 #include 
 #include 
@@ -75,7 +76,12 @@ static const char *act8945a_charger_manufacturer = 
"Active-semi";
 #define APCH_STATE_CSTATE_PRE  0x03
 
 struct act8945a_charger {
+   struct power_supply *psy;
struct regmap *regmap;
+   struct work_struct work;
+
+   bool init_done;
+   int irq_pin;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -252,6 +258,47 @@ static const struct power_supply_desc 
act8945a_charger_desc = {
.num_properties = ARRAY_SIZE(act8945a_charger_props),
 };
 
+static int act8945a_enable_interrupt(struct act8945a_charger *charger)
+{
+   struct regmap *regmap = charger->regmap;
+   unsigned char ctrl;
+   int ret;
+
+   ctrl = APCH_CTRL_CHGEOCOUT | APCH_CTRL_CHGEOCIN |
+  APCH_CTRL_INDIS | APCH_CTRL_INCON |
+  APCH_CTRL_TEMPOUT | APCH_CTRL_TEMPIN |
+  APCH_CTRL_TIMRPRE | APCH_CTRL_TIMRTOT;
+   ret = regmap_write(regmap, ACT8945A_APCH_CTRL, ctrl);
+   if (ret)
+   return ret;
+
+   ctrl = APCH_STATUS_CHGSTAT | APCH_STATUS_INSTAT |
+  APCH_STATUS_TEMPSTAT | APCH_STATUS_TIMRSTAT;
+   ret = regmap_write(regmap, ACT8945A_APCH_STATUS, ctrl);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
+static void act8945a_work(struct work_struct *work)
+{
+   struct act8945a_charger *charger =
+   container_of(work, struct act8945a_charger, work);
+
+   power_supply_changed(charger->psy);
+}
+
+static irqreturn_t act8945a_status_changed(int irq, void *dev_id)
+{
+   struct act8945a_charger *charger = dev_id;
+
+   if (charger->init_done)
+   schedule_work(>work);
+
+   return IRQ_HANDLED;
+}
+
 #define DEFAULT_TOTAL_TIME_OUT 3
 #define DEFAULT_PRE_TIME_OUT   40
 #define DEFAULT_INPUT_OVP_THRESHOLD6600
@@ -283,6 +330,18 @@ static int act8945a_charger_config(struct device *dev,
 
if (tmp & APCH_CFG_SUSCHG)
value |= APCH_CFG_SUSCHG;
+   charger->irq_pin = of_get_named_gpio(np, "active-semi,irq_gpios", 0);
+   if (gpio_is_valid(charger->irq_pin)) {
+   if (!devm_gpio_request(dev, charger->irq_pin, "irq-pin")) {
+   ret = devm_request_irq(dev,
+  gpio_to_irq(charger->irq_pin),
+  act8945a_status_changed,
+  IRQF_TRIGGER_FALLING,
+  "irq-pin", charger);
+   if (ret)
+   dev_dbg(dev, "failed to request nIRQ pin 
IRQ\n");
+   }
+   }
 
chglev_pin = of_get_named_gpio_flags(np,
"active-semi,chglev-gpios", 0, );
@@ -360,7 +419,6 @@ static int act8945a_charger_config(struct device *dev,
 static int act8945a_charger_probe(struct platform_device *pdev)
 {
struct act8945a_charger *charger;
-   struct power_supply *psy;
struct power_supply_config psy_cfg = {};
int ret;
 
@@ -381,14 +439,22 @@ static int act8945a_charger_probe(struct platform_device 
*pdev)
psy_cfg.of_node = pdev->dev.parent->of_node;
psy_cfg.drv_data = charger;
 
-   psy = devm_power_supply_register(>dev,
-_charger_desc,
-_cfg);
-   if (IS_ERR(psy)) {
+   charger->psy = devm_power_supply_register(>dev,
+ _charger_desc,
+ _cfg);
+   if (IS_ERR(charger->psy)) {
dev_err(>dev, "failed to register power supply\n");
-   return PTR_ERR(psy);
+   return PTR_ERR(charger->psy);
}
 
+   INIT_WORK(>work, act8945a_work);
+
+   ret = act8945a_enable_interrupt(charger);
+   if (ret)
+   return -EIO;
+
+   charger->init_done = true;
+
return 0;
 }

[PATCH v3 3/7] power: act8945a_charger: Add status change update support

2016-07-28 Thread Wenyou Yang

Add the charger status change interrupt support, it will report
the power supply changed event.

This interrupt is generated by one of the conditions as below:
 - the state machine jumps out of or into the EOC state
 - the CHGIN input voltage goes out of or into the valid range.
 - the battery temperature goes out of or into the valid range.
 - the PRECHARGE time-out occurs.
 - the total charge time-out occurs.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 78 
 1 file changed, 72 insertions(+), 6 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 6ddfc1d..bbeb13cb 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -10,6 +10,7 @@
  * published by the Free Software Foundation.
  *
  */
+#include 
 #include 
 #include 
 #include 
@@ -75,7 +76,12 @@ static const char *act8945a_charger_manufacturer = 
"Active-semi";
 #define APCH_STATE_CSTATE_PRE  0x03
 
 struct act8945a_charger {
+   struct power_supply *psy;
struct regmap *regmap;
+   struct work_struct work;
+
+   bool init_done;
+   int irq_pin;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -252,6 +258,47 @@ static const struct power_supply_desc 
act8945a_charger_desc = {
.num_properties = ARRAY_SIZE(act8945a_charger_props),
 };
 
+static int act8945a_enable_interrupt(struct act8945a_charger *charger)
+{
+   struct regmap *regmap = charger->regmap;
+   unsigned char ctrl;
+   int ret;
+
+   ctrl = APCH_CTRL_CHGEOCOUT | APCH_CTRL_CHGEOCIN |
+  APCH_CTRL_INDIS | APCH_CTRL_INCON |
+  APCH_CTRL_TEMPOUT | APCH_CTRL_TEMPIN |
+  APCH_CTRL_TIMRPRE | APCH_CTRL_TIMRTOT;
+   ret = regmap_write(regmap, ACT8945A_APCH_CTRL, ctrl);
+   if (ret)
+   return ret;
+
+   ctrl = APCH_STATUS_CHGSTAT | APCH_STATUS_INSTAT |
+  APCH_STATUS_TEMPSTAT | APCH_STATUS_TIMRSTAT;
+   ret = regmap_write(regmap, ACT8945A_APCH_STATUS, ctrl);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
+static void act8945a_work(struct work_struct *work)
+{
+   struct act8945a_charger *charger =
+   container_of(work, struct act8945a_charger, work);
+
+   power_supply_changed(charger->psy);
+}
+
+static irqreturn_t act8945a_status_changed(int irq, void *dev_id)
+{
+   struct act8945a_charger *charger = dev_id;
+
+   if (charger->init_done)
+   schedule_work(>work);
+
+   return IRQ_HANDLED;
+}
+
 #define DEFAULT_TOTAL_TIME_OUT 3
 #define DEFAULT_PRE_TIME_OUT   40
 #define DEFAULT_INPUT_OVP_THRESHOLD6600
@@ -283,6 +330,18 @@ static int act8945a_charger_config(struct device *dev,
 
if (tmp & APCH_CFG_SUSCHG)
value |= APCH_CFG_SUSCHG;
+   charger->irq_pin = of_get_named_gpio(np, "active-semi,irq_gpios", 0);
+   if (gpio_is_valid(charger->irq_pin)) {
+   if (!devm_gpio_request(dev, charger->irq_pin, "irq-pin")) {
+   ret = devm_request_irq(dev,
+  gpio_to_irq(charger->irq_pin),
+  act8945a_status_changed,
+  IRQF_TRIGGER_FALLING,
+  "irq-pin", charger);
+   if (ret)
+   dev_dbg(dev, "failed to request nIRQ pin 
IRQ\n");
+   }
+   }
 
chglev_pin = of_get_named_gpio_flags(np,
"active-semi,chglev-gpios", 0, );
@@ -360,7 +419,6 @@ static int act8945a_charger_config(struct device *dev,
 static int act8945a_charger_probe(struct platform_device *pdev)
 {
struct act8945a_charger *charger;
-   struct power_supply *psy;
struct power_supply_config psy_cfg = {};
int ret;
 
@@ -381,14 +439,22 @@ static int act8945a_charger_probe(struct platform_device 
*pdev)
psy_cfg.of_node = pdev->dev.parent->of_node;
psy_cfg.drv_data = charger;
 
-   psy = devm_power_supply_register(>dev,
-_charger_desc,
-_cfg);
-   if (IS_ERR(psy)) {
+   charger->psy = devm_power_supply_register(>dev,
+ _charger_desc,
+ _cfg);
+   if (IS_ERR(charger->psy)) {
dev_err(>dev, "failed to register power supply\n");
-   return PTR_ERR(psy);
+   return PTR_ERR(charger->psy);
}
 
+   INIT_WORK(>work, act8945a_work);
+
+   ret = act8945a_enable_interrupt(charger);
+   if (ret)
+   return -EIO;
+
+   charger->init_done = true;
+
return 0;
 }
 
-- 
2.7.4

[PATCH v3 2/7] power: act8945a_charger: Improve

2016-07-28 Thread Wenyou Yang

When get the property, first check the charger state machine,
then check the status bit to decide what value is assigned to
the corresponding property.

Retain the SUSCHG bit of REG 0x71 when configure the timers to
avoid losting the charger suspending info after boot.

Signed-off-by: Wenyou Yang 
Signed-off-by: Fengguang Wu 
---

Changes in v3:
 - Remove unneeded semicolon to fix semicolon.cocci warning.

Changes in v2:
 - Add missing ret declaration.

 drivers/power/act8945a_charger.c | 84 
 1 file changed, 68 insertions(+), 16 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 72d39ba..6ddfc1d 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -94,16 +94,24 @@ static int act8945a_get_charger_state(struct regmap 
*regmap, int *val)
state &= APCH_STATE_CSTATE;
state >>= APCH_STATE_CSTATE_SHIFT;
 
-   if (state == APCH_STATE_CSTATE_EOC) {
+   switch (state) {
+   case APCH_STATE_CSTATE_PRE:
+   case APCH_STATE_CSTATE_FAST:
+   *val = POWER_SUPPLY_STATUS_CHARGING;
+   break;
+   case APCH_STATE_CSTATE_EOC:
if (status & APCH_STATUS_CHGDAT)
*val = POWER_SUPPLY_STATUS_FULL;
else
+   *val = POWER_SUPPLY_STATUS_CHARGING;
+   break;
+   case APCH_STATE_CSTATE_DISABLED:
+   default:
+   if (!(status & APCH_STATUS_INDAT))
+   *val = POWER_SUPPLY_STATUS_DISCHARGING;
+   else
*val = POWER_SUPPLY_STATUS_NOT_CHARGING;
-   } else if ((state == APCH_STATE_CSTATE_FAST) ||
-  (state == APCH_STATE_CSTATE_PRE)) {
-   *val = POWER_SUPPLY_STATUS_CHARGING;
-   } else {
-   *val = POWER_SUPPLY_STATUS_NOT_CHARGING;
+   break;
}
 
return 0;
@@ -112,7 +120,11 @@ static int act8945a_get_charger_state(struct regmap 
*regmap, int *val)
 static int act8945a_get_charge_type(struct regmap *regmap, int *val)
 {
int ret;
-   unsigned int state;
+   unsigned int status, state;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
 
ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
if (ret < 0)
@@ -129,9 +141,15 @@ static int act8945a_get_charge_type(struct regmap *regmap, 
int *val)
*val = POWER_SUPPLY_CHARGE_TYPE_FAST;
break;
case APCH_STATE_CSTATE_EOC:
+   *val = POWER_SUPPLY_CHARGE_TYPE_NONE;
+   break;
case APCH_STATE_CSTATE_DISABLED:
default:
-   *val = POWER_SUPPLY_CHARGE_TYPE_NONE;
+   if (!(status & APCH_STATUS_INDAT))
+   *val = POWER_SUPPLY_CHARGE_TYPE_NONE;
+   else
+   *val = POWER_SUPPLY_CHARGE_TYPE_UNKNOWN;
+   break;
}
 
return 0;
@@ -140,20 +158,45 @@ static int act8945a_get_charge_type(struct regmap 
*regmap, int *val)
 static int act8945a_get_battery_health(struct regmap *regmap, int *val)
 {
int ret;
-   unsigned int status;
+   unsigned int status, state, config;
 
ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
if (ret < 0)
return ret;
 
-   if (status & APCH_STATUS_TEMPDAT)
-   *val = POWER_SUPPLY_HEALTH_OVERHEAT;
-   else if (!(status & APCH_STATUS_INDAT))
-   *val = POWER_SUPPLY_HEALTH_OVERVOLTAGE;
-   else if (status & APCH_STATUS_TIMRDAT)
-   *val = POWER_SUPPLY_HEALTH_SAFETY_TIMER_EXPIRE;
-   else
+   ret = regmap_read(regmap, ACT8945A_APCH_CFG, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   state &= APCH_STATE_CSTATE;
+   state >>= APCH_STATE_CSTATE_SHIFT;
+
+   switch (state) {
+   case APCH_STATE_CSTATE_DISABLED:
+   if (config & APCH_CFG_SUSCHG) {
+   *val = POWER_SUPPLY_HEALTH_UNKNOWN;
+   } else if (status & APCH_STATUS_INDAT) {
+   if (!(status & APCH_STATUS_TEMPDAT))
+   *val = POWER_SUPPLY_HEALTH_OVERHEAT;
+   else if (status & APCH_STATUS_TIMRDAT)
+   *val = POWER_SUPPLY_HEALTH_SAFETY_TIMER_EXPIRE;
+   else
+   *val = POWER_SUPPLY_HEALTH_OVERVOLTAGE;
+   } else {
+   *val = POWER_SUPPLY_HEALTH_GOOD;
+   }
+   break;
+   case APCH_STATE_CSTATE_PRE:
+   case APCH_STATE_CSTATE_FAST:
+   case APCH_STATE_CSTATE_EOC:
+   default:
*val = POWER_SUPPLY_HEALTH_GOOD;
+

[PATCH v3 1/7] power: act8945a_charger: Remove "battery_temperature"

2016-07-28 Thread Wenyou Yang

Remove "battery_temperature" member, it is redundant, it is the
hardware's responsibility to handle TH pin properly.
It is unnecessary to use the dt property to check if there is
a battery temperature monitor or not.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index b5c00e4..72d39ba 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -76,7 +76,6 @@ static const char *act8945a_charger_manufacturer = 
"Active-semi";
 
 struct act8945a_charger {
struct regmap *regmap;
-   bool battery_temperature;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -138,8 +137,7 @@ static int act8945a_get_charge_type(struct regmap *regmap, 
int *val)
return 0;
 }
 
-static int act8945a_get_battery_health(struct act8945a_charger *charger,
-  struct regmap *regmap, int *val)
+static int act8945a_get_battery_health(struct regmap *regmap, int *val)
 {
int ret;
unsigned int status;
@@ -148,7 +146,7 @@ static int act8945a_get_battery_health(struct 
act8945a_charger *charger,
if (ret < 0)
return ret;
 
-   if (charger->battery_temperature && !(status & APCH_STATUS_TEMPDAT))
+   if (status & APCH_STATUS_TEMPDAT)
*val = POWER_SUPPLY_HEALTH_OVERHEAT;
else if (!(status & APCH_STATUS_INDAT))
*val = POWER_SUPPLY_HEALTH_OVERVOLTAGE;
@@ -188,8 +186,7 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
val->intval = POWER_SUPPLY_TECHNOLOGY_LION;
break;
case POWER_SUPPLY_PROP_HEALTH:
-   ret = act8945a_get_battery_health(charger,
- regmap, >intval);
+   ret = act8945a_get_battery_health(regmap, >intval);
break;
case POWER_SUPPLY_PROP_MODEL_NAME:
val->strval = act8945a_charger_model;
@@ -235,9 +232,6 @@ static int act8945a_charger_config(struct device *dev,
return -EINVAL;
}
 
-   charger->battery_temperature = of_property_read_bool(np,
-   "active-semi,check-battery-temperature");
-
chglev_pin = of_get_named_gpio_flags(np,
"active-semi,chglev-gpios", 0, );
 
-- 
2.7.4

[PATCH v3 2/7] power: act8945a_charger: Improve

2016-07-28 Thread Wenyou Yang

When get the property, first check the charger state machine,
then check the status bit to decide what value is assigned to
the corresponding property.

Retain the SUSCHG bit of REG 0x71 when configure the timers to
avoid losting the charger suspending info after boot.

Signed-off-by: Wenyou Yang 
Signed-off-by: Fengguang Wu 
---

Changes in v3:
 - Remove unneeded semicolon to fix semicolon.cocci warning.

Changes in v2:
 - Add missing ret declaration.

 drivers/power/act8945a_charger.c | 84 
 1 file changed, 68 insertions(+), 16 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index 72d39ba..6ddfc1d 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -94,16 +94,24 @@ static int act8945a_get_charger_state(struct regmap 
*regmap, int *val)
state &= APCH_STATE_CSTATE;
state >>= APCH_STATE_CSTATE_SHIFT;
 
-   if (state == APCH_STATE_CSTATE_EOC) {
+   switch (state) {
+   case APCH_STATE_CSTATE_PRE:
+   case APCH_STATE_CSTATE_FAST:
+   *val = POWER_SUPPLY_STATUS_CHARGING;
+   break;
+   case APCH_STATE_CSTATE_EOC:
if (status & APCH_STATUS_CHGDAT)
*val = POWER_SUPPLY_STATUS_FULL;
else
+   *val = POWER_SUPPLY_STATUS_CHARGING;
+   break;
+   case APCH_STATE_CSTATE_DISABLED:
+   default:
+   if (!(status & APCH_STATUS_INDAT))
+   *val = POWER_SUPPLY_STATUS_DISCHARGING;
+   else
*val = POWER_SUPPLY_STATUS_NOT_CHARGING;
-   } else if ((state == APCH_STATE_CSTATE_FAST) ||
-  (state == APCH_STATE_CSTATE_PRE)) {
-   *val = POWER_SUPPLY_STATUS_CHARGING;
-   } else {
-   *val = POWER_SUPPLY_STATUS_NOT_CHARGING;
+   break;
}
 
return 0;
@@ -112,7 +120,11 @@ static int act8945a_get_charger_state(struct regmap 
*regmap, int *val)
 static int act8945a_get_charge_type(struct regmap *regmap, int *val)
 {
int ret;
-   unsigned int state;
+   unsigned int status, state;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
+   if (ret < 0)
+   return ret;
 
ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
if (ret < 0)
@@ -129,9 +141,15 @@ static int act8945a_get_charge_type(struct regmap *regmap, 
int *val)
*val = POWER_SUPPLY_CHARGE_TYPE_FAST;
break;
case APCH_STATE_CSTATE_EOC:
+   *val = POWER_SUPPLY_CHARGE_TYPE_NONE;
+   break;
case APCH_STATE_CSTATE_DISABLED:
default:
-   *val = POWER_SUPPLY_CHARGE_TYPE_NONE;
+   if (!(status & APCH_STATUS_INDAT))
+   *val = POWER_SUPPLY_CHARGE_TYPE_NONE;
+   else
+   *val = POWER_SUPPLY_CHARGE_TYPE_UNKNOWN;
+   break;
}
 
return 0;
@@ -140,20 +158,45 @@ static int act8945a_get_charge_type(struct regmap 
*regmap, int *val)
 static int act8945a_get_battery_health(struct regmap *regmap, int *val)
 {
int ret;
-   unsigned int status;
+   unsigned int status, state, config;
 
ret = regmap_read(regmap, ACT8945A_APCH_STATUS, );
if (ret < 0)
return ret;
 
-   if (status & APCH_STATUS_TEMPDAT)
-   *val = POWER_SUPPLY_HEALTH_OVERHEAT;
-   else if (!(status & APCH_STATUS_INDAT))
-   *val = POWER_SUPPLY_HEALTH_OVERVOLTAGE;
-   else if (status & APCH_STATUS_TIMRDAT)
-   *val = POWER_SUPPLY_HEALTH_SAFETY_TIMER_EXPIRE;
-   else
+   ret = regmap_read(regmap, ACT8945A_APCH_CFG, );
+   if (ret < 0)
+   return ret;
+
+   ret = regmap_read(regmap, ACT8945A_APCH_STATE, );
+   if (ret < 0)
+   return ret;
+
+   state &= APCH_STATE_CSTATE;
+   state >>= APCH_STATE_CSTATE_SHIFT;
+
+   switch (state) {
+   case APCH_STATE_CSTATE_DISABLED:
+   if (config & APCH_CFG_SUSCHG) {
+   *val = POWER_SUPPLY_HEALTH_UNKNOWN;
+   } else if (status & APCH_STATUS_INDAT) {
+   if (!(status & APCH_STATUS_TEMPDAT))
+   *val = POWER_SUPPLY_HEALTH_OVERHEAT;
+   else if (status & APCH_STATUS_TIMRDAT)
+   *val = POWER_SUPPLY_HEALTH_SAFETY_TIMER_EXPIRE;
+   else
+   *val = POWER_SUPPLY_HEALTH_OVERVOLTAGE;
+   } else {
+   *val = POWER_SUPPLY_HEALTH_GOOD;
+   }
+   break;
+   case APCH_STATE_CSTATE_PRE:
+   case APCH_STATE_CSTATE_FAST:
+   case APCH_STATE_CSTATE_EOC:
+   default:
*val = POWER_SUPPLY_HEALTH_GOOD;
+   break;
+   }
 
return 0;

[PATCH v3 1/7] power: act8945a_charger: Remove "battery_temperature"

2016-07-28 Thread Wenyou Yang

Remove "battery_temperature" member, it is redundant, it is the
hardware's responsibility to handle TH pin properly.
It is unnecessary to use the dt property to check if there is
a battery temperature monitor or not.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2: None

 drivers/power/act8945a_charger.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
index b5c00e4..72d39ba 100644
--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -76,7 +76,6 @@ static const char *act8945a_charger_manufacturer = 
"Active-semi";
 
 struct act8945a_charger {
struct regmap *regmap;
-   bool battery_temperature;
 };
 
 static int act8945a_get_charger_state(struct regmap *regmap, int *val)
@@ -138,8 +137,7 @@ static int act8945a_get_charge_type(struct regmap *regmap, 
int *val)
return 0;
 }
 
-static int act8945a_get_battery_health(struct act8945a_charger *charger,
-  struct regmap *regmap, int *val)
+static int act8945a_get_battery_health(struct regmap *regmap, int *val)
 {
int ret;
unsigned int status;
@@ -148,7 +146,7 @@ static int act8945a_get_battery_health(struct 
act8945a_charger *charger,
if (ret < 0)
return ret;
 
-   if (charger->battery_temperature && !(status & APCH_STATUS_TEMPDAT))
+   if (status & APCH_STATUS_TEMPDAT)
*val = POWER_SUPPLY_HEALTH_OVERHEAT;
else if (!(status & APCH_STATUS_INDAT))
*val = POWER_SUPPLY_HEALTH_OVERVOLTAGE;
@@ -188,8 +186,7 @@ static int act8945a_charger_get_property(struct 
power_supply *psy,
val->intval = POWER_SUPPLY_TECHNOLOGY_LION;
break;
case POWER_SUPPLY_PROP_HEALTH:
-   ret = act8945a_get_battery_health(charger,
- regmap, >intval);
+   ret = act8945a_get_battery_health(regmap, >intval);
break;
case POWER_SUPPLY_PROP_MODEL_NAME:
val->strval = act8945a_charger_model;
@@ -235,9 +232,6 @@ static int act8945a_charger_config(struct device *dev,
return -EINVAL;
}
 
-   charger->battery_temperature = of_property_read_bool(np,
-   "active-semi,check-battery-temperature");
-
chglev_pin = of_get_named_gpio_flags(np,
"active-semi,chglev-gpios", 0, );
 
-- 
2.7.4

[PATCH v3 0/7] power: act8945a_charger: Improvements

2016-07-28 Thread Wenyou Yang

This patch series is used to improve the act8945a-charger, such as
improve the way to check the status, fix the power supply type
property, add the status change update, and add more properties:
capacity level property and max current property.

Changes in v3:
 - Remove unneeded semicolon to fix semicolon.cocci warning.

Changes in v2:
 - Add missing ret declaration.

Wenyou Yang (7):
  power: act8945a_charger: Remove "battery_temperature"
  power: act8945a_charger: Improve
  power: act8945a_charger: Add status change update support
  power: act8945a_charger: Fix the power supply type
  power: act8945a_charger: Add capacity level property
  power: act8945a_charger: Add max current property
  doc: bindings: act8945a-charger: Update properties

 .../devicetree/bindings/power/act8945a-charger.txt |   6 +-
 drivers/power/act8945a_charger.c   | 369 ++---
 2 files changed, 336 insertions(+), 39 deletions(-)

-- 
2.7.4

[PATCH v3 0/7] power: act8945a_charger: Improvements

2016-07-28 Thread Wenyou Yang

This patch series is used to improve the act8945a-charger, such as
improve the way to check the status, fix the power supply type
property, add the status change update, and add more properties:
capacity level property and max current property.

Changes in v3:
 - Remove unneeded semicolon to fix semicolon.cocci warning.

Changes in v2:
 - Add missing ret declaration.

Wenyou Yang (7):
  power: act8945a_charger: Remove "battery_temperature"
  power: act8945a_charger: Improve
  power: act8945a_charger: Add status change update support
  power: act8945a_charger: Fix the power supply type
  power: act8945a_charger: Add capacity level property
  power: act8945a_charger: Add max current property
  doc: bindings: act8945a-charger: Update properties

 .../devicetree/bindings/power/act8945a-charger.txt |   6 +-
 drivers/power/act8945a_charger.c   | 369 ++---
 2 files changed, 336 insertions(+), 39 deletions(-)

-- 
2.7.4

[PATCH v4 2/2] phy: add a driver for the Rockchip SoC internal PCIe PHY

2016-07-28 Thread Shawn Lin

This patch to add a generic PHY driver for rockchip PCIe PHY.
Access the PHY via registers provided by GRF (general register
files) module.

Signed-off-by: Shawn Lin 

---

Changes in v4:
- remove laneoff symbol as we still fail to get a workable solution
  except for exporting symbol. But I will be back with some other possible
  routine once finished.

Changes in v3: None
Changes in v2: None

 drivers/phy/Kconfig |   7 +
 drivers/phy/Makefile|   1 +
 drivers/phy/phy-rockchip-pcie.c | 357 
 3 files changed, 365 insertions(+)
 create mode 100644 drivers/phy/phy-rockchip-pcie.c

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index 02afc624..824b568 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -371,6 +371,13 @@ config PHY_ROCKCHIP_DP
help
  Enable this to support the Rockchip Display Port PHY.
 
+config PHY_ROCKCHIP_PCIE
+   tristate "Rockchip PCIe PHY Driver"
+   depends on ARCH_ROCKCHIP && OF
+   select GENERIC_PHY
+   help
+ Enable this to support the Rockchip PCIe PHY.
+
 config PHY_ST_SPEAR1310_MIPHY
tristate "ST SPEAR1310-MIPHY driver"
select GENERIC_PHY
diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
index fa8480e..0a9dcd8 100644
--- a/drivers/phy/Makefile
+++ b/drivers/phy/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_PHY_EXYNOS5_USBDRD)  += phy-exynos5-usbdrd.o
 obj-$(CONFIG_PHY_QCOM_APQ8064_SATA)+= phy-qcom-apq8064-sata.o
 obj-$(CONFIG_PHY_ROCKCHIP_USB) += phy-rockchip-usb.o
 obj-$(CONFIG_PHY_ROCKCHIP_EMMC) += phy-rockchip-emmc.o
+obj-$(CONFIG_PHY_ROCKCHIP_PCIE) += phy-rockchip-pcie.o
 obj-$(CONFIG_PHY_ROCKCHIP_DP)  += phy-rockchip-dp.o
 obj-$(CONFIG_PHY_QCOM_IPQ806X_SATA)+= phy-qcom-ipq806x-sata.o
 obj-$(CONFIG_PHY_ST_SPEAR1310_MIPHY)   += phy-spear1310-miphy.o
diff --git a/drivers/phy/phy-rockchip-pcie.c b/drivers/phy/phy-rockchip-pcie.c
new file mode 100644
index 000..63cd200
--- /dev/null
+++ b/drivers/phy/phy-rockchip-pcie.c
@@ -0,0 +1,357 @@
+/*
+ * Rockchip PCIe PHY driver
+ *
+ * Copyright (C) 2016 Shawn Lin 
+ * Copyright (C) 2016 ROCKCHIP, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * The higher 16-bit of this register is used for write protection
+ * only if BIT(x + 16) set to 1 the BIT(x) can be written.
+ */
+#define HIWORD_UPDATE(val, mask, shift) \
+   ((val) << (shift) | (mask) << ((shift) + 16))
+
+#define PHY_MAX_LANE_NUM  4
+#define PHY_CFG_DATA_SHIFT7
+#define PHY_CFG_ADDR_SHIFT1
+#define PHY_CFG_DATA_MASK 0xf
+#define PHY_CFG_ADDR_MASK 0x3f
+#define PHY_CFG_RD_MASK   0x3ff
+#define PHY_CFG_WR_ENABLE 1
+#define PHY_CFG_WR_DISABLE1
+#define PHY_CFG_WR_SHIFT  0
+#define PHY_CFG_WR_MASK   1
+#define PHY_CFG_PLL_LOCK  0x10
+#define PHY_CFG_CLK_TEST  0x10
+#define PHY_CFG_CLK_SCC   0x12
+#define PHY_CFG_SEPE_RATE BIT(3)
+#define PHY_CFG_PLL_100M  BIT(3)
+#define PHY_PLL_LOCKEDBIT(9)
+#define PHY_PLL_OUTPUTBIT(10)
+#define PHY_LANE_A_STATUS 0x30
+#define PHY_LANE_B_STATUS 0x31
+#define PHY_LANE_C_STATUS 0x32
+#define PHY_LANE_D_STATUS 0x33
+#define PHY_LANE_RX_DET_SHIFT 11
+#define PHY_LANE_RX_DET_TH0x1
+#define PHY_LANE_IDLE_OFF 0x1
+#define PHY_LANE_IDLE_MASK0x1
+#define PHY_LANE_IDLE_A_SHIFT 3
+#define PHY_LANE_IDLE_B_SHIFT 4
+#define PHY_LANE_IDLE_C_SHIFT 5
+#define PHY_LANE_IDLE_D_SHIFT 6
+
+struct rockchip_pcie_data {
+   unsigned int pcie_conf;
+   unsigned int pcie_status;
+   unsigned int pcie_laneoff;
+};
+
+struct rockchip_pcie_phy {
+   struct rockchip_pcie_data *phy_data;
+   struct regmap *reg_base;
+   struct reset_control *phy_rst;
+   struct clk *clk_pciephy_ref;
+};
+
+static inline void phy_wr_cfg(struct rockchip_pcie_phy *rk_phy,
+ u32 addr, u32 data)
+{
+   regmap_write(rk_phy->reg_base, rk_phy->phy_data->pcie_conf,
+HIWORD_UPDATE(data,
+  PHY_CFG_DATA_MASK,
+  PHY_CFG_DATA_SHIFT) |
+HIWORD_UPDATE(addr,
+  PHY_CFG_ADDR_MASK,
+  PHY_CFG_ADDR_SHIFT));
+   udelay(1);
+   regmap_write(rk_phy->reg_base, rk_phy->phy_data->pcie_conf,
+

[PATCH v4 2/2] phy: add a driver for the Rockchip SoC internal PCIe PHY

2016-07-28 Thread Shawn Lin

This patch to add a generic PHY driver for rockchip PCIe PHY.
Access the PHY via registers provided by GRF (general register
files) module.

Signed-off-by: Shawn Lin 

---

Changes in v4:
- remove laneoff symbol as we still fail to get a workable solution
  except for exporting symbol. But I will be back with some other possible
  routine once finished.

Changes in v3: None
Changes in v2: None

 drivers/phy/Kconfig |   7 +
 drivers/phy/Makefile|   1 +
 drivers/phy/phy-rockchip-pcie.c | 357 
 3 files changed, 365 insertions(+)
 create mode 100644 drivers/phy/phy-rockchip-pcie.c

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index 02afc624..824b568 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -371,6 +371,13 @@ config PHY_ROCKCHIP_DP
help
  Enable this to support the Rockchip Display Port PHY.
 
+config PHY_ROCKCHIP_PCIE
+   tristate "Rockchip PCIe PHY Driver"
+   depends on ARCH_ROCKCHIP && OF
+   select GENERIC_PHY
+   help
+ Enable this to support the Rockchip PCIe PHY.
+
 config PHY_ST_SPEAR1310_MIPHY
tristate "ST SPEAR1310-MIPHY driver"
select GENERIC_PHY
diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
index fa8480e..0a9dcd8 100644
--- a/drivers/phy/Makefile
+++ b/drivers/phy/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_PHY_EXYNOS5_USBDRD)  += phy-exynos5-usbdrd.o
 obj-$(CONFIG_PHY_QCOM_APQ8064_SATA)+= phy-qcom-apq8064-sata.o
 obj-$(CONFIG_PHY_ROCKCHIP_USB) += phy-rockchip-usb.o
 obj-$(CONFIG_PHY_ROCKCHIP_EMMC) += phy-rockchip-emmc.o
+obj-$(CONFIG_PHY_ROCKCHIP_PCIE) += phy-rockchip-pcie.o
 obj-$(CONFIG_PHY_ROCKCHIP_DP)  += phy-rockchip-dp.o
 obj-$(CONFIG_PHY_QCOM_IPQ806X_SATA)+= phy-qcom-ipq806x-sata.o
 obj-$(CONFIG_PHY_ST_SPEAR1310_MIPHY)   += phy-spear1310-miphy.o
diff --git a/drivers/phy/phy-rockchip-pcie.c b/drivers/phy/phy-rockchip-pcie.c
new file mode 100644
index 000..63cd200
--- /dev/null
+++ b/drivers/phy/phy-rockchip-pcie.c
@@ -0,0 +1,357 @@
+/*
+ * Rockchip PCIe PHY driver
+ *
+ * Copyright (C) 2016 Shawn Lin 
+ * Copyright (C) 2016 ROCKCHIP, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * The higher 16-bit of this register is used for write protection
+ * only if BIT(x + 16) set to 1 the BIT(x) can be written.
+ */
+#define HIWORD_UPDATE(val, mask, shift) \
+   ((val) << (shift) | (mask) << ((shift) + 16))
+
+#define PHY_MAX_LANE_NUM  4
+#define PHY_CFG_DATA_SHIFT7
+#define PHY_CFG_ADDR_SHIFT1
+#define PHY_CFG_DATA_MASK 0xf
+#define PHY_CFG_ADDR_MASK 0x3f
+#define PHY_CFG_RD_MASK   0x3ff
+#define PHY_CFG_WR_ENABLE 1
+#define PHY_CFG_WR_DISABLE1
+#define PHY_CFG_WR_SHIFT  0
+#define PHY_CFG_WR_MASK   1
+#define PHY_CFG_PLL_LOCK  0x10
+#define PHY_CFG_CLK_TEST  0x10
+#define PHY_CFG_CLK_SCC   0x12
+#define PHY_CFG_SEPE_RATE BIT(3)
+#define PHY_CFG_PLL_100M  BIT(3)
+#define PHY_PLL_LOCKEDBIT(9)
+#define PHY_PLL_OUTPUTBIT(10)
+#define PHY_LANE_A_STATUS 0x30
+#define PHY_LANE_B_STATUS 0x31
+#define PHY_LANE_C_STATUS 0x32
+#define PHY_LANE_D_STATUS 0x33
+#define PHY_LANE_RX_DET_SHIFT 11
+#define PHY_LANE_RX_DET_TH0x1
+#define PHY_LANE_IDLE_OFF 0x1
+#define PHY_LANE_IDLE_MASK0x1
+#define PHY_LANE_IDLE_A_SHIFT 3
+#define PHY_LANE_IDLE_B_SHIFT 4
+#define PHY_LANE_IDLE_C_SHIFT 5
+#define PHY_LANE_IDLE_D_SHIFT 6
+
+struct rockchip_pcie_data {
+   unsigned int pcie_conf;
+   unsigned int pcie_status;
+   unsigned int pcie_laneoff;
+};
+
+struct rockchip_pcie_phy {
+   struct rockchip_pcie_data *phy_data;
+   struct regmap *reg_base;
+   struct reset_control *phy_rst;
+   struct clk *clk_pciephy_ref;
+};
+
+static inline void phy_wr_cfg(struct rockchip_pcie_phy *rk_phy,
+ u32 addr, u32 data)
+{
+   regmap_write(rk_phy->reg_base, rk_phy->phy_data->pcie_conf,
+HIWORD_UPDATE(data,
+  PHY_CFG_DATA_MASK,
+  PHY_CFG_DATA_SHIFT) |
+HIWORD_UPDATE(addr,
+  PHY_CFG_ADDR_MASK,
+  PHY_CFG_ADDR_SHIFT));
+   udelay(1);
+   regmap_write(rk_phy->reg_base, rk_phy->phy_data->pcie_conf,
+HIWORD_UPDATE(PHY_CFG_WR_ENABLE,
+

Re: warning: calling ‘__builtin_return_address’ with a nonzero argument is unsafe

2016-07-28 Thread Linus Torvalds

On Wed, Jul 27, 2016 at 8:52 PM, Steven Rostedt  wrote:
>
> I just looked at your patch. Would this work if you moved that
> KBUILD_CFLAGS to the tracing directory? Something like the below (never
> compiled or tested).

I tried something like that, but the CFLAGS games the tracing code
does made my thing result in

  kernel/trace/Makefile:20: *** Recursive variable 'KBUILD_CFLAGS'
references itself (eventually).  Stop.

but yes, if you have the magic fingers to make it work, limiting the
-Wno-frame-address to just the tracing code sounds like the
RightThing(tm).

  Linus

[PATCH v4 1/2] Documentation: bindings: add dt documentation for Rockchip PCIe PHY

2016-07-28 Thread Shawn Lin

This patch adds a binding that describes the Rockchip PCIe PHY
found on Rockchip SoCs PCIe interface.

Signed-off-by: Shawn Lin 

---

Changes in v4: None
Changes in v3:
- rename the node to pcie_phy: pcie-phy suggested by Doug

Changes in v2:
- add clk and reset description
- remove unit-address

 .../devicetree/bindings/phy/rockchip-pcie-phy.txt  | 32 ++
 1 file changed, 32 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt 
b/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt
new file mode 100644
index 000..aedca29
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt
@@ -0,0 +1,32 @@
+Rockchip PCIE PHY
+---
+
+Required properties:
+ - compatible: rockchip,rk3399-pcie-phy
+ - #phy-cells: must be 0
+ - clocks: Must contain an entry in clock-names.
+   See ../clocks/clock-bindings.txt for details.
+ - clock-names: Must be "refclk"
+ - resets: Must contain an entry in reset-names.
+   See ../reset/reset.txt for details.
+ - reset-names: Must be "phy"
+
+Example:
+
+grf: syscon@ff77 {
+   compatible = "rockchip,rk3399-grf", "syscon", "simple-mfd";
+   #address-cells = <1>;
+   #size-cells = <1>;
+
+   ...
+
+   pcie_phy: pcie-phy {
+   compatible = "rockchip,rk3399-pcie-phy";
+   #phy-cells = <0>;
+   clocks = < SCLK_PCIEPHY_REF>;
+   clock-names = "refclk";
+   resets = < SRST_PCIEPHY>;
+   reset-names = "phy";
+   };
+};
+
-- 
2.3.7

Re: warning: calling ‘__builtin_return_address’ with a nonzero argument is unsafe

2016-07-28 Thread Linus Torvalds

On Wed, Jul 27, 2016 at 8:52 PM, Steven Rostedt  wrote:
>
> I just looked at your patch. Would this work if you moved that
> KBUILD_CFLAGS to the tracing directory? Something like the below (never
> compiled or tested).

I tried something like that, but the CFLAGS games the tracing code
does made my thing result in

  kernel/trace/Makefile:20: *** Recursive variable 'KBUILD_CFLAGS'
references itself (eventually).  Stop.

but yes, if you have the magic fingers to make it work, limiting the
-Wno-frame-address to just the tracing code sounds like the
RightThing(tm).

  Linus

[PATCH v4 1/2] Documentation: bindings: add dt documentation for Rockchip PCIe PHY

2016-07-28 Thread Shawn Lin

This patch adds a binding that describes the Rockchip PCIe PHY
found on Rockchip SoCs PCIe interface.

Signed-off-by: Shawn Lin 

---

Changes in v4: None
Changes in v3:
- rename the node to pcie_phy: pcie-phy suggested by Doug

Changes in v2:
- add clk and reset description
- remove unit-address

 .../devicetree/bindings/phy/rockchip-pcie-phy.txt  | 32 ++
 1 file changed, 32 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt 
b/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt
new file mode 100644
index 000..aedca29
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/rockchip-pcie-phy.txt
@@ -0,0 +1,32 @@
+Rockchip PCIE PHY
+---
+
+Required properties:
+ - compatible: rockchip,rk3399-pcie-phy
+ - #phy-cells: must be 0
+ - clocks: Must contain an entry in clock-names.
+   See ../clocks/clock-bindings.txt for details.
+ - clock-names: Must be "refclk"
+ - resets: Must contain an entry in reset-names.
+   See ../reset/reset.txt for details.
+ - reset-names: Must be "phy"
+
+Example:
+
+grf: syscon@ff77 {
+   compatible = "rockchip,rk3399-grf", "syscon", "simple-mfd";
+   #address-cells = <1>;
+   #size-cells = <1>;
+
+   ...
+
+   pcie_phy: pcie-phy {
+   compatible = "rockchip,rk3399-pcie-phy";
+   #phy-cells = <0>;
+   clocks = < SCLK_PCIEPHY_REF>;
+   clock-names = "refclk";
+   resets = < SRST_PCIEPHY>;
+   reset-names = "phy";
+   };
+};
+
-- 
2.3.7

Re: [PATCH] sched/core: add taint on "BUG: sleeping function called from invalid context"

2016-07-28 Thread Rusty Russell

Vegard Nossum  writes:
> Seeing this, it occurs to me that we should probably add a taint here:

Taint has traditionally meant "the user did something unsupported, take
the bug report with a grain of salt".  Such as force removing a module.

So this seems wrong...

Cheers,
Rusty.


>
> BUG: sleeping function called from invalid context at mm/slab.h:388
> in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
> Preemption disabled at:[] console_unlock+0x2f7/0x930
>
> CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
>^^^
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>   8800b8a17160 81971441 88011a3c4c80
>  88011a3c4c80 8800b8a17198 81158067 0de6
>  88011a3c4c80 8390e07c 0184 
> Call Trace:
> [...]
>
> BUG: sleeping function called from invalid context at 
> arch/x86/mm/fault.c:1309
> in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
> Preemption disabled at:[] down_trylock+0x13/0x80
>
> CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
>^^^
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>   8800b8a17e08 81971441 88011a3c4c80
>  88011a3c4c80 8800b8a17e40 81158067 
>  88011a3c4c80 83437b20 051d 
> Call Trace:
> [...]
>
> Cc: Peter Zijlstra 
> Cc: Paul E. McKenney 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Rusty Russel 
> Signed-off-by: Vegard Nossum 
> ---
>  kernel/sched/core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 97ee9ac..7171cf9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7573,6 +7573,7 @@ void ___might_sleep(const char *file, int line, int 
> preempt_offset)
>   }
>  #endif
>   dump_stack();
> + add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>  }
>  EXPORT_SYMBOL(___might_sleep);
>  #endif
> -- 
> 1.9.1

Re: [PATCH] sched/core: add taint on "BUG: sleeping function called from invalid context"

2016-07-28 Thread Rusty Russell

Vegard Nossum  writes:
> Seeing this, it occurs to me that we should probably add a taint here:

Taint has traditionally meant "the user did something unsupported, take
the bug report with a grain of salt".  Such as force removing a module.

So this seems wrong...

Cheers,
Rusty.


>
> BUG: sleeping function called from invalid context at mm/slab.h:388
> in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
> Preemption disabled at:[] console_unlock+0x2f7/0x930
>
> CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
>^^^
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>   8800b8a17160 81971441 88011a3c4c80
>  88011a3c4c80 8800b8a17198 81158067 0de6
>  88011a3c4c80 8390e07c 0184 
> Call Trace:
> [...]
>
> BUG: sleeping function called from invalid context at 
> arch/x86/mm/fault.c:1309
> in_atomic(): 0, irqs_disabled(): 0, pid: 32211, name: trinity-c3
> Preemption disabled at:[] down_trylock+0x13/0x80
>
> CPU: 3 PID: 32211 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
>^^^
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>   8800b8a17e08 81971441 88011a3c4c80
>  88011a3c4c80 8800b8a17e40 81158067 
>  88011a3c4c80 83437b20 051d 
> Call Trace:
> [...]
>
> Cc: Peter Zijlstra 
> Cc: Paul E. McKenney 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Rusty Russel 
> Signed-off-by: Vegard Nossum 
> ---
>  kernel/sched/core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 97ee9ac..7171cf9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7573,6 +7573,7 @@ void ___might_sleep(const char *file, int line, int 
> preempt_offset)
>   }
>  #endif
>   dump_stack();
> + add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>  }
>  EXPORT_SYMBOL(___might_sleep);
>  #endif
> -- 
> 1.9.1

Re: [PATCH] ses: Fix racy cleanup of /sys in remove_dev()

2016-07-28 Thread Martin K. Petersen

> "Calvin" == Calvin Owens  writes:

>> Any thoughts? Squinting at this more it still seems racy, but a
>> narrow race is surely better than just blatantly freeing everything
>> while the file is still exposed in /sys? Is there a better way you'd
>> prefer I accomplish this?
>> 
>> (I have boxes that OOPS all the time from monitoring code reading the
>> /sys files, with this patch I haven't seen a single one.)

Calvin> Ping? Thoughts, comments?

James: This is your puppy...

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] ses: Fix racy cleanup of /sys in remove_dev()

2016-07-28 Thread Martin K. Petersen

> "Calvin" == Calvin Owens  writes:

>> Any thoughts? Squinting at this more it still seems racy, but a
>> narrow race is surely better than just blatantly freeing everything
>> while the file is still exposed in /sys? Is there a better way you'd
>> prefer I accomplish this?
>> 
>> (I have boxes that OOPS all the time from monitoring code reading the
>> /sys files, with this patch I haven't seen a single one.)

Calvin> Ping? Thoughts, comments?

James: This is your puppy...

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] lpfc: Fix possible NULL pointer dereference

2016-07-28 Thread Martin K. Petersen

> "James" == James Smart  writes:

James> This patch is good.

Johannes: You were going to tweak a few things and resubmit. Please do.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] lpfc: Fix possible NULL pointer dereference

2016-07-28 Thread Martin K. Petersen

> "James" == James Smart  writes:

James> This patch is good.

Johannes: You were going to tweak a few things and resubmit. Please do.

-- 
Martin K. Petersen  Oracle Linux Engineering

linux-next: build failure after merge of the vfs tree

2016-07-28 Thread Stephen Rothwell

Hi Al,

After merging the vfs tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

fs/fuse/dir.c: In function 'fuse_reverse_inval_entry':
fs/fuse/dir.c:958:13: error: assignment of member 'hash' in read-only object
  name->hash = full_name_hash(dir, name->name, name->len);
 ^

Caused by commit

  8387ff2577eb ("vfs: make the string hashes salt the hash")

from Linus' tree interacting with commit

  5e70178ae20b ("qstr: constify instances in fuse")

from the vfs tree.

I added this merge fix patch:

From: Stephen Rothwell 
Date: Fri, 29 Jul 2016 11:07:45 +1000
Subject: [PATCH] qstr: unconstify fuse_reverse_inval_entry parameter

Signed-off-by: Stephen Rothwell 
---
 fs/fuse/dir.c| 2 +-
 fs/fuse/fuse_i.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f910578e51ba..c47b7780ce37 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -935,7 +935,7 @@ int fuse_update_attributes(struct inode *inode, struct 
kstat *stat,
 }
 
 int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
-u64 child_nodeid, const struct qstr *name)
+u64 child_nodeid, struct qstr *name)
 {
int err = -ENOTDIR;
struct inode *parent;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 6df761726a53..d98d8cc84def 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -929,7 +929,7 @@ int fuse_reverse_inval_inode(struct super_block *sb, u64 
nodeid,
  * then the dentry is unhashed (d_delete()).
  */
 int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
-u64 child_nodeid, const struct qstr *name);
+u64 child_nodeid, struct qstr *name);
 
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 bool isdir);
-- 
2.8.1

-- 
Cheers,
Stephen Rothwell

linux-next: build failure after merge of the vfs tree

2016-07-28 Thread Stephen Rothwell

Hi Al,

After merging the vfs tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

fs/fuse/dir.c: In function 'fuse_reverse_inval_entry':
fs/fuse/dir.c:958:13: error: assignment of member 'hash' in read-only object
  name->hash = full_name_hash(dir, name->name, name->len);
 ^

Caused by commit

  8387ff2577eb ("vfs: make the string hashes salt the hash")

from Linus' tree interacting with commit

  5e70178ae20b ("qstr: constify instances in fuse")

from the vfs tree.

I added this merge fix patch:

From: Stephen Rothwell 
Date: Fri, 29 Jul 2016 11:07:45 +1000
Subject: [PATCH] qstr: unconstify fuse_reverse_inval_entry parameter

Signed-off-by: Stephen Rothwell 
---
 fs/fuse/dir.c| 2 +-
 fs/fuse/fuse_i.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f910578e51ba..c47b7780ce37 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -935,7 +935,7 @@ int fuse_update_attributes(struct inode *inode, struct 
kstat *stat,
 }
 
 int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
-u64 child_nodeid, const struct qstr *name)
+u64 child_nodeid, struct qstr *name)
 {
int err = -ENOTDIR;
struct inode *parent;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 6df761726a53..d98d8cc84def 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -929,7 +929,7 @@ int fuse_reverse_inval_inode(struct super_block *sb, u64 
nodeid,
  * then the dentry is unhashed (d_delete()).
  */
 int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
-u64 child_nodeid, const struct qstr *name);
+u64 child_nodeid, struct qstr *name);
 
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 bool isdir);
-- 
2.8.1

-- 
Cheers,
Stephen Rothwell

Re: To add, or not to add, a bio REQ_ROTATIONAL flag

2016-07-28 Thread Martin K. Petersen

> "Eric" == Eric Wheeler  writes:

Eric,

Eric> However, just because FADV_SEQUENTIAL is flagged doesn't mean the
Eric> cache should bypass.  Filesystems can fragment, and while the file
Eric> being read may be read sequentially, the blocks on which it
Eric> resides may not be.  Same thing for higher-level block devices
Eric> such as dm-thinp where one might sequentially read a thin volume
Eric> but its _tdata might not be in linear order.  This may imply that
Eric> we need a new way to flag cache bypass from userspace that is
Eric> neither io-priority nor fadvise driven.

Why conflate the two? Something being a background task is orthogonal to
whether it is being read sequentially or not.

Eric> So what are our options?  What might be the best way to do this?

For the SCSI I/O hints I use the idle I/O priority to classify
backups. Works fine.

Eric> Are FADV_NOREUSE/FADV_DONTNEED reasonable candidates?

FADV_DONTNEED was intended for this. There have been patches posted in
the past that tied the loop between the fadvise flags and the bio. I
would like to see those revived.

Eric> Perhaps ionice could be used used, but the concept of "priority"
Eric> doesn't exactly encompass the concept of cache-bypass---so is
Eric> something else needed?

The idle class explicitly does not have a priority.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: To add, or not to add, a bio REQ_ROTATIONAL flag

2016-07-28 Thread Martin K. Petersen

> "Eric" == Eric Wheeler  writes:

Eric,

Eric> However, just because FADV_SEQUENTIAL is flagged doesn't mean the
Eric> cache should bypass.  Filesystems can fragment, and while the file
Eric> being read may be read sequentially, the blocks on which it
Eric> resides may not be.  Same thing for higher-level block devices
Eric> such as dm-thinp where one might sequentially read a thin volume
Eric> but its _tdata might not be in linear order.  This may imply that
Eric> we need a new way to flag cache bypass from userspace that is
Eric> neither io-priority nor fadvise driven.

Why conflate the two? Something being a background task is orthogonal to
whether it is being read sequentially or not.

Eric> So what are our options?  What might be the best way to do this?

For the SCSI I/O hints I use the idle I/O priority to classify
backups. Works fine.

Eric> Are FADV_NOREUSE/FADV_DONTNEED reasonable candidates?

FADV_DONTNEED was intended for this. There have been patches posted in
the past that tied the loop between the fadvise flags and the bio. I
would like to see those revived.

Eric> Perhaps ionice could be used used, but the concept of "priority"
Eric> doesn't exactly encompass the concept of cache-bypass---so is
Eric> something else needed?

The idle class explicitly does not have a priority.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: To add, or not to add, a bio REQ_ROTATIONAL flag

2016-07-28 Thread Wols Lists

On 29/07/16 01:50, Eric Wheeler wrote:
> Hello all,
> 
> With the many SSD caching layers being developed (bcache, dm-cache, 
> dm-writeboost, etc), how could we flag a bio from userspace to indicate 
> whether the bio is preferred to hit spinning disks instead of an SSD?
> 
> Unnecessary promotions, evections, and writeback increase the write burden 
> on the caching layer and burns out SSDs too fast (TBW), thus requring 
> equipment replacement.

What's the spec of these devices? How long are they expected to last?

Other recent posts on this (linux-raid) mailing list refer to tests on
SSDs that indicates their typical life is way beyond their nominal life,
and that in normal usage they are actually likely to outlive "spinning
rust".

http://techreport.com/review/24841/introducing-the-ssd-endurance-experiment

http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/3

Looking at the results, the FIRST drives only started failing once
they'd written some 700 Terabytes. How long is it going to take you to
write that much data over a SATA3 link?

Cheers,
Wol

Re: To add, or not to add, a bio REQ_ROTATIONAL flag

2016-07-28 Thread Wols Lists

On 29/07/16 01:50, Eric Wheeler wrote:
> Hello all,
> 
> With the many SSD caching layers being developed (bcache, dm-cache, 
> dm-writeboost, etc), how could we flag a bio from userspace to indicate 
> whether the bio is preferred to hit spinning disks instead of an SSD?
> 
> Unnecessary promotions, evections, and writeback increase the write burden 
> on the caching layer and burns out SSDs too fast (TBW), thus requring 
> equipment replacement.

What's the spec of these devices? How long are they expected to last?

Other recent posts on this (linux-raid) mailing list refer to tests on
SSDs that indicates their typical life is way beyond their nominal life,
and that in normal usage they are actually likely to outlive "spinning
rust".

http://techreport.com/review/24841/introducing-the-ssd-endurance-experiment

http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/3

Looking at the results, the FIRST drives only started failing once
they'd written some 700 Terabytes. How long is it going to take you to
write that much data over a SATA3 link?

Cheers,
Wol

RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-28 Thread Li, Liang Z

> > > On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > > > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > > > + vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > > > + vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > > > + vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > > > +  BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > > > + hdr_len = sizeof(struct balloon_bmap_hdr);
> > > > > + vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len,
> GFP_KERNEL);
> > > >
> > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > How big was the pfn buffer before?
> > >
> > >
> > > Yes I would limit this to 1G memory in a go, will result in a 32KByte 
> > > bitmap.
> > >
> > > --
> > > MST
> >
> > Limit to 1G is bad for the performance, I sent you the test result several
> weeks ago.
> >
> > Paste it bellow:
> > --
> > --
> > About the size of page bitmap, I have test the performance of filling
> > the balloon to 15GB with a  16GB RAM VM.
> >
> > ===
> > 32K Byte (cover 1GB of RAM)
> >
> > Time spends on inflating: 2031ms
> > -
> > 64K Byte (cover 2GB of RAM)
> >
> > Time spends on inflating: 1507ms
> > 
> > 512K Byte (cover 16GB of RAM)
> >
> > Time spends on inflating: 1237ms
> > 
> >
> > If possible, a big bitmap is better for performance.
> >
> > Liang
> 
> Earlier you said:
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> Here sending PFNs to host with 512K Byte map should be almost free.
> 
> So is something else taking up the time?
> 
I just want to show you the benefits of using a big bitmap. :)
I did not measure the time spend on each stage after optimization(I will do it 
later),
but I have tried to allocate the page with big chunk and found it can make 
things faster.
Without allocating big chunk page, the performance improvement is about 85%, 
and with
 allocating big  chunk page, the improvement is about 94%.

Liang

> 
> --
> MST

RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-28 Thread Li, Liang Z

> > > On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > > > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > > > + vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > > > + vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > > > + vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > > > +  BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > > > + hdr_len = sizeof(struct balloon_bmap_hdr);
> > > > > + vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len,
> GFP_KERNEL);
> > > >
> > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > How big was the pfn buffer before?
> > >
> > >
> > > Yes I would limit this to 1G memory in a go, will result in a 32KByte 
> > > bitmap.
> > >
> > > --
> > > MST
> >
> > Limit to 1G is bad for the performance, I sent you the test result several
> weeks ago.
> >
> > Paste it bellow:
> > --
> > --
> > About the size of page bitmap, I have test the performance of filling
> > the balloon to 15GB with a  16GB RAM VM.
> >
> > ===
> > 32K Byte (cover 1GB of RAM)
> >
> > Time spends on inflating: 2031ms
> > -
> > 64K Byte (cover 2GB of RAM)
> >
> > Time spends on inflating: 1507ms
> > 
> > 512K Byte (cover 16GB of RAM)
> >
> > Time spends on inflating: 1237ms
> > 
> >
> > If possible, a big bitmap is better for performance.
> >
> > Liang
> 
> Earlier you said:
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> Here sending PFNs to host with 512K Byte map should be almost free.
> 
> So is something else taking up the time?
> 
I just want to show you the benefits of using a big bitmap. :)
I did not measure the time spend on each stage after optimization(I will do it 
later),
but I have tried to allocate the page with big chunk and found it can make 
things faster.
Without allocating big chunk page, the performance improvement is about 85%, 
and with
 allocating big  chunk page, the improvement is about 94%.

Liang

> 
> --
> MST

Re: [kbuild-all] [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

2016-07-28 Thread Fengguang Wu


Hi Wenyou,

On Fri, Jul 29, 2016 at 12:57:20AM +, Yang, Wenyou wrote:

Hi Fengguang,

I would like to merge this patch and add your Signed-off-by, do you agree?

I think it is better.


Yes that'd be fine, too.

Thanks,
Fengguang


From: kbuild test robot [l...@intel.com]
Sent: Friday, June 24, 2016 20:43
To: Yang, Wenyou
Cc: kbuild-...@01.org; Sebastian Reichel; Dmitry Eremin-Solenikov; David 
Woodhouse; Rob Herring; Pawel Moll; Mark Brown; Ian Campbell; Kumar Gala; 
linux-kernel@vger.kernel.org; devicet...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; Ferre, Nicolas; linux...@vger.kernel.org; 
Yang, Wenyou
Subject: [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

drivers/power/act8945a_charger.c:115:2-3: Unneeded semicolon


Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Wenyou Yang 
Signed-off-by: Fengguang Wu 
---

act8945a_charger.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -112,7 +112,7 @@ static int act8945a_get_charger_state(st
   else
   *val = POWER_SUPPLY_STATUS_NOT_CHARGING;
   break;
-   };
+   }

   return 0;
}
___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all

Re: [kbuild-all] [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

2016-07-28 Thread Fengguang Wu


Hi Wenyou,

On Fri, Jul 29, 2016 at 12:57:20AM +, Yang, Wenyou wrote:

Hi Fengguang,

I would like to merge this patch and add your Signed-off-by, do you agree?

I think it is better.


Yes that'd be fine, too.

Thanks,
Fengguang


From: kbuild test robot [l...@intel.com]
Sent: Friday, June 24, 2016 20:43
To: Yang, Wenyou
Cc: kbuild-...@01.org; Sebastian Reichel; Dmitry Eremin-Solenikov; David 
Woodhouse; Rob Herring; Pawel Moll; Mark Brown; Ian Campbell; Kumar Gala; 
linux-kernel@vger.kernel.org; devicet...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; Ferre, Nicolas; linux...@vger.kernel.org; 
Yang, Wenyou
Subject: [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

drivers/power/act8945a_charger.c:115:2-3: Unneeded semicolon


Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Wenyou Yang 
Signed-off-by: Fengguang Wu 
---

act8945a_charger.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -112,7 +112,7 @@ static int act8945a_get_charger_state(st
   else
   *val = POWER_SUPPLY_STATUS_NOT_CHARGING;
   break;
-   };
+   }

   return 0;
}
___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all

RE: [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

2016-07-28 Thread Yang, Wenyou

Hi Fengguang,

I would like to merge this patch and add your Signed-off-by, do you agree?

I think it is better.

Best Regards,
Wenyou Yang


From: kbuild test robot [l...@intel.com]
Sent: Friday, June 24, 2016 20:43
To: Yang, Wenyou
Cc: kbuild-...@01.org; Sebastian Reichel; Dmitry Eremin-Solenikov; David 
Woodhouse; Rob Herring; Pawel Moll; Mark Brown; Ian Campbell; Kumar Gala; 
linux-kernel@vger.kernel.org; devicet...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; Ferre, Nicolas; linux...@vger.kernel.org; 
Yang, Wenyou
Subject: [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

drivers/power/act8945a_charger.c:115:2-3: Unneeded semicolon


 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Wenyou Yang 
Signed-off-by: Fengguang Wu 
---

 act8945a_charger.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -112,7 +112,7 @@ static int act8945a_get_charger_state(st
else
*val = POWER_SUPPLY_STATUS_NOT_CHARGING;
break;
-   };
+   }

return 0;
 }

RE: [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

2016-07-28 Thread Yang, Wenyou

Hi Fengguang,

I would like to merge this patch and add your Signed-off-by, do you agree?

I think it is better.

Best Regards,
Wenyou Yang


From: kbuild test robot [l...@intel.com]
Sent: Friday, June 24, 2016 20:43
To: Yang, Wenyou
Cc: kbuild-...@01.org; Sebastian Reichel; Dmitry Eremin-Solenikov; David 
Woodhouse; Rob Herring; Pawel Moll; Mark Brown; Ian Campbell; Kumar Gala; 
linux-kernel@vger.kernel.org; devicet...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; Ferre, Nicolas; linux...@vger.kernel.org; 
Yang, Wenyou
Subject: [PATCH] power: act8945a_charger: fix semicolon.cocci warnings

drivers/power/act8945a_charger.c:115:2-3: Unneeded semicolon


 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Wenyou Yang 
Signed-off-by: Fengguang Wu 
---

 act8945a_charger.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/power/act8945a_charger.c
+++ b/drivers/power/act8945a_charger.c
@@ -112,7 +112,7 @@ static int act8945a_get_charger_state(st
else
*val = POWER_SUPPLY_STATUS_NOT_CHARGING;
break;
-   };
+   }

return 0;
 }

To add, or not to add, a bio REQ_ROTATIONAL flag

2016-07-28 Thread Eric Wheeler

Hello all,

With the many SSD caching layers being developed (bcache, dm-cache, 
dm-writeboost, etc), how could we flag a bio from userspace to indicate 
whether the bio is preferred to hit spinning disks instead of an SSD?

Unnecessary promotions, evections, and writeback increase the write burden 
on the caching layer and burns out SSDs too fast (TBW), thus requring 
equipment replacement.

Is there already a mechanism for this that could be added to the various 
caching mechanisms' promote/demote/bypass logic?

For example, I would like to prevent backups from influencing the cache 
eviction logic. Neither do I wish to evict cache due to a bio from a 
backup process, nor do I wish a bio from the backup process to be cached 
on the SSD.  


We would want to bypass the cache for IO that is somehow flagged to bypass 
block-layer caches and use the rotational disk unless the referenced block 
already exists on the SSD.

There might be two cases here that would be ideal to unify without 
touching filesystem code:

  1) open() of a block device

  2) open() on a file such that a filesystem must flag the bio

I had considered writing something to detect FADV_SEQUENTIAL/FADV_NOREUSE 
or `ionice -c3` on a process hitting bcache and modifying 
check_should_bypass()/should_writeback() to behave as such.

However, just because FADV_SEQUENTIAL is flagged doesn't mean the cache 
should bypass.  Filesystems can fragment, and while the file being read 
may be read sequentially, the blocks on which it resides may not be.  
Same thing for higher-level block devices such as dm-thinp where one might 
sequentially read a thin volume but its _tdata might not be in linear 
order.  This may imply that we need a new way to flag cache bypass from 
userspace that is neither io-priority nor fadvise driven.

So what are our options?  What might be the best way to do this?

If fadvise is the better option, how can a block device driver lookup the 
fadvise advice from a given bio struct?  Can we add an FADV_NOSSD flag 
since FADV_SEQUENTIAL may be insufficent?  Are FADV_NOREUSE/FADV_DONTNEED 
reasonable candidates?

Perhaps ionice could be used used, but the concept of "priority" 
doesn't exactly encompass the concept of cache-bypass---so is something 
else needed?

Other ideas?  


--
Eric Wheeler

To add, or not to add, a bio REQ_ROTATIONAL flag

2016-07-28 Thread Eric Wheeler

Hello all,

With the many SSD caching layers being developed (bcache, dm-cache, 
dm-writeboost, etc), how could we flag a bio from userspace to indicate 
whether the bio is preferred to hit spinning disks instead of an SSD?

Unnecessary promotions, evections, and writeback increase the write burden 
on the caching layer and burns out SSDs too fast (TBW), thus requring 
equipment replacement.

Is there already a mechanism for this that could be added to the various 
caching mechanisms' promote/demote/bypass logic?

For example, I would like to prevent backups from influencing the cache 
eviction logic. Neither do I wish to evict cache due to a bio from a 
backup process, nor do I wish a bio from the backup process to be cached 
on the SSD.  


We would want to bypass the cache for IO that is somehow flagged to bypass 
block-layer caches and use the rotational disk unless the referenced block 
already exists on the SSD.

There might be two cases here that would be ideal to unify without 
touching filesystem code:

  1) open() of a block device

  2) open() on a file such that a filesystem must flag the bio

I had considered writing something to detect FADV_SEQUENTIAL/FADV_NOREUSE 
or `ionice -c3` on a process hitting bcache and modifying 
check_should_bypass()/should_writeback() to behave as such.

However, just because FADV_SEQUENTIAL is flagged doesn't mean the cache 
should bypass.  Filesystems can fragment, and while the file being read 
may be read sequentially, the blocks on which it resides may not be.  
Same thing for higher-level block devices such as dm-thinp where one might 
sequentially read a thin volume but its _tdata might not be in linear 
order.  This may imply that we need a new way to flag cache bypass from 
userspace that is neither io-priority nor fadvise driven.

So what are our options?  What might be the best way to do this?

If fadvise is the better option, how can a block device driver lookup the 
fadvise advice from a given bio struct?  Can we add an FADV_NOSSD flag 
since FADV_SEQUENTIAL may be insufficent?  Are FADV_NOREUSE/FADV_DONTNEED 
reasonable candidates?

Perhaps ionice could be used used, but the concept of "priority" 
doesn't exactly encompass the concept of cache-bypass---so is something 
else needed?

Other ideas?  


--
Eric Wheeler

[RESEND PATCH 2/2] ARM: cache-l2x0.c: Do not clear bit 23 in prefetch control register

2016-07-28 Thread Andrey Smirnov

As per L2C-310 TRM[1]:

"... You can control this feature using bits 30,27 and 23 of the
Prefetch Control Register. Bit 23 and 27 are only used if you set bit 30
HIGH..."

which means there is no need to clear bit 23 if bit 30 is being cleared.

[1] 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0246e/CJAJACBJ.html

Acked-by: Arnd Bergmann 
Signed-off-by: Andrey Smirnov 
---
 arch/arm/mm/cache-l2x0.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 30e2012..12c1ba7 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -715,11 +715,8 @@ static void __init l2c310_fixup(void __iomem *base, u32 
cache_id,
if (revision >= L310_CACHE_ID_RTL_R3P0 &&
revision < L310_CACHE_ID_RTL_R3P2) {
u32 val = l2x0_saved_regs.prefetch_ctrl;
-   /* I don't think bit23 is required here... but iMX6 does so */
-   if (val & (L310_PREFETCH_CTRL_DBL_LINEFILL |
-  L310_PREFETCH_CTRL_DBL_LINEFILL_INCR)) {
-   val &= ~(L310_PREFETCH_CTRL_DBL_LINEFILL |
-L310_PREFETCH_CTRL_DBL_LINEFILL_INCR);
+   if (val & L310_PREFETCH_CTRL_DBL_LINEFILL) {
+   val &= ~L310_PREFETCH_CTRL_DBL_LINEFILL;
l2x0_saved_regs.prefetch_ctrl = val;
errata[n++] = "752271";
}
-- 
2.5.5

[RESEND PATCH 1/2] ARM: cache-l2x0.c: Replace magic numbers

2016-07-28 Thread Andrey Smirnov

Replace magic numbers used for L310 Prefetch Control Register

Acked-by: Arnd Bergmann 
Signed-off-by: Andrey Smirnov 
---
 arch/arm/mm/cache-l2x0.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 9f9d542..30e2012 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -716,8 +716,10 @@ static void __init l2c310_fixup(void __iomem *base, u32 
cache_id,
revision < L310_CACHE_ID_RTL_R3P2) {
u32 val = l2x0_saved_regs.prefetch_ctrl;
/* I don't think bit23 is required here... but iMX6 does so */
-   if (val & (BIT(30) | BIT(23))) {
-   val &= ~(BIT(30) | BIT(23));
+   if (val & (L310_PREFETCH_CTRL_DBL_LINEFILL |
+  L310_PREFETCH_CTRL_DBL_LINEFILL_INCR)) {
+   val &= ~(L310_PREFETCH_CTRL_DBL_LINEFILL |
+L310_PREFETCH_CTRL_DBL_LINEFILL_INCR);
l2x0_saved_regs.prefetch_ctrl = val;
errata[n++] = "752271";
}
-- 
2.5.5

[RESEND PATCH 2/2] ARM: cache-l2x0.c: Do not clear bit 23 in prefetch control register

2016-07-28 Thread Andrey Smirnov

As per L2C-310 TRM[1]:

"... You can control this feature using bits 30,27 and 23 of the
Prefetch Control Register. Bit 23 and 27 are only used if you set bit 30
HIGH..."

which means there is no need to clear bit 23 if bit 30 is being cleared.

[1] 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0246e/CJAJACBJ.html

Acked-by: Arnd Bergmann 
Signed-off-by: Andrey Smirnov 
---
 arch/arm/mm/cache-l2x0.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 30e2012..12c1ba7 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -715,11 +715,8 @@ static void __init l2c310_fixup(void __iomem *base, u32 
cache_id,
if (revision >= L310_CACHE_ID_RTL_R3P0 &&
revision < L310_CACHE_ID_RTL_R3P2) {
u32 val = l2x0_saved_regs.prefetch_ctrl;
-   /* I don't think bit23 is required here... but iMX6 does so */
-   if (val & (L310_PREFETCH_CTRL_DBL_LINEFILL |
-  L310_PREFETCH_CTRL_DBL_LINEFILL_INCR)) {
-   val &= ~(L310_PREFETCH_CTRL_DBL_LINEFILL |
-L310_PREFETCH_CTRL_DBL_LINEFILL_INCR);
+   if (val & L310_PREFETCH_CTRL_DBL_LINEFILL) {
+   val &= ~L310_PREFETCH_CTRL_DBL_LINEFILL;
l2x0_saved_regs.prefetch_ctrl = val;
errata[n++] = "752271";
}
-- 
2.5.5

[RESEND PATCH 1/2] ARM: cache-l2x0.c: Replace magic numbers

2016-07-28 Thread Andrey Smirnov

Replace magic numbers used for L310 Prefetch Control Register

Acked-by: Arnd Bergmann 
Signed-off-by: Andrey Smirnov 
---
 arch/arm/mm/cache-l2x0.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 9f9d542..30e2012 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -716,8 +716,10 @@ static void __init l2c310_fixup(void __iomem *base, u32 
cache_id,
revision < L310_CACHE_ID_RTL_R3P2) {
u32 val = l2x0_saved_regs.prefetch_ctrl;
/* I don't think bit23 is required here... but iMX6 does so */
-   if (val & (BIT(30) | BIT(23))) {
-   val &= ~(BIT(30) | BIT(23));
+   if (val & (L310_PREFETCH_CTRL_DBL_LINEFILL |
+  L310_PREFETCH_CTRL_DBL_LINEFILL_INCR)) {
+   val &= ~(L310_PREFETCH_CTRL_DBL_LINEFILL |
+L310_PREFETCH_CTRL_DBL_LINEFILL_INCR);
l2x0_saved_regs.prefetch_ctrl = val;
errata[n++] = "752271";
}
-- 
2.5.5

Re: [PATCH v2 0/3] ntb: Asynchronous NTB devices support

2016-07-28 Thread Serge Semin

Hello, Allen.
Thanks for the message. I see your point. Yes, I've seen a lot of cruel
threads in mailing threads in lkml.org , so it's not my intention to
argue about basic things like Coding Style. That's why I left most of
the warnings discussable. While you a digging into the Patch 1/3, I'll
do my best to fix the checkpatch warnings of the rest of the code. Regarding
the last checkpatch error message, I need to spend some more time to
find a way to set it free of the warnings. I hope I'll come up with
something good, at least I'll give it a try. Otherwise I'll have to
redesign the driver regmap subsystem.(

Regards,
-Sergey

On Thu, Jul 28, 2016 at 10:42:30AM -0400, Allen Hubbe  
wrote:
> From: Serge Semin
> > Please, find the general patchset description in the cover letter of the 
> > first
> > patchset (see the very first message in thread).
> > 
> > Changes in v2:
> >  - Fix sparc64 compilation warning in drivers/ntb/hw/idt/ntb_hw_idt.c :
> >warning: right shift count >= width of type
> >  - Fix sparc64 compilation warnings in drivers/ntb/test/ntb_mw_test.c :
> >warning: right shift count >= width of type
> >warning: cast to pointer from integer of different size
> 
> Thanks for reacting to the test robot so quickly.  Since nobody else has 
> responded yet, I would like to assure you that the patches are not being 
> ignored.  Please be patient.  The IDT driver will be a valuable contribution 
> to the ntb subsystem.  I am working carefully through patch 1/3 first, since 
> it affects existing drivers and interface.
> 
> A word of caution regarding your statement, "There are a some types of 
> checkpatch warnings I left unfixed."  Coding style can be a touchy subject, 
> leading to some recent rants^H^H^H^H^Hdiscussion on some of the same topics 
> that are included in that list of unfixed warnings.  Be prepared to adhere to 
> the style guide, even if it is inconvenient and against your own logic, 
> because that is almost always the easier and more practical approach than 
> asking for changes or exceptions, and better for your mental health not to be 
> on the To: list of something like https://lkml.org/lkml/2016/7/8/625.
> 
> "Of course all of these warnings are discussable, except the last one."  Be 
> prepared, even if it will require significant changes to the code.  For 
> really inconvenient changes, we can talk about other more readily acceptable 
> approaches to keep the code short and elegant, as is obviously your intent.  
> Please be patient with the review.
>

Re: [PATCH v2 0/3] ntb: Asynchronous NTB devices support

2016-07-28 Thread Serge Semin

Hello, Allen.
Thanks for the message. I see your point. Yes, I've seen a lot of cruel
threads in mailing threads in lkml.org , so it's not my intention to
argue about basic things like Coding Style. That's why I left most of
the warnings discussable. While you a digging into the Patch 1/3, I'll
do my best to fix the checkpatch warnings of the rest of the code. Regarding
the last checkpatch error message, I need to spend some more time to
find a way to set it free of the warnings. I hope I'll come up with
something good, at least I'll give it a try. Otherwise I'll have to
redesign the driver regmap subsystem.(

Regards,
-Sergey

On Thu, Jul 28, 2016 at 10:42:30AM -0400, Allen Hubbe  
wrote:
> From: Serge Semin
> > Please, find the general patchset description in the cover letter of the 
> > first
> > patchset (see the very first message in thread).
> > 
> > Changes in v2:
> >  - Fix sparc64 compilation warning in drivers/ntb/hw/idt/ntb_hw_idt.c :
> >warning: right shift count >= width of type
> >  - Fix sparc64 compilation warnings in drivers/ntb/test/ntb_mw_test.c :
> >warning: right shift count >= width of type
> >warning: cast to pointer from integer of different size
> 
> Thanks for reacting to the test robot so quickly.  Since nobody else has 
> responded yet, I would like to assure you that the patches are not being 
> ignored.  Please be patient.  The IDT driver will be a valuable contribution 
> to the ntb subsystem.  I am working carefully through patch 1/3 first, since 
> it affects existing drivers and interface.
> 
> A word of caution regarding your statement, "There are a some types of 
> checkpatch warnings I left unfixed."  Coding style can be a touchy subject, 
> leading to some recent rants^H^H^H^H^Hdiscussion on some of the same topics 
> that are included in that list of unfixed warnings.  Be prepared to adhere to 
> the style guide, even if it is inconvenient and against your own logic, 
> because that is almost always the easier and more practical approach than 
> asking for changes or exceptions, and better for your mental health not to be 
> on the To: list of something like https://lkml.org/lkml/2016/7/8/625.
> 
> "Of course all of these warnings are discussable, except the last one."  Be 
> prepared, even if it will require significant changes to the code.  For 
> really inconvenient changes, we can talk about other more readily acceptable 
> approaches to keep the code short and elegant, as is obviously your intent.  
> Please be patient with the review.
>

RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-28 Thread Li, Liang Z

> On Thu, Jul 28, 2016 at 06:36:18AM +, Li, Liang Z wrote:
> > > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > > How big was the pfn buffer before?
> > > >
> > > > Yes, it is if the max pfn is more than 32GB.
> > > > The size of the pfn buffer use before is 256*4 = 1024 Bytes, it's
> > > > too small, and it's the main reason for bad performance.
> > > > Use the max 1MB kmalloc is a balance between performance and
> > > > flexibility, a large page bitmap covers the range of all the
> > > > memory is no good for a system with huge amount of memory. If the
> > > > bitmap is too small, it means we have to traverse a long list for
> > > > many times, and it's bad
> > > for performance.
> > > >
> > > > Thanks!
> > > > Liang
> > >
> > > There are all your implementation decisions though.
> > >
> > > If guest memory is so fragmented that you only have order 0 4k
> > > pages, then allocating a huge 1M contigious chunk is very problematic in
> and of itself.
> > >
> >
> > The memory is allocated in the probe stage. This will not happen if
> > the driver is  loaded when booting the guest.
> >
> > > Most people rarely migrate and do not care how fast that happens.
> > > Wasting a large chunk of memory (and it's zeroed for no good reason,
> > > so you actually request host memory for it) for everyone to speed it
> > > up when it does happen is not really an option.
> > >
> > If people don't plan to do inflating/deflating, they should not enable
> > the virtio-balloon at the beginning, once they decide to use it, the
> > driver should provide better performance as much as possible.
> 
> The reason people inflate/deflate is so they can overcommit memory.
> Do they need to overcommit very quickly? I don't see why.
> So let's get what we can for free but I don't really believe people would want
> to pay for it.
> 
> > 1MB is a very small portion for a VM with more than 32GB memory and
> > it's the *worst case*, for VM with less than 32GB memory, the amount
> > of RAM depends on VM's memory size and will be less than 1MB.
> 
> It's guest memmory so might all be in swap and never touched, your memset
> at probe time will fault it in and make hypervisor actually pay for it.
> 
> > If 1MB is too big, how about 512K, or 256K?  32K seems too small.
> >
> > Liang
> 
> It's only small because it makes you rescan the free list.
> So maybe you should do something else.
> I looked at it a bit. Instead of scanning the free list, how about scanning 
> actual
> page structures? If page is unused, pass it to host.
> Solves the problem of rescanning multiple times, does it not?
> 

Yes, agree.
> 
> Another idea: allocate a small bitmap at probe time (e.g. for deflate), 
> allocate
> a bunch more on each request. Use something like GFP_ATOMIC and a
> scatter/gather, if that fails use the smaller bitmap.
> 

So, the aim of v3 is to use a smaller bitmap without too heavy performance 
penalty.
Thanks a lot!

Liang

RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-28 Thread Li, Liang Z

> On Thu, Jul 28, 2016 at 06:36:18AM +, Li, Liang Z wrote:
> > > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > > How big was the pfn buffer before?
> > > >
> > > > Yes, it is if the max pfn is more than 32GB.
> > > > The size of the pfn buffer use before is 256*4 = 1024 Bytes, it's
> > > > too small, and it's the main reason for bad performance.
> > > > Use the max 1MB kmalloc is a balance between performance and
> > > > flexibility, a large page bitmap covers the range of all the
> > > > memory is no good for a system with huge amount of memory. If the
> > > > bitmap is too small, it means we have to traverse a long list for
> > > > many times, and it's bad
> > > for performance.
> > > >
> > > > Thanks!
> > > > Liang
> > >
> > > There are all your implementation decisions though.
> > >
> > > If guest memory is so fragmented that you only have order 0 4k
> > > pages, then allocating a huge 1M contigious chunk is very problematic in
> and of itself.
> > >
> >
> > The memory is allocated in the probe stage. This will not happen if
> > the driver is  loaded when booting the guest.
> >
> > > Most people rarely migrate and do not care how fast that happens.
> > > Wasting a large chunk of memory (and it's zeroed for no good reason,
> > > so you actually request host memory for it) for everyone to speed it
> > > up when it does happen is not really an option.
> > >
> > If people don't plan to do inflating/deflating, they should not enable
> > the virtio-balloon at the beginning, once they decide to use it, the
> > driver should provide better performance as much as possible.
> 
> The reason people inflate/deflate is so they can overcommit memory.
> Do they need to overcommit very quickly? I don't see why.
> So let's get what we can for free but I don't really believe people would want
> to pay for it.
> 
> > 1MB is a very small portion for a VM with more than 32GB memory and
> > it's the *worst case*, for VM with less than 32GB memory, the amount
> > of RAM depends on VM's memory size and will be less than 1MB.
> 
> It's guest memmory so might all be in swap and never touched, your memset
> at probe time will fault it in and make hypervisor actually pay for it.
> 
> > If 1MB is too big, how about 512K, or 256K?  32K seems too small.
> >
> > Liang
> 
> It's only small because it makes you rescan the free list.
> So maybe you should do something else.
> I looked at it a bit. Instead of scanning the free list, how about scanning 
> actual
> page structures? If page is unused, pass it to host.
> Solves the problem of rescanning multiple times, does it not?
> 

Yes, agree.
> 
> Another idea: allocate a small bitmap at probe time (e.g. for deflate), 
> allocate
> a bunch more on each request. Use something like GFP_ATOMIC and a
> scatter/gather, if that fails use the smaller bitmap.
> 

So, the aim of v3 is to use a smaller bitmap without too heavy performance 
penalty.
Thanks a lot!

Liang

RE: [PATCH] mtd: spi-nor: Add at25df321 spi-nor flash support

2016-07-28 Thread Wenyou.Yang



> -Original Message-
> From: Jagan Teki [mailto:jt...@openedev.com]
> Sent: 2016年7月26日 16:38
> To: linux-...@lists.infradead.org
> Cc: David Woodhouse ; linux-kernel@vger.kernel.org;
> Jagan Teki ; Brian Norris
> ; Wenyou Yang 
> Subject: [PATCH] mtd: spi-nor: Add at25df321 spi-nor flash support
> 
> Add Atmel at25df321 spi-nor flash to the list of spi_nor_ids.
> 
> Cc: Brian Norris 
> Cc: Wenyou Yang 
> Signed-off-by: Jagan Teki 

Acked-by Wenyou Yang 

> ---
>  drivers/mtd/spi-nor/spi-nor.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c 
> index
> d0fc165..ec47451 100644
> --- a/drivers/mtd/spi-nor/spi-nor.c
> +++ b/drivers/mtd/spi-nor/spi-nor.c
> @@ -799,6 +799,7 @@ static const struct flash_info spi_nor_ids[] = {
>   { "at25fs040",  INFO(0x1f6604, 0, 64 * 1024,   8, SECT_4K) },
> 
>   { "at25df041a", INFO(0x1f4401, 0, 64 * 1024,   8, SECT_4K) },
> + { "at25df321",  INFO(0x1f4700, 0, 64 * 1024,  64, SECT_4K) },
>   { "at25df321a", INFO(0x1f4701, 0, 64 * 1024,  64, SECT_4K) },
>   { "at25df641",  INFO(0x1f4800, 0, 64 * 1024, 128, SECT_4K) },
> 
> --
> 2.7.4


Best Regards,
Wenyou Yang

RE: [PATCH] mtd: spi-nor: Add at25df321 spi-nor flash support

2016-07-28 Thread Wenyou.Yang



> -Original Message-
> From: Jagan Teki [mailto:jt...@openedev.com]
> Sent: 2016年7月26日 16:38
> To: linux-...@lists.infradead.org
> Cc: David Woodhouse ; linux-kernel@vger.kernel.org;
> Jagan Teki ; Brian Norris
> ; Wenyou Yang 
> Subject: [PATCH] mtd: spi-nor: Add at25df321 spi-nor flash support
> 
> Add Atmel at25df321 spi-nor flash to the list of spi_nor_ids.
> 
> Cc: Brian Norris 
> Cc: Wenyou Yang 
> Signed-off-by: Jagan Teki 

Acked-by Wenyou Yang 

> ---
>  drivers/mtd/spi-nor/spi-nor.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c 
> index
> d0fc165..ec47451 100644
> --- a/drivers/mtd/spi-nor/spi-nor.c
> +++ b/drivers/mtd/spi-nor/spi-nor.c
> @@ -799,6 +799,7 @@ static const struct flash_info spi_nor_ids[] = {
>   { "at25fs040",  INFO(0x1f6604, 0, 64 * 1024,   8, SECT_4K) },
> 
>   { "at25df041a", INFO(0x1f4401, 0, 64 * 1024,   8, SECT_4K) },
> + { "at25df321",  INFO(0x1f4700, 0, 64 * 1024,  64, SECT_4K) },
>   { "at25df321a", INFO(0x1f4701, 0, 64 * 1024,  64, SECT_4K) },
>   { "at25df641",  INFO(0x1f4800, 0, 64 * 1024, 128, SECT_4K) },
> 
> --
> 2.7.4


Best Regards,
Wenyou Yang

RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-28 Thread Li, Liang Z

> On Thu, Jul 28, 2016 at 03:06:37AM +, Li, Liang Z wrote:
> > > > + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page
> > > > +bitmap
> > > > + * to prevent a very large page bitmap, there are two reasons for this:
> > > > + * 1) to save memory.
> > > > + * 2) allocate a large bitmap may fail.
> > > > + *
> > > > + * The actual limit of pfn is determined by:
> > > > + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> > > > + *
> > > > + * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we
> > > > +will scan
> > > > + * the page list and send the PFNs with several times. To reduce
> > > > +the
> > > > + * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT
> > > > +should
> > > > + * be set with a value which can cover most cases.
> > >
> > > So what if it covers 1/32 of the memory? We'll do 32 exits and not
> > > 1, still not a big deal for a big guest.
> > >
> >
> > The issue here is the overhead is too high for scanning the page list for 32
> times.
> > Limit the page bitmap size to a fixed value is better for a big guest?
> >
> 
> I'd say avoid scanning free lists completely. Scan pages themselves and check
> the refcount to see whether they are free.
> This way each page needs to be tested once.
> 
> And skip the whole optimization if less than e.g. 10% is free.

That's better than rescanning the free list. Will change in next version.

Thanks!

Liang

RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-28 Thread Li, Liang Z

> On Thu, Jul 28, 2016 at 03:06:37AM +, Li, Liang Z wrote:
> > > > + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page
> > > > +bitmap
> > > > + * to prevent a very large page bitmap, there are two reasons for this:
> > > > + * 1) to save memory.
> > > > + * 2) allocate a large bitmap may fail.
> > > > + *
> > > > + * The actual limit of pfn is determined by:
> > > > + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> > > > + *
> > > > + * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we
> > > > +will scan
> > > > + * the page list and send the PFNs with several times. To reduce
> > > > +the
> > > > + * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT
> > > > +should
> > > > + * be set with a value which can cover most cases.
> > >
> > > So what if it covers 1/32 of the memory? We'll do 32 exits and not
> > > 1, still not a big deal for a big guest.
> > >
> >
> > The issue here is the overhead is too high for scanning the page list for 32
> times.
> > Limit the page bitmap size to a fixed value is better for a big guest?
> >
> 
> I'd say avoid scanning free lists completely. Scan pages themselves and check
> the refcount to see whether they are free.
> This way each page needs to be tested once.
> 
> And skip the whole optimization if less than e.g. 10% is free.

That's better than rescanning the free list. Will change in next version.

Thanks!

Liang

[PATCH 0/3] objtool warning fixes

2016-07-28 Thread Josh Poimboeuf

Fixes for some objtool-related warnings reported by Linus.

Josh Poimboeuf (3):
  objtool: support new gcc 6 switch jump table pattern
  objtool: resync x86 instruction decoder with the kernel's
  objtool: un-capitalize "Warning" for out-of-sync instruction decoder

 tools/objtool/Makefile|   2 +-
 tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk |  11 +-
 tools/objtool/arch/x86/insn/inat.h|  17 +-
 tools/objtool/arch/x86/insn/insn.c|  18 +-
 tools/objtool/arch/x86/insn/insn.h|  12 +-
 tools/objtool/arch/x86/insn/x86-opcode-map.txt| 263 ++
 tools/objtool/builtin-check.c | 140 +++-
 7 files changed, 309 insertions(+), 154 deletions(-)

-- 
2.7.4

[PATCH 2/3] objtool: resync x86 instruction decoder with the kernel's

2016-07-28 Thread Josh Poimboeuf

This fixes the following warning:

  Warning: objtool: x86 instruction decoder differs from kernel

Unfortunately we have three identical copies of the x86 instruction
decoder in the kernel tree that have to be manually kept in sync.

It's on my TODO list to at least library-ize the ones in the tools
subdir so we'd only have two of them instead of three.  In the meantime,
here's another manual sync.

Fixes: c61f4d5ebaf0 ("perf tools: Add AVX-512 support to the instruction 
decoder used by Intel PT")
Reported-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
---
 tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk |  11 +-
 tools/objtool/arch/x86/insn/inat.h|  17 +-
 tools/objtool/arch/x86/insn/insn.c|  18 +-
 tools/objtool/arch/x86/insn/insn.h|  12 +-
 tools/objtool/arch/x86/insn/x86-opcode-map.txt| 263 ++
 5 files changed, 220 insertions(+), 101 deletions(-)

diff --git a/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk 
b/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
index 093a892..a3d2c62 100644
--- a/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
+++ b/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
@@ -72,12 +72,14 @@ BEGIN {
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
-   # All opcodes starting with lower-case 'v' or with (v1) superscript
+   # All opcodes starting with lower-case 'v', 'k' or with (v1) superscript
# accepts VEX prefix
-   vexok_opcode_expr = "^v.*"
+   vexok_opcode_expr = "^[vk].*"
vexok_expr = "\\(v1\\)"
# All opcodes with (v) superscript supports *only* VEX prefix
vexonly_expr = "\\(v\\)"
+   # All opcodes with (ev) superscript supports *only* EVEX prefix
+   evexonly_expr = "\\(ev\\)"
 
prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -95,6 +97,7 @@ BEGIN {
prefix_num["Address-Size"] = "INAT_PFX_ADDRSZ"
prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
+   prefix_num["EVEX"] = "INAT_PFX_EVEX"
 
clear_vars()
 }
@@ -319,7 +322,9 @@ function convert_operands(count,opnd,   i,j,imm,mod)
flags = add_flags(flags, "INAT_MODRM")
 
# check VEX codes
-   if (match(ext, vexonly_expr))
+   if (match(ext, evexonly_expr))
+   flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+   else if (match(ext, vexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
else if (match(ext, vexok_expr) || match(opcode, 
vexok_opcode_expr))
flags = add_flags(flags, "INAT_VEXOK")
diff --git a/tools/objtool/arch/x86/insn/inat.h 
b/tools/objtool/arch/x86/insn/inat.h
index 611645e..125ecd2 100644
--- a/tools/objtool/arch/x86/insn/inat.h
+++ b/tools/objtool/arch/x86/insn/inat.h
@@ -48,6 +48,7 @@
 /* AVX VEX prefixes */
 #define INAT_PFX_VEX2  13  /* 2-bytes VEX prefix */
 #define INAT_PFX_VEX3  14  /* 3-bytes VEX prefix */
+#define INAT_PFX_EVEX  15  /* EVEX prefix */
 
 #define INAT_LSTPFX_MAX3
 #define INAT_LGCPFX_MAX11
@@ -89,6 +90,7 @@
 #define INAT_VARIANT   (1 << (INAT_FLAG_OFFS + 4))
 #define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY   (1 << (INAT_FLAG_OFFS + 6))
+#define INAT_EVEXONLY  (1 << (INAT_FLAG_OFFS + 7))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)  (pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)  (esc << INAT_ESC_OFFS)
@@ -141,7 +143,13 @@ static inline int inat_last_prefix_id(insn_attr_t attr)
 static inline int inat_is_vex_prefix(insn_attr_t attr)
 {
attr &= INAT_PFX_MASK;
-   return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3;
+   return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3 ||
+  attr == INAT_PFX_EVEX;
+}
+
+static inline int inat_is_evex_prefix(insn_attr_t attr)
+{
+   return (attr & INAT_PFX_MASK) == INAT_PFX_EVEX;
 }
 
 static inline int inat_is_vex3_prefix(insn_attr_t attr)
@@ -216,6 +224,11 @@ static inline int inat_accept_vex(insn_attr_t attr)
 
 static inline int inat_must_vex(insn_attr_t attr)
 {
-   return attr & INAT_VEXONLY;
+   return attr & (INAT_VEXONLY | INAT_EVEXONLY);
+}
+
+static inline int inat_must_evex(insn_attr_t attr)
+{
+   return attr & INAT_EVEXONLY;
 }
 #endif
diff --git a/tools/objtool/arch/x86/insn/insn.c 
b/tools/objtool/arch/x86/insn/insn.c
index 9f26eae..ca983e2 100644
--- a/tools/objtool/arch/x86/insn/insn.c
+++ b/tools/objtool/arch/x86/insn/insn.c
@@ -155,14 +155,24 @@ found:
/*
 * In 32-bits mode, if the [7:6] bits (mod bits of
 * ModRM) on the second byte are not 11b, it is
-* LDS or LES.
+*

[PATCH 0/3] objtool warning fixes

2016-07-28 Thread Josh Poimboeuf

Fixes for some objtool-related warnings reported by Linus.

Josh Poimboeuf (3):
  objtool: support new gcc 6 switch jump table pattern
  objtool: resync x86 instruction decoder with the kernel's
  objtool: un-capitalize "Warning" for out-of-sync instruction decoder

 tools/objtool/Makefile|   2 +-
 tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk |  11 +-
 tools/objtool/arch/x86/insn/inat.h|  17 +-
 tools/objtool/arch/x86/insn/insn.c|  18 +-
 tools/objtool/arch/x86/insn/insn.h|  12 +-
 tools/objtool/arch/x86/insn/x86-opcode-map.txt| 263 ++
 tools/objtool/builtin-check.c | 140 +++-
 7 files changed, 309 insertions(+), 154 deletions(-)

-- 
2.7.4

[PATCH 2/3] objtool: resync x86 instruction decoder with the kernel's

2016-07-28 Thread Josh Poimboeuf

This fixes the following warning:

  Warning: objtool: x86 instruction decoder differs from kernel

Unfortunately we have three identical copies of the x86 instruction
decoder in the kernel tree that have to be manually kept in sync.

It's on my TODO list to at least library-ize the ones in the tools
subdir so we'd only have two of them instead of three.  In the meantime,
here's another manual sync.

Fixes: c61f4d5ebaf0 ("perf tools: Add AVX-512 support to the instruction 
decoder used by Intel PT")
Reported-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
---
 tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk |  11 +-
 tools/objtool/arch/x86/insn/inat.h|  17 +-
 tools/objtool/arch/x86/insn/insn.c|  18 +-
 tools/objtool/arch/x86/insn/insn.h|  12 +-
 tools/objtool/arch/x86/insn/x86-opcode-map.txt| 263 ++
 5 files changed, 220 insertions(+), 101 deletions(-)

diff --git a/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk 
b/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
index 093a892..a3d2c62 100644
--- a/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
+++ b/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
@@ -72,12 +72,14 @@ BEGIN {
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
-   # All opcodes starting with lower-case 'v' or with (v1) superscript
+   # All opcodes starting with lower-case 'v', 'k' or with (v1) superscript
# accepts VEX prefix
-   vexok_opcode_expr = "^v.*"
+   vexok_opcode_expr = "^[vk].*"
vexok_expr = "\\(v1\\)"
# All opcodes with (v) superscript supports *only* VEX prefix
vexonly_expr = "\\(v\\)"
+   # All opcodes with (ev) superscript supports *only* EVEX prefix
+   evexonly_expr = "\\(ev\\)"
 
prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -95,6 +97,7 @@ BEGIN {
prefix_num["Address-Size"] = "INAT_PFX_ADDRSZ"
prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
+   prefix_num["EVEX"] = "INAT_PFX_EVEX"
 
clear_vars()
 }
@@ -319,7 +322,9 @@ function convert_operands(count,opnd,   i,j,imm,mod)
flags = add_flags(flags, "INAT_MODRM")
 
# check VEX codes
-   if (match(ext, vexonly_expr))
+   if (match(ext, evexonly_expr))
+   flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+   else if (match(ext, vexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
else if (match(ext, vexok_expr) || match(opcode, 
vexok_opcode_expr))
flags = add_flags(flags, "INAT_VEXOK")
diff --git a/tools/objtool/arch/x86/insn/inat.h 
b/tools/objtool/arch/x86/insn/inat.h
index 611645e..125ecd2 100644
--- a/tools/objtool/arch/x86/insn/inat.h
+++ b/tools/objtool/arch/x86/insn/inat.h
@@ -48,6 +48,7 @@
 /* AVX VEX prefixes */
 #define INAT_PFX_VEX2  13  /* 2-bytes VEX prefix */
 #define INAT_PFX_VEX3  14  /* 3-bytes VEX prefix */
+#define INAT_PFX_EVEX  15  /* EVEX prefix */
 
 #define INAT_LSTPFX_MAX3
 #define INAT_LGCPFX_MAX11
@@ -89,6 +90,7 @@
 #define INAT_VARIANT   (1 << (INAT_FLAG_OFFS + 4))
 #define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY   (1 << (INAT_FLAG_OFFS + 6))
+#define INAT_EVEXONLY  (1 << (INAT_FLAG_OFFS + 7))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)  (pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)  (esc << INAT_ESC_OFFS)
@@ -141,7 +143,13 @@ static inline int inat_last_prefix_id(insn_attr_t attr)
 static inline int inat_is_vex_prefix(insn_attr_t attr)
 {
attr &= INAT_PFX_MASK;
-   return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3;
+   return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3 ||
+  attr == INAT_PFX_EVEX;
+}
+
+static inline int inat_is_evex_prefix(insn_attr_t attr)
+{
+   return (attr & INAT_PFX_MASK) == INAT_PFX_EVEX;
 }
 
 static inline int inat_is_vex3_prefix(insn_attr_t attr)
@@ -216,6 +224,11 @@ static inline int inat_accept_vex(insn_attr_t attr)
 
 static inline int inat_must_vex(insn_attr_t attr)
 {
-   return attr & INAT_VEXONLY;
+   return attr & (INAT_VEXONLY | INAT_EVEXONLY);
+}
+
+static inline int inat_must_evex(insn_attr_t attr)
+{
+   return attr & INAT_EVEXONLY;
 }
 #endif
diff --git a/tools/objtool/arch/x86/insn/insn.c 
b/tools/objtool/arch/x86/insn/insn.c
index 9f26eae..ca983e2 100644
--- a/tools/objtool/arch/x86/insn/insn.c
+++ b/tools/objtool/arch/x86/insn/insn.c
@@ -155,14 +155,24 @@ found:
/*
 * In 32-bits mode, if the [7:6] bits (mod bits of
 * ModRM) on the second byte are not 11b, it is
-* LDS or LES.
+* LDS or LES or BOUND.
 */

[PATCH 3/3] objtool: un-capitalize "Warning" for out-of-sync instruction decoder

2016-07-28 Thread Josh Poimboeuf

Change "Warning" to "warning" to make it look more like a gcc warning.
Hopefully that will be enough to help the 0-day bot or other automated
tools catch this warning earlier before it ends up in Linus's tree.

Reported-by: Linus Torvalds 
Reviewed-by: Josh Poimboeuf 
---
 tools/objtool/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 0b43770..041b493 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -51,7 +51,7 @@ $(OBJTOOL): $(LIBSUBCMD) $(OBJTOOL_IN)
diff -I'^#include' arch/x86/insn/insn.h 
../../arch/x86/include/asm/insn.h >/dev/null && \
diff -I'^#include' arch/x86/insn/inat.h 
../../arch/x86/include/asm/inat.h >/dev/null && \
diff -I'^#include' arch/x86/insn/inat_types.h 
../../arch/x86/include/asm/inat_types.h >/dev/null) \
-   || echo "Warning: objtool: x86 instruction decoder differs from kernel" 
>&2 )) || true
+   || echo "warning: objtool: x86 instruction decoder differs from kernel" 
>&2 )) || true
$(QUIET_LINK)$(CC) $(OBJTOOL_IN) $(LDFLAGS) -o $@
 
 
-- 
2.7.4

[PATCH 3/3] objtool: un-capitalize "Warning" for out-of-sync instruction decoder

2016-07-28 Thread Josh Poimboeuf

Change "Warning" to "warning" to make it look more like a gcc warning.
Hopefully that will be enough to help the 0-day bot or other automated
tools catch this warning earlier before it ends up in Linus's tree.

Reported-by: Linus Torvalds 
Reviewed-by: Josh Poimboeuf 
---
 tools/objtool/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 0b43770..041b493 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -51,7 +51,7 @@ $(OBJTOOL): $(LIBSUBCMD) $(OBJTOOL_IN)
diff -I'^#include' arch/x86/insn/insn.h 
../../arch/x86/include/asm/insn.h >/dev/null && \
diff -I'^#include' arch/x86/insn/inat.h 
../../arch/x86/include/asm/inat.h >/dev/null && \
diff -I'^#include' arch/x86/insn/inat_types.h 
../../arch/x86/include/asm/inat_types.h >/dev/null) \
-   || echo "Warning: objtool: x86 instruction decoder differs from kernel" 
>&2 )) || true
+   || echo "warning: objtool: x86 instruction decoder differs from kernel" 
>&2 )) || true
$(QUIET_LINK)$(CC) $(OBJTOOL_IN) $(LDFLAGS) -o $@
 
 
-- 
2.7.4

[PATCH 1/3] objtool: support new gcc 6 switch jump table pattern

2016-07-28 Thread Josh Poimboeuf

This fixes some false positive objtool warnings seen with gcc 6.1.1:

  kernel/trace/ring_buffer.o: warning: objtool: ring_buffer_read_page()+0x36c: 
sibling call from callable instruction with changed frame pointer
  arch/x86/kernel/reboot.o: warning: objtool: 
native_machine_emergency_restart()+0x139: sibling call from callable 
instruction with changed frame pointer
  lib/xz/xz_dec_stream.o: warning: objtool: xz_dec_run()+0xc2: sibling call 
from callable instruction with changed frame pointer

With gcc 6, a new code pattern is sometimes used to access a switch
statement jump table in .rodata, which objtool doesn't yet recognize:

  mov [rodata addr],%reg1
  ... some instructions ...
  jmpq *(%reg1,%reg2,8)

Add support for detecting that pattern.  The detection code is rather
crude, but it's still effective at weeding out false positives and
catching real warnings.  It can be refined later once objtool starts
reading DWARF CFI.

Reported-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
---
 tools/objtool/builtin-check.c | 140 ++
 1 file changed, 88 insertions(+), 52 deletions(-)

diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 17fa7fc..8d1296a 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -107,6 +107,12 @@ static struct instruction *next_insn_same_sec(struct 
objtool_file *file,
insn->offset < func->offset + func->len;\
 insn = list_next_entry(insn, list))
 
+#define func_for_each_insn_continue_reverse(file, func, insn)  \
+   for (insn = list_prev_entry(insn, list);\
+>list != >insn_list && \
+   insn->sec == func->sec && insn->offset >= func->offset; \
+insn = list_prev_entry(insn, list))
+
 #define sec_for_each_insn_from(file, insn) \
for (; insn; insn = next_insn_same_sec(file, insn))
 
@@ -664,65 +670,95 @@ static int add_switch_table(struct objtool_file *file, 
struct symbol *func,
return 0;
 }
 
-static int add_func_switch_tables(struct objtool_file *file,
- struct symbol *func)
+/*
+ * find_switch_table() - Given a dynamic jump, find the switch jump table in
+ * .rodata associated with it.
+ *
+ * There are 3 basic patterns:
+ *
+ * 1. jmpq *[rodata addr](,%reg,8)
+ *
+ *This is the most common case by far.  It jumps to an address in a simple
+ *jump table which is stored in .rodata.
+ *
+ * 2. jmpq *[rodata addr](%rip)
+ *
+ *This is caused by a rare gcc quirk, currently only seen in three driver
+ *functions in the kernel, only with certain obscure non-distro configs.
+ *
+ *As part of an optimization, gcc makes a copy of an existing switch jump
+ *table, modifies it, and then hard-codes the jump (albeit with an indirect
+ *jump) to use a single entry in the table.  The rest of the jump table and
+ *some of its jump targets remain as dead code.
+ *
+ *In such a case we can just crudely ignore all unreachable instruction
+ *warnings for the entire object file.  Ideally we would just ignore them
+ *for the function, but that would require redesigning the code quite a
+ *bit.  And honestly that's just not worth doing: unreachable instruction
+ *warnings are of questionable value anyway, and this is such a rare issue.
+ *
+ * 3. mov [rodata addr],%reg1
+ *... some instructions ...
+ *jmpq *(%reg1,%reg2,8)
+ *
+ *This is a fairly uncommon pattern which is new for gcc 6.  As of this
+ *writing, there are 11 occurrences of it in the allmodconfig kernel.
+ *
+ *TODO: Once we have DWARF CFI and smarter instruction decoding logic,
+ *ensure the same register is used in the mov and jump instructions.
+ */
+static struct rela *find_switch_table(struct objtool_file *file,
+ struct symbol *func,
+ struct instruction *insn)
 {
-   struct instruction *insn, *prev_jump;
-   struct rela *text_rela, *rodata_rela, *prev_rela = NULL;
-   int ret;
+   struct rela *text_rela, *rodata_rela;
 
-   prev_jump = NULL;
+   text_rela = find_rela_by_dest_range(insn->sec, insn->offset, insn->len);
+   if (text_rela && text_rela->sym == file->rodata->sym) {
+   /* case 1 */
+   rodata_rela = find_rela_by_dest(file->rodata,
+   text_rela->addend);
+   if (rodata_rela)
+   return rodata_rela;
 
-   func_for_each_insn(file, func, insn) {
-   if (insn->type != INSN_JUMP_DYNAMIC)
-   continue;
+   /* case 2 */
+   rodata_rela = find_rela_by_dest(file->rodata,
+   text_rela->addend +

[PATCH 1/3] objtool: support new gcc 6 switch jump table pattern

2016-07-28 Thread Josh Poimboeuf

This fixes some false positive objtool warnings seen with gcc 6.1.1:

  kernel/trace/ring_buffer.o: warning: objtool: ring_buffer_read_page()+0x36c: 
sibling call from callable instruction with changed frame pointer
  arch/x86/kernel/reboot.o: warning: objtool: 
native_machine_emergency_restart()+0x139: sibling call from callable 
instruction with changed frame pointer
  lib/xz/xz_dec_stream.o: warning: objtool: xz_dec_run()+0xc2: sibling call 
from callable instruction with changed frame pointer

With gcc 6, a new code pattern is sometimes used to access a switch
statement jump table in .rodata, which objtool doesn't yet recognize:

  mov [rodata addr],%reg1
  ... some instructions ...
  jmpq *(%reg1,%reg2,8)

Add support for detecting that pattern.  The detection code is rather
crude, but it's still effective at weeding out false positives and
catching real warnings.  It can be refined later once objtool starts
reading DWARF CFI.

Reported-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
---
 tools/objtool/builtin-check.c | 140 ++
 1 file changed, 88 insertions(+), 52 deletions(-)

diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 17fa7fc..8d1296a 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -107,6 +107,12 @@ static struct instruction *next_insn_same_sec(struct 
objtool_file *file,
insn->offset < func->offset + func->len;\
 insn = list_next_entry(insn, list))
 
+#define func_for_each_insn_continue_reverse(file, func, insn)  \
+   for (insn = list_prev_entry(insn, list);\
+>list != >insn_list && \
+   insn->sec == func->sec && insn->offset >= func->offset; \
+insn = list_prev_entry(insn, list))
+
 #define sec_for_each_insn_from(file, insn) \
for (; insn; insn = next_insn_same_sec(file, insn))
 
@@ -664,65 +670,95 @@ static int add_switch_table(struct objtool_file *file, 
struct symbol *func,
return 0;
 }
 
-static int add_func_switch_tables(struct objtool_file *file,
- struct symbol *func)
+/*
+ * find_switch_table() - Given a dynamic jump, find the switch jump table in
+ * .rodata associated with it.
+ *
+ * There are 3 basic patterns:
+ *
+ * 1. jmpq *[rodata addr](,%reg,8)
+ *
+ *This is the most common case by far.  It jumps to an address in a simple
+ *jump table which is stored in .rodata.
+ *
+ * 2. jmpq *[rodata addr](%rip)
+ *
+ *This is caused by a rare gcc quirk, currently only seen in three driver
+ *functions in the kernel, only with certain obscure non-distro configs.
+ *
+ *As part of an optimization, gcc makes a copy of an existing switch jump
+ *table, modifies it, and then hard-codes the jump (albeit with an indirect
+ *jump) to use a single entry in the table.  The rest of the jump table and
+ *some of its jump targets remain as dead code.
+ *
+ *In such a case we can just crudely ignore all unreachable instruction
+ *warnings for the entire object file.  Ideally we would just ignore them
+ *for the function, but that would require redesigning the code quite a
+ *bit.  And honestly that's just not worth doing: unreachable instruction
+ *warnings are of questionable value anyway, and this is such a rare issue.
+ *
+ * 3. mov [rodata addr],%reg1
+ *... some instructions ...
+ *jmpq *(%reg1,%reg2,8)
+ *
+ *This is a fairly uncommon pattern which is new for gcc 6.  As of this
+ *writing, there are 11 occurrences of it in the allmodconfig kernel.
+ *
+ *TODO: Once we have DWARF CFI and smarter instruction decoding logic,
+ *ensure the same register is used in the mov and jump instructions.
+ */
+static struct rela *find_switch_table(struct objtool_file *file,
+ struct symbol *func,
+ struct instruction *insn)
 {
-   struct instruction *insn, *prev_jump;
-   struct rela *text_rela, *rodata_rela, *prev_rela = NULL;
-   int ret;
+   struct rela *text_rela, *rodata_rela;
 
-   prev_jump = NULL;
+   text_rela = find_rela_by_dest_range(insn->sec, insn->offset, insn->len);
+   if (text_rela && text_rela->sym == file->rodata->sym) {
+   /* case 1 */
+   rodata_rela = find_rela_by_dest(file->rodata,
+   text_rela->addend);
+   if (rodata_rela)
+   return rodata_rela;
 
-   func_for_each_insn(file, func, insn) {
-   if (insn->type != INSN_JUMP_DYNAMIC)
-   continue;
+   /* case 2 */
+   rodata_rela = find_rela_by_dest(file->rodata,
+   text_rela->addend + 4);
+   if (!rodata_rela)
+

Re: [PATCH 1/3] Add a new field to struct shrinker

2016-07-28 Thread Dave Chinner

On Thu, Jul 28, 2016 at 11:25:13AM +0100, Mel Gorman wrote:
> On Thu, Jul 28, 2016 at 03:49:47PM +1000, Dave Chinner wrote:
> > Seems you're all missing the obvious.
> > 
> > Add a tracepoint for a shrinker callback that includes a "name"
> > field, have the shrinker callback fill it out appropriately. e.g
> > in the superblock shrinker:
> > 
> > trace_shrinker_callback(shrinker, shrink_control, sb->s_type->name);
> > 
> 
> That misses capturing the latency of the call unless there is a begin/end
> tracepoint.

Sure, but I didn't see that in the email talking about how to add a
name. Even if it is a requirement, it's not necessary as we've
already got shrinker runtime measurements from the
trace_mm_shrink_slab_start and trace_mm_shrink_slab_end trace
points. With the above callback event, shrinker call runtime is
simply the time between the calls to the same shrinker within
mm_shrink_slab start/end trace points.

We don't need tracepoint to measure everything - we just need enough
tracepoints that we can calculate everything we need by post
processing the trace report, and the above gives you shrinker
runtime latency. You need to look at the tracepoints in the wider
context of the code that is running, not just the individual
tracepoint itself.

IOWs, function runtime is obvious from the pattern of related tracepoints
and their timestamps.  Timing information is in the event traces, so
duration between two known tracepoints is a simple calculation.

[0.0023]mm_shrink_slab_start:   shrinker 0xblah 
[0.0025]shrinker_callback:  shrinker 0xblah name xfs
.   [xfs events ignored]
[0.0043]shrinker_callback:  shrinker 0xblah name xfs
.   [xfs events ignored]
[0.0176]shrinker_callback:  shrinker 0xblah name xfs
.   [xfs events ignored]
[0.0178]mm_shrink_slab_end: shrinker 0xblah .

Now run awk to grab the '/shrinker 0xblah/ {  } ' - That
information contains everything you need to calculate shrinker
runtime. i.e.  It ran 3 times, taking 1.8ms, 13ms and 0.2ms on each
of the calls.

That's exactly how I work out timings of various operations in XFS.
e.g. how long a specific metadata IO has taken, how long IO
completion has been queued on the endio workqueue before it got
processed, how long a process waited on a buffer lock, etc. Pick
your specific tracepoints from the haystack, post process with
grep/awk/sed/python to find the needle.

If you need more specific information than a tracepoint can give
you, then you can either add more tracepoints or craft a custom
tracer function to drill deeper.  Almost no-one will need anything
more than knowing what shrinker is running, as most shrinkers are
quite simple. Those that are more complex have their own internal
tracepoints that will tell you exactly where and why it is stalling
without the need for custom tracers

> I was aware of the function graph tracer but I don't know how
> to convince that to give the following information;
>
> 1. The length of time spent in a given function
> 2. The tracepoint information that might explain why the stall occurred
> 
> Take the compaction tracepoint for example
> 
> trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
> cc->free_pfn, end_pfn, sync);
> 
>   ...
> 
>   trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
> cc->free_pfn, end_pfn, sync, ret);
> 
> The function graph tracer can say that X time is compact_zone() but it
> cannot distinguish between a short time spent in that function because
> compaction_suitable == false or compaction simply finished quickly.

That information (i.e. value of compaction_suitable) should be in
the trace_mm_compaction_end() tracepoint, then. If you need context
information to make sense of the tracepoint then it should be in the
tracepoint.

> My understanding was the point of the tracepoints was to get detailed
> information on points where the kernel is known to stall for long periods
> of time.

First I've heard that's what tracepoints are supposed to be used
for. They are just debugging information points in the code and can
be used for any purpose you need as a developer

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH 1/3] Add a new field to struct shrinker

2016-07-28 Thread Dave Chinner

On Thu, Jul 28, 2016 at 11:25:13AM +0100, Mel Gorman wrote:
> On Thu, Jul 28, 2016 at 03:49:47PM +1000, Dave Chinner wrote:
> > Seems you're all missing the obvious.
> > 
> > Add a tracepoint for a shrinker callback that includes a "name"
> > field, have the shrinker callback fill it out appropriately. e.g
> > in the superblock shrinker:
> > 
> > trace_shrinker_callback(shrinker, shrink_control, sb->s_type->name);
> > 
> 
> That misses capturing the latency of the call unless there is a begin/end
> tracepoint.

Sure, but I didn't see that in the email talking about how to add a
name. Even if it is a requirement, it's not necessary as we've
already got shrinker runtime measurements from the
trace_mm_shrink_slab_start and trace_mm_shrink_slab_end trace
points. With the above callback event, shrinker call runtime is
simply the time between the calls to the same shrinker within
mm_shrink_slab start/end trace points.

We don't need tracepoint to measure everything - we just need enough
tracepoints that we can calculate everything we need by post
processing the trace report, and the above gives you shrinker
runtime latency. You need to look at the tracepoints in the wider
context of the code that is running, not just the individual
tracepoint itself.

IOWs, function runtime is obvious from the pattern of related tracepoints
and their timestamps.  Timing information is in the event traces, so
duration between two known tracepoints is a simple calculation.

[0.0023]mm_shrink_slab_start:   shrinker 0xblah 
[0.0025]shrinker_callback:  shrinker 0xblah name xfs
.   [xfs events ignored]
[0.0043]shrinker_callback:  shrinker 0xblah name xfs
.   [xfs events ignored]
[0.0176]shrinker_callback:  shrinker 0xblah name xfs
.   [xfs events ignored]
[0.0178]mm_shrink_slab_end: shrinker 0xblah .

Now run awk to grab the '/shrinker 0xblah/ {  } ' - That
information contains everything you need to calculate shrinker
runtime. i.e.  It ran 3 times, taking 1.8ms, 13ms and 0.2ms on each
of the calls.

That's exactly how I work out timings of various operations in XFS.
e.g. how long a specific metadata IO has taken, how long IO
completion has been queued on the endio workqueue before it got
processed, how long a process waited on a buffer lock, etc. Pick
your specific tracepoints from the haystack, post process with
grep/awk/sed/python to find the needle.

If you need more specific information than a tracepoint can give
you, then you can either add more tracepoints or craft a custom
tracer function to drill deeper.  Almost no-one will need anything
more than knowing what shrinker is running, as most shrinkers are
quite simple. Those that are more complex have their own internal
tracepoints that will tell you exactly where and why it is stalling
without the need for custom tracers

> I was aware of the function graph tracer but I don't know how
> to convince that to give the following information;
>
> 1. The length of time spent in a given function
> 2. The tracepoint information that might explain why the stall occurred
> 
> Take the compaction tracepoint for example
> 
> trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
> cc->free_pfn, end_pfn, sync);
> 
>   ...
> 
>   trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
> cc->free_pfn, end_pfn, sync, ret);
> 
> The function graph tracer can say that X time is compact_zone() but it
> cannot distinguish between a short time spent in that function because
> compaction_suitable == false or compaction simply finished quickly.

That information (i.e. value of compaction_suitable) should be in
the trace_mm_compaction_end() tracepoint, then. If you need context
information to make sense of the tracepoint then it should be in the
tracepoint.

> My understanding was the point of the tracepoints was to get detailed
> information on points where the kernel is known to stall for long periods
> of time.

First I've heard that's what tracepoints are supposed to be used
for. They are just debugging information points in the code and can
be used for any purpose you need as a developer

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [Intel-gfx] [PATCH v4 0/6] Finally fix watermarks

2016-07-28 Thread Matt Roper

This is completely untested (and probably horribly broken/buggy), but
here's a quick mockup of the general approach I was thinking for
ensuring DDB & WM's can be updated together while ensuring the
three-step pipe flushing process is honored:

https://github.com/mattrope/kernel/commits/experimental/lyude_ddb

Basically the idea is to take note of what's happening to the pipe's DDB
allocation (shrinking, growing, unchanged, etc.) during the atomic check
phase; then during the commit phase, we loop over the CRTC's three times
instead of just once, but only operate on a subset of the CRTC's in each
loop.  While operating on each CRTC, the plane, WM, and DDB all get
programmed together and have a single flush for all three.



Matt

On Tue, Jul 26, 2016 at 01:34:36PM -0400, Lyude wrote:
> Latest version of https://lkml.org/lkml/2016/7/26/290 . Resending the whole
> thing to keep it in one place.
> 
> Lyude (5):
>   drm/i915/skl: Add support for the SAGV, fix underrun hangs
>   drm/i915/skl: Only flush pipes when we change the ddb allocation
>   drm/i915/skl: Fix extra whitespace in skl_flush_wm_values()
>   drm/i915/skl: Update plane watermarks atomically during plane updates
>   drm/i915/skl: Always wait for pipes to update after a flush
> 
> Matt Roper (1):
>   drm/i915/gen9: Only copy WM results for changed pipes to skl_hw
> 
>  drivers/gpu/drm/i915/i915_drv.h  |   3 +
>  drivers/gpu/drm/i915/i915_reg.h  |   5 +
>  drivers/gpu/drm/i915/intel_display.c |  24 
>  drivers/gpu/drm/i915/intel_drv.h |   4 +
>  drivers/gpu/drm/i915/intel_pm.c  | 240 
> +++
>  drivers/gpu/drm/i915/intel_sprite.c  |   2 +
>  6 files changed, 255 insertions(+), 23 deletions(-)
> 
> -- 
> 2.7.4
> 
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795

Re: [Intel-gfx] [PATCH v4 0/6] Finally fix watermarks

2016-07-28 Thread Matt Roper

This is completely untested (and probably horribly broken/buggy), but
here's a quick mockup of the general approach I was thinking for
ensuring DDB & WM's can be updated together while ensuring the
three-step pipe flushing process is honored:

https://github.com/mattrope/kernel/commits/experimental/lyude_ddb

Basically the idea is to take note of what's happening to the pipe's DDB
allocation (shrinking, growing, unchanged, etc.) during the atomic check
phase; then during the commit phase, we loop over the CRTC's three times
instead of just once, but only operate on a subset of the CRTC's in each
loop.  While operating on each CRTC, the plane, WM, and DDB all get
programmed together and have a single flush for all three.



Matt

On Tue, Jul 26, 2016 at 01:34:36PM -0400, Lyude wrote:
> Latest version of https://lkml.org/lkml/2016/7/26/290 . Resending the whole
> thing to keep it in one place.
> 
> Lyude (5):
>   drm/i915/skl: Add support for the SAGV, fix underrun hangs
>   drm/i915/skl: Only flush pipes when we change the ddb allocation
>   drm/i915/skl: Fix extra whitespace in skl_flush_wm_values()
>   drm/i915/skl: Update plane watermarks atomically during plane updates
>   drm/i915/skl: Always wait for pipes to update after a flush
> 
> Matt Roper (1):
>   drm/i915/gen9: Only copy WM results for changed pipes to skl_hw
> 
>  drivers/gpu/drm/i915/i915_drv.h  |   3 +
>  drivers/gpu/drm/i915/i915_reg.h  |   5 +
>  drivers/gpu/drm/i915/intel_display.c |  24 
>  drivers/gpu/drm/i915/intel_drv.h |   4 +
>  drivers/gpu/drm/i915/intel_pm.c  | 240 
> +++
>  drivers/gpu/drm/i915/intel_sprite.c  |   2 +
>  6 files changed, 255 insertions(+), 23 deletions(-)
> 
> -- 
> 2.7.4
> 
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795

Re: [BUG -next] "random: make /dev/urandom scalable for silly userspace programs" causes crash

2016-07-28 Thread Tony Luck

On Thu, Jul 28, 2016 at 6:41 AM, Theodore Ts'o  wrote:
> On Thu, Jul 28, 2016 at 09:24:08AM +0200, Heiko Carstens wrote:
>>
>> Oh, I just realized that Linus pulled your changes. Actually I was hoping
>> we could get this fixed before the broken code would be merged.
>> Could you please make sure the bug fix gets included as soon as possible?
>
> Yes, I'll send the pull request to ASAP.

Also broke ia64.  Same fix works for me.

Tested-by: Tony Luck

Re: [BUG -next] "random: make /dev/urandom scalable for silly userspace programs" causes crash

2016-07-28 Thread Tony Luck

On Thu, Jul 28, 2016 at 6:41 AM, Theodore Ts'o  wrote:
> On Thu, Jul 28, 2016 at 09:24:08AM +0200, Heiko Carstens wrote:
>>
>> Oh, I just realized that Linus pulled your changes. Actually I was hoping
>> we could get this fixed before the broken code would be merged.
>> Could you please make sure the bug fix gets included as soon as possible?
>
> Yes, I'll send the pull request to ASAP.

Also broke ia64.  Same fix works for me.

Tested-by: Tony Luck

Re: [PATCH v4 2/2] input: add ADC resistor ladder driver

2016-07-28 Thread Dmitry Torokhov

On Fri, Jul 29, 2016 at 12:49:21AM +0200, Alexandre Belloni wrote:
> On 28/07/2016 at 15:09:18 -0700, Dmitry Torokhov wrote :
> > On Tue, Jul 12, 2016 at 05:41:50PM -0700, Dmitry Torokhov wrote:
> > > Hi Alexandre,
> > > 
> > > On Tue, Jul 12, 2016 at 09:36:26PM +0200, Alexandre Belloni wrote:
> > > > A common way of multiplexing buttons on a single input in cheap devices 
> > > > is
> > > > to use a resistor ladder on an ADC. This driver supports that 
> > > > configuration
> > > > by polling an ADC channel provided by IIO.
> > > 
> > > This looks quite reasonable, just a few small comments.
> > 
> > Ping.
> > 
> > Did the version below work for you? I am trying sweep in the brand new
> > drivers before we close merge window.
> > 
> 
> I was thinking it was too late for 4.8. I've juste tested and it works

Nah, it is OK - the dirver is brand-new so we can't possibly regress
anyone ;)

> fine. However, I just sent v5 including the exact same changes but
> changing mvolt to millivolt as that seemed preferred by Rob. Both
> versions haver been tested and are fine for me.

Let's see if he acks the binding then...

Thanks.

-- 
Dmitry

Re: [PATCH v4 2/2] input: add ADC resistor ladder driver

2016-07-28 Thread Dmitry Torokhov

On Fri, Jul 29, 2016 at 12:49:21AM +0200, Alexandre Belloni wrote:
> On 28/07/2016 at 15:09:18 -0700, Dmitry Torokhov wrote :
> > On Tue, Jul 12, 2016 at 05:41:50PM -0700, Dmitry Torokhov wrote:
> > > Hi Alexandre,
> > > 
> > > On Tue, Jul 12, 2016 at 09:36:26PM +0200, Alexandre Belloni wrote:
> > > > A common way of multiplexing buttons on a single input in cheap devices 
> > > > is
> > > > to use a resistor ladder on an ADC. This driver supports that 
> > > > configuration
> > > > by polling an ADC channel provided by IIO.
> > > 
> > > This looks quite reasonable, just a few small comments.
> > 
> > Ping.
> > 
> > Did the version below work for you? I am trying sweep in the brand new
> > drivers before we close merge window.
> > 
> 
> I was thinking it was too late for 4.8. I've juste tested and it works

Nah, it is OK - the dirver is brand-new so we can't possibly regress
anyone ;)

> fine. However, I just sent v5 including the exact same changes but
> changing mvolt to millivolt as that seemed preferred by Rob. Both
> versions haver been tested and are fine for me.

Let's see if he acks the binding then...

Thanks.

-- 
Dmitry

Re: [PATCH v2 05/13] gpu: ipu-v3: Add IDMA channel linking support

2016-07-28 Thread Steve Longerbeam




On 07/26/2016 03:06 AM, Philipp Zabel wrote:

Am Dienstag, den 19.07.2016, 18:11 -0700 schrieb Steve Longerbeam:

Adds functions to link and unlink IDMAC source channels to sink
channels.

So far the following links are supported:

IPUV3_CHANNEL_IC_PRP_ENC_MEM -> IPUV3_CHANNEL_MEM_ROT_ENC
PUV3_CHANNEL_IC_PRP_VF_MEM   -> IPUV3_CHANNEL_MEM_ROT_VF
IPUV3_CHANNEL_IC_PP_MEM  -> IPUV3_CHANNEL_MEM_ROT_PP

More links can be added to the idmac_link_info[] array.

Signed-off-by: Steve Longerbeam 

This patch looks good to me, but the Frame Synchronisation Unit supports
also linking the internal channels, not only the IDMAC channels.

[...]

+++ b/drivers/gpu/ipu-v3/ipu-common.c

[...]

+static const struct idmac_link_info idmac_link_info[] = {
+   {
+   .src  = { IPUV3_CHANNEL_IC_PRP_ENC_MEM, IPU_FS_PROC_FLOW1,
+ 0, 4, 7 },
+   .sink = { IPUV3_CHANNEL_MEM_ROT_ENC, IPU_FS_PROC_FLOW2,
+ 0, 4, 1 },
+   }, {
+   .src =  { IPUV3_CHANNEL_IC_PRP_VF_MEM, IPU_FS_PROC_FLOW1,
+ 8, 4, 8 },
+   .sink = { IPUV3_CHANNEL_MEM_ROT_VF, IPU_FS_PROC_FLOW2,
+ 4, 4, 1 },
+   }, {
+   .src =  { IPUV3_CHANNEL_IC_PP_MEM, IPU_FS_PROC_FLOW1,
+ 16, 4, 5 },
+   .sink = { IPUV3_CHANNEL_MEM_ROT_PP, IPU_FS_PROC_FLOW2,
+ 12, 4, 3 },
+   },
+};

How about adding new (internal) channel numbers for the CSI->VDI link
and having something like:

{
.src =  { IPUV3_CHANNEL_CSI_DIRECT, IPU_FS_PROC_FLOW1,
  FS_VDI_SRC_SEL_OFFSET, 2, 1 },
.sink = { IPUV3_CHANNEL_VDI_CUR, 0, 0, 0 },
},

instead of ipu_set_vdi_src_mux? Then in addition to

[...]

+int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink)

We could have a lower level

ipu_fsu_link(int channel1, int channel2)

which could be called like

ipu_fsu_link(IPUV3_CHANNEL_CSI_DIRECT, IPUV3_CHANNEL_VDI_CUR);

Come to think of it, could we replace shift,bits,sel with mask,value and
add #defines for the FSU bit fields and values? I'm thinking:

/* FS_PROC_FLOW1 */
#define FS_PRPENC_ROT_SRC_SEL_MASK  (0xf << 0)
#define FS_PRPENC_ROT_SRC_SEL_ENC   (0x7 << 0)
#define FS_PRPVF_ROT_SRC_SEL_MASK   (0xf << 8)
#define FS_PRPVF_ROT_SRC_SEL_VF (0x8 << 8)
#define FS_PP_SRC_SEL_MASK  (0xf << 12)
#define FS_PP_ROT_SRC_SEL_MASK  (0xf << 16)
#define FS_PP_ROT_SRC_SEL_PP(0x5 << 16)
#define FS_VDI1_SRC_SEL_MASK(0x3 << 20)
#define FS_VDI3_SRC_SEL_MASK(0x3 << 20)
#define FS_PRP_SRC_SEL_MASK (0xf << 24)
#define FS_VDI_SRC_SEL_MASK (0x3 << 28)

/* FS_PROC_FLOW2 */
#define FS_PRP_ENC_DEST_SEL_MASK(0xf << 0)
#define FS_PRP_ENC_DEST_SEL_IRT_ENC (0x1 << 0)
#define FS_PRPVF_DEST_SEL_MASK  (0xf << 4)
#define FS_PRPVF_DEST_SEL_IRT_VF(0x1 << 4)
#define FS_PRPVF_ROT_DEST_SEL_MASK  (0xf << 8)
#define FS_PP_DEST_SEL_MASK (0xf << 12)
#define FS_PP_DEST_SEL_IRT_PP   (0x3 << 12)
#define FS_PP_ROT_DEST_SEL_MASK (0xf << 16)
#define FS_PRPENC_ROT_DEST_SEL_MASK (0xf << 20)
#define FS_PRP_DEST_SEL_MASK(0xf << 24)

struct idmac_link_reg_info {
int chno;
u32 reg;
u32 mask;
u32 val;
};

static const struct idmac_link_info idmac_link_info[] = {
{
.src  = { IPUV3_CHANNEL_IC_PRP_ENC_MEM, IPU_FS_PROC_FLOW1,
  FS_PRPENC_ROT_SRC_SEL_MASK, FS_PRPENC_ROT_SRC_SEL_ENC 
},
.sink = { IPUV3_CHANNEL_MEM_ROT_ENC, IPU_FS_PROC_FLOW2,
  FS_PRP_ENC_DEST_SEL_MASK, FS_PRP_ENC_DEST_SEL_IRT_ENC 
},
}, {
.src =  { IPUV3_CHANNEL_IC_PRP_VF_MEM, IPU_FS_PROC_FLOW1,
  FS_PRPVF_ROT_SRC_SEL_MASK, FS_PRPVF_ROT_SRC_SEL_VF },
.sink = { IPUV3_CHANNEL_MEM_ROT_VF, IPU_FS_PROC_FLOW2,
  FS_PRPVF_DEST_SEL_MASK, FS_PRPVF_DEST_SEL_IRT_VF },
}, {
.src =  { IPUV3_CHANNEL_IC_PP_MEM, IPU_FS_PROC_FLOW1,
  FS_PP_ROT_SRC_SEL_MASK, FS_PP_ROT_SRC_SEL_PP },
.sink = { IPUV3_CHANNEL_MEM_ROT_PP, IPU_FS_PROC_FLOW2,
  FS_PP_DEST_SEL_MASK, FS_PP_DEST_SEL_IRT_PP },
},
};


Hi Philipp,

Great ideas. I'll work on this for next version, or shall I let you? 
Should we create an

ipu-fsu.c ?

Steve

Re: [PATCH v2 05/13] gpu: ipu-v3: Add IDMA channel linking support

2016-07-28 Thread Steve Longerbeam




On 07/26/2016 03:06 AM, Philipp Zabel wrote:

Am Dienstag, den 19.07.2016, 18:11 -0700 schrieb Steve Longerbeam:

Adds functions to link and unlink IDMAC source channels to sink
channels.

So far the following links are supported:

IPUV3_CHANNEL_IC_PRP_ENC_MEM -> IPUV3_CHANNEL_MEM_ROT_ENC
PUV3_CHANNEL_IC_PRP_VF_MEM   -> IPUV3_CHANNEL_MEM_ROT_VF
IPUV3_CHANNEL_IC_PP_MEM  -> IPUV3_CHANNEL_MEM_ROT_PP

More links can be added to the idmac_link_info[] array.

Signed-off-by: Steve Longerbeam 

This patch looks good to me, but the Frame Synchronisation Unit supports
also linking the internal channels, not only the IDMAC channels.

[...]

+++ b/drivers/gpu/ipu-v3/ipu-common.c

[...]

+static const struct idmac_link_info idmac_link_info[] = {
+   {
+   .src  = { IPUV3_CHANNEL_IC_PRP_ENC_MEM, IPU_FS_PROC_FLOW1,
+ 0, 4, 7 },
+   .sink = { IPUV3_CHANNEL_MEM_ROT_ENC, IPU_FS_PROC_FLOW2,
+ 0, 4, 1 },
+   }, {
+   .src =  { IPUV3_CHANNEL_IC_PRP_VF_MEM, IPU_FS_PROC_FLOW1,
+ 8, 4, 8 },
+   .sink = { IPUV3_CHANNEL_MEM_ROT_VF, IPU_FS_PROC_FLOW2,
+ 4, 4, 1 },
+   }, {
+   .src =  { IPUV3_CHANNEL_IC_PP_MEM, IPU_FS_PROC_FLOW1,
+ 16, 4, 5 },
+   .sink = { IPUV3_CHANNEL_MEM_ROT_PP, IPU_FS_PROC_FLOW2,
+ 12, 4, 3 },
+   },
+};

How about adding new (internal) channel numbers for the CSI->VDI link
and having something like:

{
.src =  { IPUV3_CHANNEL_CSI_DIRECT, IPU_FS_PROC_FLOW1,
  FS_VDI_SRC_SEL_OFFSET, 2, 1 },
.sink = { IPUV3_CHANNEL_VDI_CUR, 0, 0, 0 },
},

instead of ipu_set_vdi_src_mux? Then in addition to

[...]

+int ipu_idmac_link(struct ipuv3_channel *src, struct ipuv3_channel *sink)

We could have a lower level

ipu_fsu_link(int channel1, int channel2)

which could be called like

ipu_fsu_link(IPUV3_CHANNEL_CSI_DIRECT, IPUV3_CHANNEL_VDI_CUR);

Come to think of it, could we replace shift,bits,sel with mask,value and
add #defines for the FSU bit fields and values? I'm thinking:

/* FS_PROC_FLOW1 */
#define FS_PRPENC_ROT_SRC_SEL_MASK  (0xf << 0)
#define FS_PRPENC_ROT_SRC_SEL_ENC   (0x7 << 0)
#define FS_PRPVF_ROT_SRC_SEL_MASK   (0xf << 8)
#define FS_PRPVF_ROT_SRC_SEL_VF (0x8 << 8)
#define FS_PP_SRC_SEL_MASK  (0xf << 12)
#define FS_PP_ROT_SRC_SEL_MASK  (0xf << 16)
#define FS_PP_ROT_SRC_SEL_PP(0x5 << 16)
#define FS_VDI1_SRC_SEL_MASK(0x3 << 20)
#define FS_VDI3_SRC_SEL_MASK(0x3 << 20)
#define FS_PRP_SRC_SEL_MASK (0xf << 24)
#define FS_VDI_SRC_SEL_MASK (0x3 << 28)

/* FS_PROC_FLOW2 */
#define FS_PRP_ENC_DEST_SEL_MASK(0xf << 0)
#define FS_PRP_ENC_DEST_SEL_IRT_ENC (0x1 << 0)
#define FS_PRPVF_DEST_SEL_MASK  (0xf << 4)
#define FS_PRPVF_DEST_SEL_IRT_VF(0x1 << 4)
#define FS_PRPVF_ROT_DEST_SEL_MASK  (0xf << 8)
#define FS_PP_DEST_SEL_MASK (0xf << 12)
#define FS_PP_DEST_SEL_IRT_PP   (0x3 << 12)
#define FS_PP_ROT_DEST_SEL_MASK (0xf << 16)
#define FS_PRPENC_ROT_DEST_SEL_MASK (0xf << 20)
#define FS_PRP_DEST_SEL_MASK(0xf << 24)

struct idmac_link_reg_info {
int chno;
u32 reg;
u32 mask;
u32 val;
};

static const struct idmac_link_info idmac_link_info[] = {
{
.src  = { IPUV3_CHANNEL_IC_PRP_ENC_MEM, IPU_FS_PROC_FLOW1,
  FS_PRPENC_ROT_SRC_SEL_MASK, FS_PRPENC_ROT_SRC_SEL_ENC 
},
.sink = { IPUV3_CHANNEL_MEM_ROT_ENC, IPU_FS_PROC_FLOW2,
  FS_PRP_ENC_DEST_SEL_MASK, FS_PRP_ENC_DEST_SEL_IRT_ENC 
},
}, {
.src =  { IPUV3_CHANNEL_IC_PRP_VF_MEM, IPU_FS_PROC_FLOW1,
  FS_PRPVF_ROT_SRC_SEL_MASK, FS_PRPVF_ROT_SRC_SEL_VF },
.sink = { IPUV3_CHANNEL_MEM_ROT_VF, IPU_FS_PROC_FLOW2,
  FS_PRPVF_DEST_SEL_MASK, FS_PRPVF_DEST_SEL_IRT_VF },
}, {
.src =  { IPUV3_CHANNEL_IC_PP_MEM, IPU_FS_PROC_FLOW1,
  FS_PP_ROT_SRC_SEL_MASK, FS_PP_ROT_SRC_SEL_PP },
.sink = { IPUV3_CHANNEL_MEM_ROT_PP, IPU_FS_PROC_FLOW2,
  FS_PP_DEST_SEL_MASK, FS_PP_DEST_SEL_IRT_PP },
},
};


Hi Philipp,

Great ideas. I'll work on this for next version, or shall I let you? 
Should we create an

ipu-fsu.c ?

Steve

[PATCH 1/5] ipc/msg: Implement lockless pipelined wakeups

2016-07-28 Thread Davidlohr Bueso

From: Sebastian Andrzej Siewior 

This patch moves the wakeup_process() invocation so it is not done under
the ipc global lock by making use of a lockless wake_q. With this change,
the waiter is woken up once the message has been assigned and it does not
need to loop on SMP if the message points to NULL. In the signal case we
still need to check the pointer under the lock to verify the state.

This change should also avoid the introduction of preempt_disable() in
-RT which avoids a busy-loop which pools for the NULL -> !NULL
change if the waiter has a higher priority compared to the waker.

By making use of wake_qs, the logic of sysv msg queues is greatly
simplified (and very well suited as we can batch lockless wakeups),
particularly around the lockless receive algorithm.

This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800
Radeon R7, 12 Compute Cores 4C+8G":

test |   before   |   after| diff
-|||--
pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
pmsg-shared 2 60 | 22,884,224 | 24,278,200 | +  ~6.09 %

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 133 +++---
 1 file changed, 40 insertions(+), 93 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 1471db9a7e61..45ce6a3021be 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -51,13 +51,7 @@ struct msg_receiver {
longr_msgtype;
longr_maxsize;
 
-   /*
-* Mark r_msg volatile so that the compiler
-* does not try to get smart and optimize
-* it. We rely on this for the lockless
-* receive algorithm.
-*/
-   struct msg_msg  *volatile r_msg;
+   struct msg_msg  *r_msg;
 };
 
 /* one msg_sender for each sleeping sender */
@@ -183,21 +177,14 @@ static void ss_wakeup(struct list_head *h, int kill)
}
 }
 
-static void expunge_all(struct msg_queue *msq, int res)
+static void expunge_all(struct msg_queue *msq, int res,
+   struct wake_q_head *wake_q)
 {
struct msg_receiver *msr, *t;
 
list_for_each_entry_safe(msr, t, >q_receivers, r_list) {
-   msr->r_msg = NULL; /* initialize expunge ordering */
-   wake_up_process(msr->r_tsk);
-   /*
-* Ensure that the wakeup is visible before setting r_msg as
-* the receiving end depends on it: either spinning on a nil,
-* or dealing with -EAGAIN cases. See lockless receive part 1
-* and 2 in do_msgrcv().
-*/
-   smp_wmb(); /* barrier (B) */
-   msr->r_msg = ERR_PTR(res);
+   wake_q_add(wake_q, msr->r_tsk);
+   WRITE_ONCE(msr->r_msg, ERR_PTR(res));
}
 }
 
@@ -213,11 +200,13 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
 {
struct msg_msg *msg, *t;
struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm);
+   WAKE_Q(wake_q);
 
-   expunge_all(msq, -EIDRM);
+   expunge_all(msq, -EIDRM, _q);
ss_wakeup(>q_senders, 1);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
rcu_read_unlock();
 
list_for_each_entry_safe(msg, t, >q_messages, m_list) {
@@ -342,6 +331,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
struct kern_ipc_perm *ipcp;
struct msqid64_ds uninitialized_var(msqid64);
struct msg_queue *msq;
+   WAKE_Q(wake_q);
int err;
 
if (cmd == IPC_SET) {
@@ -389,7 +379,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
/* sleeping receivers might be excluded by
 * stricter permissions.
 */
-   expunge_all(msq, -EAGAIN);
+   expunge_all(msq, -EAGAIN, _q);
/* sleeping senders might be able to send
 * due to a larger queue size.
 */
@@ -402,6 +392,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 
 out_unlock0:
ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
 out_unlock1:
rcu_read_unlock();
 out_up:
@@ -566,7 +557,8 @@ static int testmsg(struct msg_msg *msg, long type, int mode)
return 0;
 }
 
-static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg)
+static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg,
+struct wake_q_head *wake_q)
 {
struct msg_receiver *msr, *t;
 
@@ -577,27 +569,14 @@ static inline int pipelined_send(struct msg_queue *msq, 
struct msg_msg *msg)
 
list_del(>r_list);
if

[PATCH 1/5] ipc/msg: Implement lockless pipelined wakeups

2016-07-28 Thread Davidlohr Bueso

From: Sebastian Andrzej Siewior 

This patch moves the wakeup_process() invocation so it is not done under
the ipc global lock by making use of a lockless wake_q. With this change,
the waiter is woken up once the message has been assigned and it does not
need to loop on SMP if the message points to NULL. In the signal case we
still need to check the pointer under the lock to verify the state.

This change should also avoid the introduction of preempt_disable() in
-RT which avoids a busy-loop which pools for the NULL -> !NULL
change if the waiter has a higher priority compared to the waker.

By making use of wake_qs, the logic of sysv msg queues is greatly
simplified (and very well suited as we can batch lockless wakeups),
particularly around the lockless receive algorithm.

This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800
Radeon R7, 12 Compute Cores 4C+8G":

test |   before   |   after| diff
-|||--
pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
pmsg-shared 2 60 | 22,884,224 | 24,278,200 | +  ~6.09 %

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 133 +++---
 1 file changed, 40 insertions(+), 93 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 1471db9a7e61..45ce6a3021be 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -51,13 +51,7 @@ struct msg_receiver {
longr_msgtype;
longr_maxsize;
 
-   /*
-* Mark r_msg volatile so that the compiler
-* does not try to get smart and optimize
-* it. We rely on this for the lockless
-* receive algorithm.
-*/
-   struct msg_msg  *volatile r_msg;
+   struct msg_msg  *r_msg;
 };
 
 /* one msg_sender for each sleeping sender */
@@ -183,21 +177,14 @@ static void ss_wakeup(struct list_head *h, int kill)
}
 }
 
-static void expunge_all(struct msg_queue *msq, int res)
+static void expunge_all(struct msg_queue *msq, int res,
+   struct wake_q_head *wake_q)
 {
struct msg_receiver *msr, *t;
 
list_for_each_entry_safe(msr, t, >q_receivers, r_list) {
-   msr->r_msg = NULL; /* initialize expunge ordering */
-   wake_up_process(msr->r_tsk);
-   /*
-* Ensure that the wakeup is visible before setting r_msg as
-* the receiving end depends on it: either spinning on a nil,
-* or dealing with -EAGAIN cases. See lockless receive part 1
-* and 2 in do_msgrcv().
-*/
-   smp_wmb(); /* barrier (B) */
-   msr->r_msg = ERR_PTR(res);
+   wake_q_add(wake_q, msr->r_tsk);
+   WRITE_ONCE(msr->r_msg, ERR_PTR(res));
}
 }
 
@@ -213,11 +200,13 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
 {
struct msg_msg *msg, *t;
struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm);
+   WAKE_Q(wake_q);
 
-   expunge_all(msq, -EIDRM);
+   expunge_all(msq, -EIDRM, _q);
ss_wakeup(>q_senders, 1);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
rcu_read_unlock();
 
list_for_each_entry_safe(msg, t, >q_messages, m_list) {
@@ -342,6 +331,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
struct kern_ipc_perm *ipcp;
struct msqid64_ds uninitialized_var(msqid64);
struct msg_queue *msq;
+   WAKE_Q(wake_q);
int err;
 
if (cmd == IPC_SET) {
@@ -389,7 +379,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
/* sleeping receivers might be excluded by
 * stricter permissions.
 */
-   expunge_all(msq, -EAGAIN);
+   expunge_all(msq, -EAGAIN, _q);
/* sleeping senders might be able to send
 * due to a larger queue size.
 */
@@ -402,6 +392,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 
 out_unlock0:
ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
 out_unlock1:
rcu_read_unlock();
 out_up:
@@ -566,7 +557,8 @@ static int testmsg(struct msg_msg *msg, long type, int mode)
return 0;
 }
 
-static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg)
+static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg,
+struct wake_q_head *wake_q)
 {
struct msg_receiver *msr, *t;
 
@@ -577,27 +569,14 @@ static inline int pipelined_send(struct msg_queue *msq, 
struct msg_msg *msg)
 
list_del(>r_list);
if (msr->r_maxsize < msg->m_ts) {
-

[PATCH 0/5] ipc/msg: Sender/receiver optimizations

2016-07-28 Thread Davidlohr Bueso

Hi,

I'm resending Sebastian's sysv msg queue use of wake_qs but updated
to the last observations I need wrt the need of explicit barriers
after removing the whole receiver busy-looping. After some irc exchange
it seems we're both on the same page, and things now look like he had
them earlier, in v2. This is all patch 1.

The rest of the patches are changes I noticed while reviewing patch 1,
which are mainly sender-side rework/optimizations. Details are in each
changelog.

The changes have survived ltp (which has some nasty corner cases for msgsnd
changes), as well as pmsg-shared benchmark.

Applies on Linus's latest - please consider for v4.9.

Thanks!

  ipc/msg: Implement lockless pipelined wakeups
  ipc/msg: Batch queue sender wakeups
  ipc/msg: Make ss_wakeup() kill arg boolean
  ipc/msg: Lockless security checks for msgsnd
  ipc/msg: Avoid waking sender upon full queue

 ipc/msg.c | 210 ++
 1 file changed, 101 insertions(+), 109 deletions(-)

-- 
2.6.6

[PATCH 0/5] ipc/msg: Sender/receiver optimizations

2016-07-28 Thread Davidlohr Bueso

Hi,

I'm resending Sebastian's sysv msg queue use of wake_qs but updated
to the last observations I need wrt the need of explicit barriers
after removing the whole receiver busy-looping. After some irc exchange
it seems we're both on the same page, and things now look like he had
them earlier, in v2. This is all patch 1.

The rest of the patches are changes I noticed while reviewing patch 1,
which are mainly sender-side rework/optimizations. Details are in each
changelog.

The changes have survived ltp (which has some nasty corner cases for msgsnd
changes), as well as pmsg-shared benchmark.

Applies on Linus's latest - please consider for v4.9.

Thanks!

  ipc/msg: Implement lockless pipelined wakeups
  ipc/msg: Batch queue sender wakeups
  ipc/msg: Make ss_wakeup() kill arg boolean
  ipc/msg: Lockless security checks for msgsnd
  ipc/msg: Avoid waking sender upon full queue

 ipc/msg.c | 210 ++
 1 file changed, 101 insertions(+), 109 deletions(-)

-- 
2.6.6

[PATCH 3/5] ipc/msg: Make ss_wakeup() kill arg boolean

2016-07-28 Thread Davidlohr Bueso

... 'tis annoying.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 395013d58fda..5181259e2ff0 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -167,7 +167,7 @@ static inline void ss_del(struct msg_sender *mss)
 }
 
 static void ss_wakeup(struct list_head *h,
- struct wake_q_head *wake_q, int kill)
+ struct wake_q_head *wake_q, bool kill)
 {
struct msg_sender *mss, *t;
 
@@ -204,7 +204,7 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
WAKE_Q(wake_q);
 
expunge_all(msq, -EIDRM, _q);
-   ss_wakeup(>q_senders, _q, 1);
+   ss_wakeup(>q_senders, _q, true);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
@@ -388,7 +388,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 * Sleeping senders might be able to send
 * due to a larger queue size.
 */
-   ss_wakeup(>q_senders, _q, 0);
+   ss_wakeup(>q_senders, _q, false);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
 
@@ -882,7 +882,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
msq->q_cbytes -= msg->m_ts;
atomic_sub(msg->m_ts, >msg_bytes);
atomic_dec(>msg_hdrs);
-   ss_wakeup(>q_senders, _q, 0);
+   ss_wakeup(>q_senders, _q, false);
 
goto out_unlock0;
}
-- 
2.6.6

[PATCH 2/5] ipc/msg: Batch queue sender wakeups

2016-07-28 Thread Davidlohr Bueso

Currently the use of wake_qs in sysv msg queues are only
for the receiver tasks that are blocked on the queue. But
blocked sender tasks (due to queue size constraints) still
are awoken with the ipc object lock held, which can be a
problem particularly for small sized queues and far from
gracious for -rt (just like it was for the receiver side).

The paths that actually wakeup a sender are obviously
related to when we are either getting rid of the queue
or after (some) space is freed-up after a receiver takes
the msg (msgrcv). Furthermore, with the exception of msgrcv,
we can always piggy-back on expunge_all that has its own
tasks lined-up for waking. Finally, upon unlinking the
message, it should be no problem delaying the wakeups a
bit until after we've released the lock.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 45ce6a3021be..395013d58fda 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -166,14 +166,15 @@ static inline void ss_del(struct msg_sender *mss)
list_del(>list);
 }
 
-static void ss_wakeup(struct list_head *h, int kill)
+static void ss_wakeup(struct list_head *h,
+ struct wake_q_head *wake_q, int kill)
 {
struct msg_sender *mss, *t;
 
list_for_each_entry_safe(mss, t, h, list) {
if (kill)
mss->list.next = NULL;
-   wake_up_process(mss->tsk);
+   wake_q_add(wake_q, mss->tsk);
}
 }
 
@@ -203,7 +204,7 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
WAKE_Q(wake_q);
 
expunge_all(msq, -EIDRM, _q);
-   ss_wakeup(>q_senders, 1);
+   ss_wakeup(>q_senders, _q, 1);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
@@ -331,7 +332,6 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
struct kern_ipc_perm *ipcp;
struct msqid64_ds uninitialized_var(msqid64);
struct msg_queue *msq;
-   WAKE_Q(wake_q);
int err;
 
if (cmd == IPC_SET) {
@@ -362,6 +362,9 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
freeque(ns, ipcp);
goto out_up;
case IPC_SET:
+   {
+   WAKE_Q(wake_q);
+
if (msqid64.msg_qbytes > ns->msg_ctlmnb &&
!capable(CAP_SYS_RESOURCE)) {
err = -EPERM;
@@ -376,15 +379,21 @@ static int msgctl_down(struct ipc_namespace *ns, int 
msqid, int cmd,
msq->q_qbytes = msqid64.msg_qbytes;
 
msq->q_ctime = get_seconds();
-   /* sleeping receivers might be excluded by
+   /*
+* Sleeping receivers might be excluded by
 * stricter permissions.
 */
expunge_all(msq, -EAGAIN, _q);
-   /* sleeping senders might be able to send
+   /*
+* Sleeping senders might be able to send
 * due to a larger queue size.
 */
-   ss_wakeup(>q_senders, 0);
-   break;
+   ss_wakeup(>q_senders, _q, 0);
+   ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
+
+   goto out_unlock1;
+   }
default:
err = -EINVAL;
goto out_unlock1;
@@ -392,7 +401,6 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 
 out_unlock0:
ipc_unlock_object(>q_perm);
-   wake_up_q(_q);
 out_unlock1:
rcu_read_unlock();
 out_up:
@@ -809,6 +817,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
struct msg_queue *msq;
struct ipc_namespace *ns;
struct msg_msg *msg, *copy = NULL;
+   WAKE_Q(wake_q);
 
ns = current->nsproxy->ipc_ns;
 
@@ -873,7 +882,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
msq->q_cbytes -= msg->m_ts;
atomic_sub(msg->m_ts, >msg_bytes);
atomic_dec(>msg_hdrs);
-   ss_wakeup(>q_senders, 0);
+   ss_wakeup(>q_senders, _q, 0);
 
goto out_unlock0;
}
@@ -945,6 +954,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
 
 out_unlock0:
ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
 out_unlock1:
rcu_read_unlock();
if (IS_ERR(msg)) {
-- 
2.6.6

[PATCH 3/5] ipc/msg: Make ss_wakeup() kill arg boolean

2016-07-28 Thread Davidlohr Bueso

... 'tis annoying.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 395013d58fda..5181259e2ff0 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -167,7 +167,7 @@ static inline void ss_del(struct msg_sender *mss)
 }
 
 static void ss_wakeup(struct list_head *h,
- struct wake_q_head *wake_q, int kill)
+ struct wake_q_head *wake_q, bool kill)
 {
struct msg_sender *mss, *t;
 
@@ -204,7 +204,7 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
WAKE_Q(wake_q);
 
expunge_all(msq, -EIDRM, _q);
-   ss_wakeup(>q_senders, _q, 1);
+   ss_wakeup(>q_senders, _q, true);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
@@ -388,7 +388,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 * Sleeping senders might be able to send
 * due to a larger queue size.
 */
-   ss_wakeup(>q_senders, _q, 0);
+   ss_wakeup(>q_senders, _q, false);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
 
@@ -882,7 +882,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
msq->q_cbytes -= msg->m_ts;
atomic_sub(msg->m_ts, >msg_bytes);
atomic_dec(>msg_hdrs);
-   ss_wakeup(>q_senders, _q, 0);
+   ss_wakeup(>q_senders, _q, false);
 
goto out_unlock0;
}
-- 
2.6.6

[PATCH 2/5] ipc/msg: Batch queue sender wakeups

2016-07-28 Thread Davidlohr Bueso

Currently the use of wake_qs in sysv msg queues are only
for the receiver tasks that are blocked on the queue. But
blocked sender tasks (due to queue size constraints) still
are awoken with the ipc object lock held, which can be a
problem particularly for small sized queues and far from
gracious for -rt (just like it was for the receiver side).

The paths that actually wakeup a sender are obviously
related to when we are either getting rid of the queue
or after (some) space is freed-up after a receiver takes
the msg (msgrcv). Furthermore, with the exception of msgrcv,
we can always piggy-back on expunge_all that has its own
tasks lined-up for waking. Finally, upon unlinking the
message, it should be no problem delaying the wakeups a
bit until after we've released the lock.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 45ce6a3021be..395013d58fda 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -166,14 +166,15 @@ static inline void ss_del(struct msg_sender *mss)
list_del(>list);
 }
 
-static void ss_wakeup(struct list_head *h, int kill)
+static void ss_wakeup(struct list_head *h,
+ struct wake_q_head *wake_q, int kill)
 {
struct msg_sender *mss, *t;
 
list_for_each_entry_safe(mss, t, h, list) {
if (kill)
mss->list.next = NULL;
-   wake_up_process(mss->tsk);
+   wake_q_add(wake_q, mss->tsk);
}
 }
 
@@ -203,7 +204,7 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
WAKE_Q(wake_q);
 
expunge_all(msq, -EIDRM, _q);
-   ss_wakeup(>q_senders, 1);
+   ss_wakeup(>q_senders, _q, 1);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
@@ -331,7 +332,6 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
struct kern_ipc_perm *ipcp;
struct msqid64_ds uninitialized_var(msqid64);
struct msg_queue *msq;
-   WAKE_Q(wake_q);
int err;
 
if (cmd == IPC_SET) {
@@ -362,6 +362,9 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
freeque(ns, ipcp);
goto out_up;
case IPC_SET:
+   {
+   WAKE_Q(wake_q);
+
if (msqid64.msg_qbytes > ns->msg_ctlmnb &&
!capable(CAP_SYS_RESOURCE)) {
err = -EPERM;
@@ -376,15 +379,21 @@ static int msgctl_down(struct ipc_namespace *ns, int 
msqid, int cmd,
msq->q_qbytes = msqid64.msg_qbytes;
 
msq->q_ctime = get_seconds();
-   /* sleeping receivers might be excluded by
+   /*
+* Sleeping receivers might be excluded by
 * stricter permissions.
 */
expunge_all(msq, -EAGAIN, _q);
-   /* sleeping senders might be able to send
+   /*
+* Sleeping senders might be able to send
 * due to a larger queue size.
 */
-   ss_wakeup(>q_senders, 0);
-   break;
+   ss_wakeup(>q_senders, _q, 0);
+   ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
+
+   goto out_unlock1;
+   }
default:
err = -EINVAL;
goto out_unlock1;
@@ -392,7 +401,6 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 
 out_unlock0:
ipc_unlock_object(>q_perm);
-   wake_up_q(_q);
 out_unlock1:
rcu_read_unlock();
 out_up:
@@ -809,6 +817,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
struct msg_queue *msq;
struct ipc_namespace *ns;
struct msg_msg *msg, *copy = NULL;
+   WAKE_Q(wake_q);
 
ns = current->nsproxy->ipc_ns;
 
@@ -873,7 +882,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
msq->q_cbytes -= msg->m_ts;
atomic_sub(msg->m_ts, >msg_bytes);
atomic_dec(>msg_hdrs);
-   ss_wakeup(>q_senders, 0);
+   ss_wakeup(>q_senders, _q, 0);
 
goto out_unlock0;
}
@@ -945,6 +954,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, 
long msgtyp, int msgfl
 
 out_unlock0:
ipc_unlock_object(>q_perm);
+   wake_up_q(_q);
 out_unlock1:
rcu_read_unlock();
if (IS_ERR(msg)) {
-- 
2.6.6

[PATCH 4/5] ipc/msg: Lockless security checks for msgsnd

2016-07-28 Thread Davidlohr Bueso

Just as with msgrcv (along with the rest of sysvipc since a few
years ago), perform the security checks without holding the ipc
object lock. This also reduces the hogging of the lock for the
entire duration of a sender, as we drop the lock upon every
iteration -- and this is exactly why we also check for racing
with RMID in the first place.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 5181259e2ff0..fe793304dddb 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -623,14 +623,14 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
goto out_unlock1;
}
 
-   ipc_lock_object(>q_perm);
-
for (;;) {
struct msg_sender s;
 
err = -EACCES;
if (ipcperms(ns, >q_perm, S_IWUGO))
-   goto out_unlock0;
+   goto out_unlock1;
+
+   ipc_lock_object(>q_perm);
 
/* raced with RMID? */
if (!ipc_valid_object(>q_perm)) {
@@ -681,6 +681,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
goto out_unlock0;
}
 
+   ipc_unlock_object(>q_perm);
}
msq->q_lspid = task_tgid_vnr(current);
msq->q_stime = get_seconds();
-- 
2.6.6

[PATCH 4/5] ipc/msg: Lockless security checks for msgsnd

2016-07-28 Thread Davidlohr Bueso

Just as with msgrcv (along with the rest of sysvipc since a few
years ago), perform the security checks without holding the ipc
object lock. This also reduces the hogging of the lock for the
entire duration of a sender, as we drop the lock upon every
iteration -- and this is exactly why we also check for racing
with RMID in the first place.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 5181259e2ff0..fe793304dddb 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -623,14 +623,14 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
goto out_unlock1;
}
 
-   ipc_lock_object(>q_perm);
-
for (;;) {
struct msg_sender s;
 
err = -EACCES;
if (ipcperms(ns, >q_perm, S_IWUGO))
-   goto out_unlock0;
+   goto out_unlock1;
+
+   ipc_lock_object(>q_perm);
 
/* raced with RMID? */
if (!ipc_valid_object(>q_perm)) {
@@ -681,6 +681,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
goto out_unlock0;
}
 
+   ipc_unlock_object(>q_perm);
}
msq->q_lspid = task_tgid_vnr(current);
msq->q_stime = get_seconds();
-- 
2.6.6

[PATCH 5/5] ipc/msg: Avoid waking sender upon full queue

2016-07-28 Thread Davidlohr Bueso

Blocked tasks queued in q_senders waiting for their message to
fit in the queue are blindly awoken every time we think there's
a remote chance this might happen. This could cause numerous
(and expensive -- thundering herd-ish) bogus wakeups if the queue
is still really full. Adding to the scheduling cost/overhead,
there's also the fact that we need to take the ipc object lock
and requeue ourselves in the q_senders list.

By keeping track of the blocked sender's message size, we can
know previously if the wakeup ought to occur or not. Otherwise,
to maintain the current wakeup order we just move it to the
tail. This is exactly what occurs right now if the sender needs
to go back to sleep.

The case of EIDRM is left completely untouched, as we need to
wakeup all the tasks, and shouldn't be playing games in the
first place.

This patch was seen to save on the 'msgctl10' ltp testcase
~15% in context switches (avg out of ten runs). Although these
tests are really about functionality (as opposed to performance),
is does show the direct benefits of the optimization.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 52 +++-
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index fe793304dddb..1a77dfacc0d6 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -58,6 +58,7 @@ struct msg_receiver {
 struct msg_sender {
struct list_headlist;
struct task_struct  *tsk;
+   size_t  msgsz;
 };
 
 #define SEARCH_ANY 1
@@ -153,27 +154,60 @@ static int newque(struct ipc_namespace *ns, struct 
ipc_params *params)
return msq->q_perm.id;
 }
 
-static inline void ss_add(struct msg_queue *msq, struct msg_sender *mss)
+static inline bool msg_fits_inqueue(struct msg_queue *msq, size_t msgsz)
+{
+   return msgsz + msq->q_cbytes <= msq->q_qbytes &&
+   1 + msq->q_qnum <= msq->q_qbytes;
+}
+
+static inline void ss_add(struct msg_queue *msq,
+ struct msg_sender *mss, size_t msgsz)
 {
mss->tsk = current;
+   mss->msgsz = msgsz;
__set_current_state(TASK_INTERRUPTIBLE);
list_add_tail(>list, >q_senders);
 }
 
 static inline void ss_del(struct msg_sender *mss)
 {
-   if (mss->list.next != NULL)
+   if (mss->list.next)
list_del(>list);
 }
 
-static void ss_wakeup(struct list_head *h,
+static void ss_wakeup(struct msg_queue *msq,
  struct wake_q_head *wake_q, bool kill)
 {
struct msg_sender *mss, *t;
+   struct task_struct *stop_tsk = NULL;
+   struct list_head *h = >q_senders;
 
list_for_each_entry_safe(mss, t, h, list) {
if (kill)
mss->list.next = NULL;
+
+   /*
+* Stop at the first task we don't wakeup,
+* we've already iterated the original
+* sender queue.
+*/
+   else if (stop_tsk == mss->tsk)
+   break;
+   /*
+* We are not in an EIDRM scenario here, therefore
+* verify that we really need to wakeup the task.
+* To maintain current semantics and wakeup order,
+* move the sender to the tail on behalf of the
+* blocked task.
+*/
+   else if (!msg_fits_inqueue(msq, mss->msgsz)) {
+   if (!stop_tsk)
+   stop_tsk = mss->tsk;
+
+   list_move_tail(>list, >q_senders);
+   continue;
+   }
+
wake_q_add(wake_q, mss->tsk);
}
 }
@@ -204,7 +238,7 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
WAKE_Q(wake_q);
 
expunge_all(msq, -EIDRM, _q);
-   ss_wakeup(>q_senders, _q, true);
+   ss_wakeup(msq, _q, true);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
@@ -388,7 +422,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 * Sleeping senders might be able to send
 * due to a larger queue size.
 */
-   ss_wakeup(>q_senders, _q, false);
+   ss_wakeup(msq, _q, false);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
 
@@ -642,8 +676,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
if (err)
goto out_unlock0;
 
-   if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
-   1 + msq->q_qnum <= msq->q_qbytes) {
+   if (msg_fits_inqueue(msq, msgsz)) {
break;
}
 
@@ -654,7 +687,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
}
 
/* enqueue the sender and prepare to block */
-   ss_add(msq,

mmotm 2016-07-28-16-33 uploaded

2016-07-28 Thread akpm

The mm-of-the-moment snapshot 2016-07-28-16-33 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (4.x
or 4.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

To develop on top of mmotm git:

  $ git remote add mmotm 
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 4.7:
(patches marked "*" will be included in linux-next)

  origin.patch
* proc-oom-drop-bogus-task_lock-and-mm-check.patch
* proc-oom-drop-bogus-sighand-lock.patch
* proc-oom_adj-extract-oom_score_adj-setting-into-a-helper.patch
* 
mm-oom_adj-make-sure-processes-sharing-mm-have-same-view-of-oom_score_adj.patch
* mm-oom-skip-vforked-tasks-from-being-selected.patch
* mm-oom-kill-all-tasks-sharing-the-mm.patch
* mm-oom-fortify-task_will_free_mem.patch
* mm-oom-task_will_free_mem-should-skip-oom_reaped-tasks.patch
* mm-oom_reaper-do-not-attempt-to-reap-a-task-more-than-twice.patch
* mm-oom-hide-mm-which-is-shared-with-kthread-or-global-init.patch
* mm-oom-fortify-task_will_free_mem-fix.patch
* mm-update-the-comment-in-__isolate_free_page.patch
* mm-fix-vm-scalability-regression-in-cgroup-aware-workingset-code.patch
* mm-compaction-remove-unnecessary-order-check-in-try_to_compact_pages.patch
* freezer-oom-check-tif_memdie-on-the-correct-task.patch
* cpuset-mm-fix-tif_memdie-check-in-cpuset_change_task_nodemask.patch
* mm-meminit-remove-early_page_nid_uninitialised.patch
* mm-vmstat-add-infrastructure-for-per-node-vmstats.patch
* mm-vmscan-move-lru_lock-to-the-node.patch
* mm-vmscan-move-lru-lists-to-node.patch
* mm-mmzone-clarify-the-usage-of-zone-padding.patch
* mm-vmscan-begin-reclaiming-pages-on-a-per-node-basis.patch
* mm-vmscan-have-kswapd-only-scan-based-on-the-highest-requested-zone.patch
* mm-vmscan-make-kswapd-reclaim-in-terms-of-nodes.patch
* mm-vmscan-remove-balance-gap.patch
* mm-vmscan-simplify-the-logic-deciding-whether-kswapd-sleeps.patch
* mm-vmscan-by-default-have-direct-reclaim-only-shrink-once-per-node.patch
* 
mm-vmscan-remove-duplicate-logic-clearing-node-congestion-and-dirty-state.patch
* mm-vmscan-do-not-reclaim-from-kswapd-if-there-is-any-eligible-zone.patch
* mm-vmscan-make-shrink_node-decisions-more-node-centric.patch
* mm-memcg-move-memcg-limit-enforcement-from-zones-to-nodes.patch
* mm-workingset-make-working-set-detection-node-aware.patch
* mm-page_alloc-consider-dirtyable-memory-in-terms-of-nodes.patch
* mm-move-page-mapped-accounting-to-the-node.patch
* mm-rename-nr_anon_pages-to-nr_anon_mapped.patch
* mm-move-most-file-based-accounting-to-the-node.patch
* mm-move-vmscan-writes-and-file-write-accounting-to-the-node.patch
* mm-vmscan-only-wakeup-kswapd-once-per-node-for-the-requested-classzone.patch
* mm-page_alloc-wake-kswapd-based-on-the-highest-eligible-zone.patch
* mm-convert-zone_reclaim-to-node_reclaim.patch
* mm-vmscan-avoid-passing-in-classzone_idx-unnecessarily-to-shrink_node.patch
*

[PATCH 5/5] ipc/msg: Avoid waking sender upon full queue

2016-07-28 Thread Davidlohr Bueso

Blocked tasks queued in q_senders waiting for their message to
fit in the queue are blindly awoken every time we think there's
a remote chance this might happen. This could cause numerous
(and expensive -- thundering herd-ish) bogus wakeups if the queue
is still really full. Adding to the scheduling cost/overhead,
there's also the fact that we need to take the ipc object lock
and requeue ourselves in the q_senders list.

By keeping track of the blocked sender's message size, we can
know previously if the wakeup ought to occur or not. Otherwise,
to maintain the current wakeup order we just move it to the
tail. This is exactly what occurs right now if the sender needs
to go back to sleep.

The case of EIDRM is left completely untouched, as we need to
wakeup all the tasks, and shouldn't be playing games in the
first place.

This patch was seen to save on the 'msgctl10' ltp testcase
~15% in context switches (avg out of ten runs). Although these
tests are really about functionality (as opposed to performance),
is does show the direct benefits of the optimization.

Signed-off-by: Davidlohr Bueso 
---
 ipc/msg.c | 52 +++-
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index fe793304dddb..1a77dfacc0d6 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -58,6 +58,7 @@ struct msg_receiver {
 struct msg_sender {
struct list_headlist;
struct task_struct  *tsk;
+   size_t  msgsz;
 };
 
 #define SEARCH_ANY 1
@@ -153,27 +154,60 @@ static int newque(struct ipc_namespace *ns, struct 
ipc_params *params)
return msq->q_perm.id;
 }
 
-static inline void ss_add(struct msg_queue *msq, struct msg_sender *mss)
+static inline bool msg_fits_inqueue(struct msg_queue *msq, size_t msgsz)
+{
+   return msgsz + msq->q_cbytes <= msq->q_qbytes &&
+   1 + msq->q_qnum <= msq->q_qbytes;
+}
+
+static inline void ss_add(struct msg_queue *msq,
+ struct msg_sender *mss, size_t msgsz)
 {
mss->tsk = current;
+   mss->msgsz = msgsz;
__set_current_state(TASK_INTERRUPTIBLE);
list_add_tail(>list, >q_senders);
 }
 
 static inline void ss_del(struct msg_sender *mss)
 {
-   if (mss->list.next != NULL)
+   if (mss->list.next)
list_del(>list);
 }
 
-static void ss_wakeup(struct list_head *h,
+static void ss_wakeup(struct msg_queue *msq,
  struct wake_q_head *wake_q, bool kill)
 {
struct msg_sender *mss, *t;
+   struct task_struct *stop_tsk = NULL;
+   struct list_head *h = >q_senders;
 
list_for_each_entry_safe(mss, t, h, list) {
if (kill)
mss->list.next = NULL;
+
+   /*
+* Stop at the first task we don't wakeup,
+* we've already iterated the original
+* sender queue.
+*/
+   else if (stop_tsk == mss->tsk)
+   break;
+   /*
+* We are not in an EIDRM scenario here, therefore
+* verify that we really need to wakeup the task.
+* To maintain current semantics and wakeup order,
+* move the sender to the tail on behalf of the
+* blocked task.
+*/
+   else if (!msg_fits_inqueue(msq, mss->msgsz)) {
+   if (!stop_tsk)
+   stop_tsk = mss->tsk;
+
+   list_move_tail(>list, >q_senders);
+   continue;
+   }
+
wake_q_add(wake_q, mss->tsk);
}
 }
@@ -204,7 +238,7 @@ static void freeque(struct ipc_namespace *ns, struct 
kern_ipc_perm *ipcp)
WAKE_Q(wake_q);
 
expunge_all(msq, -EIDRM, _q);
-   ss_wakeup(>q_senders, _q, true);
+   ss_wakeup(msq, _q, true);
msg_rmid(ns, msq);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
@@ -388,7 +422,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, 
int cmd,
 * Sleeping senders might be able to send
 * due to a larger queue size.
 */
-   ss_wakeup(>q_senders, _q, false);
+   ss_wakeup(msq, _q, false);
ipc_unlock_object(>q_perm);
wake_up_q(_q);
 
@@ -642,8 +676,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
if (err)
goto out_unlock0;
 
-   if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
-   1 + msq->q_qnum <= msq->q_qbytes) {
+   if (msg_fits_inqueue(msq, msgsz)) {
break;
}
 
@@ -654,7 +687,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
}
 
/* enqueue the sender and prepare to block */
-   ss_add(msq, );
+

mmotm 2016-07-28-16-33 uploaded

2016-07-28 Thread akpm

The mm-of-the-moment snapshot 2016-07-28-16-33 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (4.x
or 4.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

To develop on top of mmotm git:

  $ git remote add mmotm 
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 4.7:
(patches marked "*" will be included in linux-next)

  origin.patch
* proc-oom-drop-bogus-task_lock-and-mm-check.patch
* proc-oom-drop-bogus-sighand-lock.patch
* proc-oom_adj-extract-oom_score_adj-setting-into-a-helper.patch
* 
mm-oom_adj-make-sure-processes-sharing-mm-have-same-view-of-oom_score_adj.patch
* mm-oom-skip-vforked-tasks-from-being-selected.patch
* mm-oom-kill-all-tasks-sharing-the-mm.patch
* mm-oom-fortify-task_will_free_mem.patch
* mm-oom-task_will_free_mem-should-skip-oom_reaped-tasks.patch
* mm-oom_reaper-do-not-attempt-to-reap-a-task-more-than-twice.patch
* mm-oom-hide-mm-which-is-shared-with-kthread-or-global-init.patch
* mm-oom-fortify-task_will_free_mem-fix.patch
* mm-update-the-comment-in-__isolate_free_page.patch
* mm-fix-vm-scalability-regression-in-cgroup-aware-workingset-code.patch
* mm-compaction-remove-unnecessary-order-check-in-try_to_compact_pages.patch
* freezer-oom-check-tif_memdie-on-the-correct-task.patch
* cpuset-mm-fix-tif_memdie-check-in-cpuset_change_task_nodemask.patch
* mm-meminit-remove-early_page_nid_uninitialised.patch
* mm-vmstat-add-infrastructure-for-per-node-vmstats.patch
* mm-vmscan-move-lru_lock-to-the-node.patch
* mm-vmscan-move-lru-lists-to-node.patch
* mm-mmzone-clarify-the-usage-of-zone-padding.patch
* mm-vmscan-begin-reclaiming-pages-on-a-per-node-basis.patch
* mm-vmscan-have-kswapd-only-scan-based-on-the-highest-requested-zone.patch
* mm-vmscan-make-kswapd-reclaim-in-terms-of-nodes.patch
* mm-vmscan-remove-balance-gap.patch
* mm-vmscan-simplify-the-logic-deciding-whether-kswapd-sleeps.patch
* mm-vmscan-by-default-have-direct-reclaim-only-shrink-once-per-node.patch
* 
mm-vmscan-remove-duplicate-logic-clearing-node-congestion-and-dirty-state.patch
* mm-vmscan-do-not-reclaim-from-kswapd-if-there-is-any-eligible-zone.patch
* mm-vmscan-make-shrink_node-decisions-more-node-centric.patch
* mm-memcg-move-memcg-limit-enforcement-from-zones-to-nodes.patch
* mm-workingset-make-working-set-detection-node-aware.patch
* mm-page_alloc-consider-dirtyable-memory-in-terms-of-nodes.patch
* mm-move-page-mapped-accounting-to-the-node.patch
* mm-rename-nr_anon_pages-to-nr_anon_mapped.patch
* mm-move-most-file-based-accounting-to-the-node.patch
* mm-move-vmscan-writes-and-file-write-accounting-to-the-node.patch
* mm-vmscan-only-wakeup-kswapd-once-per-node-for-the-requested-classzone.patch
* mm-page_alloc-wake-kswapd-based-on-the-highest-eligible-zone.patch
* mm-convert-zone_reclaim-to-node_reclaim.patch
* mm-vmscan-avoid-passing-in-classzone_idx-unnecessarily-to-shrink_node.patch
*

Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Francois Romieu

Eric Dumazet  :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled() in netpoll_send_skb_on_dev for
quite some time now but it's apparently screened by too many tests to be
effective. :o/

-- 
Ueimor

Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Francois Romieu

Eric Dumazet  :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled() in netpoll_send_skb_on_dev for
quite some time now but it's apparently screened by too many tests to be
effective. :o/

-- 
Ueimor

Re: [RFC] ath10k: silence firmware file probing warnings

2016-07-28 Thread Luis R. Rodriguez

On Thu, Jul 28, 2016 at 09:23:35PM +0200, Arend van Spriel wrote:
> On 23-07-16 00:05, Luis R. Rodriguez wrote:
> > The firmware API is a mess and I've been trying to correct that
> > with a more flexible API.

<-- snip -->

> > Extensions to the fw API are IMHO best done through a newer flexible
> > API, feel free to refer to this development tree if you'd like to
> > contribute:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20160616-sysdata-v2
> 
> So I had a look and noticed commit c8df68e83392 ("firmware: annotate
> thou shalt not request fw on init or probe"). Now this conflicts with
> our wireless driver. The original suggestion a long, long time ago was
> to use IFF_UP as trigger to go and request firmware. However, for that
> we would need to register a netdevice during probe, and consequently we
> should also have a wiphy instance registered. However, that has all kind
> of feature flags for which we need firmware running on the device to
> query what is supported and what not. I can make a fair bet that
> brcmfmac is not the only driver with such a requirement. So how can we
> crack that nut.

Glad you asked.

Despite my latest set of updates on documentation for the firmware_class [0] 
(not
yet merged), and it being based on the latest discussed and agreed upon we
really have nothing well ironed out for what you describe, so let's try
to figure that out now.

To be clear, folks had originally said using the firmware API on *init* was
dumb, and some of this came up because of the infamous systemd udev timeout.
For completeness, I've documented some of the history of this issue
in great detail [1], mostly because I had to deal with a bug at SUSE about
this, and find a proper solution. Avoiding re-iterating *exactly why* the
timeout for kmod was ill-founded, and assuming you all now buy that,
the summary of facts of the *why* it turns out to be a bad idea to use the
firmware API on init *or* probe:

  a) by default the driver model actually calls both init and probe serially and
 synchronously
  b) we have no deterministic way of being certain we have whatever filesystem
 the driver needed as ready, the firmware may live in initramfs, or may only
 be available later on the real filesystem, and even after that the system
 may be designed to pivot_root.

In terms of solutions, lets discuss, here are a few options I can think of:

  1) Because of b) if you know you are going to use the firmware API you should
 just stuff firmware that is needed on init or probe as built-in
 (CONFIG_EXTRA_FIRMWARE) or stuff it into initramfs. This approach is 
generally
 accepted, however there is no systematic approach to ensure this is done. 
Now
 that we have coccinelle grammar to find these drivers it should be 
relatively
 easy to identify them and require the firmware as part of the 
distribution's
 initramfs or peg them as part of CONFIG_EXTRA_FIRMWARE if a distro prefers 
that.

 The only issue with this approach is its left up to the distribution to 
resolve.

  2) If the driver *knows* that probe may take a while it can request the 
driver core
 to probe the driver asynchronously, it can do so by using:

static struct pci_driver foo_pci_driver = {
  ...
  .driver.probe_type = PROBE_PREFER_ASYNCHRONOUS,
};

This would not really resolve the deterministic issues listed in b) and for
this reason this is not really a firmware-API-on-probe solution -- its an
annotation to help avoid delays boot due to knowing probe may take a while.

  3) Userspace that loads modules can / should pass the async probe generic
 module parameter "async_probe" (for instance 'modprobe ath10k async_probe')
 when loading all modules. This should already be relatively
 safe when using on modules. This is what systemd long ago assumed was
 being done anyway. Again, also this is not really a firmware-API-on-probe 
solution,
 it can however be used by systemd for instance to help avoid delays on
 boot due to module lengthy probe calls

As it stands we have no resolution for the deterministic existential issues of
the firmware but in practice 1-3 should help. 2-3 should probably be done
regardless. I've been meaning to send patches for 3) but haven't had time.

As far as the deterministic existential firmware issue... Since we're just
reading files from the filesystem now (there are exceptions to this, but my
goal is to corner-case this code with the sysdata API), if we really wanted
something rock solid for these drivers I suppose we could consider implementing
an event driven probe if a driver requests for async probe. For instance, if
async probe was requested and the driver has MODULE_FIRMWARE(firmware_name)
we could add a notifier to probe the driver once the firmware is accessible.

For all intents and purposes though -- although I know the real issue here
the deterministic way of knowing when firmware is

Re: [RFC] ath10k: silence firmware file probing warnings

2016-07-28 Thread Luis R. Rodriguez

On Thu, Jul 28, 2016 at 09:23:35PM +0200, Arend van Spriel wrote:
> On 23-07-16 00:05, Luis R. Rodriguez wrote:
> > The firmware API is a mess and I've been trying to correct that
> > with a more flexible API.

<-- snip -->

> > Extensions to the fw API are IMHO best done through a newer flexible
> > API, feel free to refer to this development tree if you'd like to
> > contribute:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20160616-sysdata-v2
> 
> So I had a look and noticed commit c8df68e83392 ("firmware: annotate
> thou shalt not request fw on init or probe"). Now this conflicts with
> our wireless driver. The original suggestion a long, long time ago was
> to use IFF_UP as trigger to go and request firmware. However, for that
> we would need to register a netdevice during probe, and consequently we
> should also have a wiphy instance registered. However, that has all kind
> of feature flags for which we need firmware running on the device to
> query what is supported and what not. I can make a fair bet that
> brcmfmac is not the only driver with such a requirement. So how can we
> crack that nut.

Glad you asked.

Despite my latest set of updates on documentation for the firmware_class [0] 
(not
yet merged), and it being based on the latest discussed and agreed upon we
really have nothing well ironed out for what you describe, so let's try
to figure that out now.

To be clear, folks had originally said using the firmware API on *init* was
dumb, and some of this came up because of the infamous systemd udev timeout.
For completeness, I've documented some of the history of this issue
in great detail [1], mostly because I had to deal with a bug at SUSE about
this, and find a proper solution. Avoiding re-iterating *exactly why* the
timeout for kmod was ill-founded, and assuming you all now buy that,
the summary of facts of the *why* it turns out to be a bad idea to use the
firmware API on init *or* probe:

  a) by default the driver model actually calls both init and probe serially and
 synchronously
  b) we have no deterministic way of being certain we have whatever filesystem
 the driver needed as ready, the firmware may live in initramfs, or may only
 be available later on the real filesystem, and even after that the system
 may be designed to pivot_root.

In terms of solutions, lets discuss, here are a few options I can think of:

  1) Because of b) if you know you are going to use the firmware API you should
 just stuff firmware that is needed on init or probe as built-in
 (CONFIG_EXTRA_FIRMWARE) or stuff it into initramfs. This approach is 
generally
 accepted, however there is no systematic approach to ensure this is done. 
Now
 that we have coccinelle grammar to find these drivers it should be 
relatively
 easy to identify them and require the firmware as part of the 
distribution's
 initramfs or peg them as part of CONFIG_EXTRA_FIRMWARE if a distro prefers 
that.

 The only issue with this approach is its left up to the distribution to 
resolve.

  2) If the driver *knows* that probe may take a while it can request the 
driver core
 to probe the driver asynchronously, it can do so by using:

static struct pci_driver foo_pci_driver = {
  ...
  .driver.probe_type = PROBE_PREFER_ASYNCHRONOUS,
};

This would not really resolve the deterministic issues listed in b) and for
this reason this is not really a firmware-API-on-probe solution -- its an
annotation to help avoid delays boot due to knowing probe may take a while.

  3) Userspace that loads modules can / should pass the async probe generic
 module parameter "async_probe" (for instance 'modprobe ath10k async_probe')
 when loading all modules. This should already be relatively
 safe when using on modules. This is what systemd long ago assumed was
 being done anyway. Again, also this is not really a firmware-API-on-probe 
solution,
 it can however be used by systemd for instance to help avoid delays on
 boot due to module lengthy probe calls

As it stands we have no resolution for the deterministic existential issues of
the firmware but in practice 1-3 should help. 2-3 should probably be done
regardless. I've been meaning to send patches for 3) but haven't had time.

As far as the deterministic existential firmware issue... Since we're just
reading files from the filesystem now (there are exceptions to this, but my
goal is to corner-case this code with the sysdata API), if we really wanted
something rock solid for these drivers I suppose we could consider implementing
an event driven probe if a driver requests for async probe. For instance, if
async probe was requested and the driver has MODULE_FIRMWARE(firmware_name)
we could add a notifier to probe the driver once the firmware is accessible.

For all intents and purposes though -- although I know the real issue here
the deterministic way of knowing when firmware is

Re: [PATCH 2/8] KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]

2016-07-28 Thread Mat Martineau



On Thu, 23 Jun 2016, David Howells wrote:


diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h
index 8ac2c5fbc8fc..93ebd25b1427 100644
--- a/include/uapi/linux/keyctl.h
+++ b/include/uapi/linux/keyctl.h
@@ -60,6 +60,11 @@
#define KEYCTL_INVALIDATE   21  /* invalidate a key */
#define KEYCTL_GET_PERSISTENT   22  /* get a user's persistent 
keyring */
#define KEYCTL_DH_COMPUTE   23  /* Compute Diffie-Hellman 
values */
+#define KEYCTL_PKEY_QUERY  24  /* Query public key parameters 
*/
+#define KEYCTL_PKEY_ENCRYPT25  /* Encrypt a blob using a 
public key */
+#define KEYCTL_PKEY_DECRYPT26  /* Decrypt a blob using a 
public key */
+#define KEYCTL_PKEY_SIGN   27  /* Create a public key 
signature */
+#define KEYCTL_PKEY_VERIFY 28  /* Verify a public key 
signature */

/* keyctl structures */
struct keyctl_dh_params {
@@ -73,4 +78,24 @@ struct keyctl_dh_params {
#define KEYCTL_SUPPORTS_SIGN0x04
#define KEYCTL_SUPPORTS_VERIFY  0x08

+struct keyctl_pkey_query {
+   __u32   supported_ops;  /* Which ops are supported */
+   __u32   key_size;   /* Size of the key in bits */
+   __u16   max_data_size;  /* Maximum size of raw data to sign in 
bytes */
+   __u16   max_sig_size;   /* Maximum size of signature in bytes */
+   __u16   max_enc_size;   /* Maximum size of encrypted blob in 
bytes */
+   __u16   max_dec_size;   /* Maximum size of decrypted blob in 
bytes */
+   __u32   __spare[10];
+};


It would also be useful to return pkey_algo so userspace can see which 
algorithm is in use for the given public key. The public key algorithm is 
printed in /proc/keys, but is not returned by KEYCTL_PKEY_QUERY or 
KEYCTL_DESCRIBE.


Does it make sense to add the information from key->type->describe() to 
KEYCTL_PKEY_QUERY or KEYCTL_DESCRIBE? Or add something new like 
KEYCTL_DESCRIBE_TYPE?


--
Mat Martineau
Intel OTC

Re: [PATCH 2/8] KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]

2016-07-28 Thread Mat Martineau



On Thu, 23 Jun 2016, David Howells wrote:


diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h
index 8ac2c5fbc8fc..93ebd25b1427 100644
--- a/include/uapi/linux/keyctl.h
+++ b/include/uapi/linux/keyctl.h
@@ -60,6 +60,11 @@
#define KEYCTL_INVALIDATE   21  /* invalidate a key */
#define KEYCTL_GET_PERSISTENT   22  /* get a user's persistent 
keyring */
#define KEYCTL_DH_COMPUTE   23  /* Compute Diffie-Hellman 
values */
+#define KEYCTL_PKEY_QUERY  24  /* Query public key parameters 
*/
+#define KEYCTL_PKEY_ENCRYPT25  /* Encrypt a blob using a 
public key */
+#define KEYCTL_PKEY_DECRYPT26  /* Decrypt a blob using a 
public key */
+#define KEYCTL_PKEY_SIGN   27  /* Create a public key 
signature */
+#define KEYCTL_PKEY_VERIFY 28  /* Verify a public key 
signature */

/* keyctl structures */
struct keyctl_dh_params {
@@ -73,4 +78,24 @@ struct keyctl_dh_params {
#define KEYCTL_SUPPORTS_SIGN0x04
#define KEYCTL_SUPPORTS_VERIFY  0x08

+struct keyctl_pkey_query {
+   __u32   supported_ops;  /* Which ops are supported */
+   __u32   key_size;   /* Size of the key in bits */
+   __u16   max_data_size;  /* Maximum size of raw data to sign in 
bytes */
+   __u16   max_sig_size;   /* Maximum size of signature in bytes */
+   __u16   max_enc_size;   /* Maximum size of encrypted blob in 
bytes */
+   __u16   max_dec_size;   /* Maximum size of decrypted blob in 
bytes */
+   __u32   __spare[10];
+};


It would also be useful to return pkey_algo so userspace can see which 
algorithm is in use for the given public key. The public key algorithm is 
printed in /proc/keys, but is not returned by KEYCTL_PKEY_QUERY or 
KEYCTL_DESCRIBE.


Does it make sense to add the information from key->type->describe() to 
KEYCTL_PKEY_QUERY or KEYCTL_DESCRIBE? Or add something new like 
KEYCTL_DESCRIBE_TYPE?


--
Mat Martineau
Intel OTC

Re: [PATCH v3 2/2] tpm: add driver for cr50 on SPI

2016-07-28 Thread Jason Gunthorpe

On Thu, Jul 28, 2016 at 04:01:41PM -0700, Dmitry Torokhov wrote:

> > +   u8 tx_buf[MAX_SPI_FRAMESIZE];
> > +   u8 rx_buf[MAX_SPI_FRAMESIZE];
> 
> Both of these need to be annotated as "cacheline_aligned" since we
> eye them for use in DMA transfers.

Huh. That sure looks true to me..

We make that same mistake in all other tpm SPI drivers.

Can someone send a patch please?

Thanks,
Jason

Re: [PATCH v3 2/2] tpm: add driver for cr50 on SPI

2016-07-28 Thread Jason Gunthorpe

On Thu, Jul 28, 2016 at 04:01:41PM -0700, Dmitry Torokhov wrote:

> > +   u8 tx_buf[MAX_SPI_FRAMESIZE];
> > +   u8 rx_buf[MAX_SPI_FRAMESIZE];
> 
> Both of these need to be annotated as "cacheline_aligned" since we
> eye them for use in DMA transfers.

Huh. That sure looks true to me..

We make that same mistake in all other tpm SPI drivers.

Can someone send a patch please?

Thanks,
Jason

< 1 2 3 4 5 6 7 8 9 10 >

101 - 200 of 1092 matches

Mail list logo