date:20170606

Re: [PATCH 10/16] powerpc: vio: use dev_groups and not dev_attrs for bus_type

2017-06-06 Thread Benjamin Herrenschmidt

On Wed, 2017-06-07 at 07:45 +0200, Greg Kroah-Hartman wrote:
> On Wed, Jun 07, 2017 at 09:04:41AM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2017-06-06 at 21:30 +0200, Greg Kroah-Hartman wrote:
> > > >   
> > > >   static struct device_attribute vio_dev_attrs[] = {
> > > >    __ATTR_RO(name),
> > > > @@ -1573,6 +1576,13 @@ static struct device_attribute vio_dev_attrs[] = 
> > > > {
> > > >    __ATTR_RO(modalias),
> > > >    __ATTR_NULL
> > > >   };
> > > > +static struct attribute *vio_dev_attrs[] = {
> > > 
> > > Hm, this feels wrong, odd that 0-day passed it.  I should be deleting
> > > the above vio_dev_attrs field as well.  Is powerpc really a dead
> > > platform?  :)
> > 
> > Haha, not yet no, and the above is actually still quite actively
> > in use as it's part of our hypervisor virtual IO infrastructure.
> 
> Ok, let me fix this, right after I emailed 0-day sent me the build error :)

Thanks !

Cheers,
Ben.

Re: [PATCH 10/16] powerpc: vio: use dev_groups and not dev_attrs for bus_type

2017-06-06 Thread Benjamin Herrenschmidt

On Wed, 2017-06-07 at 07:45 +0200, Greg Kroah-Hartman wrote:
> On Wed, Jun 07, 2017 at 09:04:41AM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2017-06-06 at 21:30 +0200, Greg Kroah-Hartman wrote:
> > > >   
> > > >   static struct device_attribute vio_dev_attrs[] = {
> > > >    __ATTR_RO(name),
> > > > @@ -1573,6 +1576,13 @@ static struct device_attribute vio_dev_attrs[] = 
> > > > {
> > > >    __ATTR_RO(modalias),
> > > >    __ATTR_NULL
> > > >   };
> > > > +static struct attribute *vio_dev_attrs[] = {
> > > 
> > > Hm, this feels wrong, odd that 0-day passed it.  I should be deleting
> > > the above vio_dev_attrs field as well.  Is powerpc really a dead
> > > platform?  :)
> > 
> > Haha, not yet no, and the above is actually still quite actively
> > in use as it's part of our hypervisor virtual IO infrastructure.
> 
> Ok, let me fix this, right after I emailed 0-day sent me the build error :)

Thanks !

Cheers,
Ben.

[patch] compiler, clang: move inline definition to compiler-gcc.h

2017-06-06 Thread David Rientjes

The motivation of commit abb2ea7dfd82 ("compiler, clang: suppress warning 
for unused static inline functions") is to suppress clang's warnings about 
unused static inline functions.

Clang defines __GNUC__ so it inherits all of compiler-gcc.h as well, so 
the redefinition of `inline' ends up overriding the definition in 
compiler-gcc.h.

Simply annotate all inline functions as __attribute__((unused)).  It's 
necessary to suppress the warning for clang and is implicit with gcc.

Reported-by: Matthias Kaehlcke 
Signed-off-by: David Rientjes 
---
 Matthias, please add your Tested-by if this works for you, thanks!

 include/linux/compiler-clang.h |  7 ---
 include/linux/compiler-gcc.h   | 18 ++
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -15,10 +15,3 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
-
-/*
- * GCC does not warn about unused static inline functions for
- * -Wunused-function.  This turns out to avoid the need for complex #ifdef
- * directives.  Suppress the warning in clang as well.
- */
-#define inline inline __attribute__((unused))
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 0efef9cf014f..1264f0688b10 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -65,19 +65,21 @@
 #endif
 
 /*
- * Force always-inline if the user requests it so via the .config,
- * or if gcc is too old:
+ * Force always_inline if the user requests it via the .config, or if gcc is
+ * version 3 or earlier.  Also disable unused static inline function warnings
+ * to avoid the need for complex #ifdef directives: this is implicit with gcc
+ * but is needed for clang, which also sets __GNUC__.
  */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||   \
 !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-#define inline inline  __attribute__((always_inline)) notrace
-#define __inline__ __inline__  __attribute__((always_inline)) notrace
-#define __inline   __inline__attribute__((always_inline)) notrace
+#define inline inline  __attribute__((always_inline,unused)) 
notrace
+#define __inline__ __inline__  __attribute__((always_inline,unused)) 
notrace
+#define __inline   __inline__attribute__((always_inline,unused)) 
notrace
 #else
 /* A lot of inline functions can cause havoc with function tracing */
-#define inline inline  notrace
-#define __inline__ __inline__  notrace
-#define __inline   __inlinenotrace
+#define inline inline  __attribute__((unused)) notrace
+#define __inline__ __inline__  __attribute__((unused)) notrace
+#define __inline   __inline__attribute__((unused)) notrace
 #endif
 
 #define __always_inlineinline __attribute__((always_inline))

[patch] compiler, clang: move inline definition to compiler-gcc.h

2017-06-06 Thread David Rientjes

The motivation of commit abb2ea7dfd82 ("compiler, clang: suppress warning 
for unused static inline functions") is to suppress clang's warnings about 
unused static inline functions.

Clang defines __GNUC__ so it inherits all of compiler-gcc.h as well, so 
the redefinition of `inline' ends up overriding the definition in 
compiler-gcc.h.

Simply annotate all inline functions as __attribute__((unused)).  It's 
necessary to suppress the warning for clang and is implicit with gcc.

Reported-by: Matthias Kaehlcke 
Signed-off-by: David Rientjes 
---
 Matthias, please add your Tested-by if this works for you, thanks!

 include/linux/compiler-clang.h |  7 ---
 include/linux/compiler-gcc.h   | 18 ++
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -15,10 +15,3 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
-
-/*
- * GCC does not warn about unused static inline functions for
- * -Wunused-function.  This turns out to avoid the need for complex #ifdef
- * directives.  Suppress the warning in clang as well.
- */
-#define inline inline __attribute__((unused))
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 0efef9cf014f..1264f0688b10 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -65,19 +65,21 @@
 #endif
 
 /*
- * Force always-inline if the user requests it so via the .config,
- * or if gcc is too old:
+ * Force always_inline if the user requests it via the .config, or if gcc is
+ * version 3 or earlier.  Also disable unused static inline function warnings
+ * to avoid the need for complex #ifdef directives: this is implicit with gcc
+ * but is needed for clang, which also sets __GNUC__.
  */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||   \
 !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-#define inline inline  __attribute__((always_inline)) notrace
-#define __inline__ __inline__  __attribute__((always_inline)) notrace
-#define __inline   __inline__attribute__((always_inline)) notrace
+#define inline inline  __attribute__((always_inline,unused)) 
notrace
+#define __inline__ __inline__  __attribute__((always_inline,unused)) 
notrace
+#define __inline   __inline__attribute__((always_inline,unused)) 
notrace
 #else
 /* A lot of inline functions can cause havoc with function tracing */
-#define inline inline  notrace
-#define __inline__ __inline__  notrace
-#define __inline   __inlinenotrace
+#define inline inline  __attribute__((unused)) notrace
+#define __inline__ __inline__  __attribute__((unused)) notrace
+#define __inline   __inline__attribute__((unused)) notrace
 #endif
 
 #define __always_inlineinline __attribute__((always_inline))

[PATCH 1/2] HID: Add Oculus Rift CV1 id

2017-06-06 Thread Philipp Zabel

Add VID/PID for Oculus Rift CV1.

Signed-off-by: Philipp Zabel 
---
 drivers/hid/hid-ids.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 8ca1e8ce0af2..2953d53a8cc9 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -1133,6 +1133,9 @@
 #define USB_VENDOR_ID_MULTIPLE_17810x1781
 #define USB_DEVICE_ID_RAPHNET_4NES4SNES_OLD0x0a9d
 
+#define USB_VENDOR_ID_OCULUSVR 0x2833
+#define USB_DEVICE_ID_RIFT_CV1 0x0031
+
 #define USB_VENDOR_ID_DRACAL_RAPHNET   0x289b
 #define USB_DEVICE_ID_RAPHNET_2NES2SNES0x0002
 #define USB_DEVICE_ID_RAPHNET_4NES4SNES0x0003
-- 
2.11.0

[PATCH 1/2] HID: Add Oculus Rift CV1 id

2017-06-06 Thread Philipp Zabel

Add VID/PID for Oculus Rift CV1.

Signed-off-by: Philipp Zabel 
---
 drivers/hid/hid-ids.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 8ca1e8ce0af2..2953d53a8cc9 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -1133,6 +1133,9 @@
 #define USB_VENDOR_ID_MULTIPLE_17810x1781
 #define USB_DEVICE_ID_RAPHNET_4NES4SNES_OLD0x0a9d
 
+#define USB_VENDOR_ID_OCULUSVR 0x2833
+#define USB_DEVICE_ID_RIFT_CV1 0x0031
+
 #define USB_VENDOR_ID_DRACAL_RAPHNET   0x289b
 #define USB_DEVICE_ID_RAPHNET_2NES2SNES0x0002
 #define USB_DEVICE_ID_RAPHNET_4NES4SNES0x0003
-- 
2.11.0

[PATCH 2/2] HID: usbhid: Add HID_QUIRK_NO_INIT_REPORTS for Oculus Rift CV1

2017-06-06 Thread Philipp Zabel

When plugging in an Oculus Rift CV1 HMD, it takes a long time until the hidraw
devices appear, specifically two control transfers time out querying the HID
report descriptors:

$ echo 1 > /sys/module/hid/parameters/debug

usb 1-3.1: new full-speed USB device number 11 using xhci_hcd
usb 1-3.1: New USB device found, idVendor=2833, idProduct=0031
usb 1-3.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-3.1: Product: Rift
usb 1-3.1: Manufacturer: Oculus VR, Inc.
drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 0
hid-generic 0003:2833:0031.0005: Kicking head 1 tail 0
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x0101 wIndex=0x wLength=62
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010b wIndex=0x wLength=64

[10 s delay]

drivers/hid/usbhid/hid-core.c: timeout waiting for ctrl or out queue to 
clear
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010c wIndex=0x wLength=64
hid-generic 0003:2833:0031.0005: usb_submit_urb(ctrl) failed: -1
hid-generic 0003:2833:0031.0005: timeout initializing reports
hid-generic 0003:2833:0031.0005: hiddev0,hidraw0: USB HID v1.10 Device 
[Oculus VR, Inc. Rift] on usb-:00:14.0-3.1/input0
drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 1
hid-generic 0003:2833:0031.0006: Kicking head 1 tail 0
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x0101 wIndex=0x0001 wLength=62
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010b wIndex=0x0001 wLength=64

[10 s delay]

drivers/hid/usbhid/hid-core.c: timeout waiting for ctrl or out queue to 
clear
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010c wIndex=0x0001 wLength=64
hid-generic 0003:2833:0031.0006: usb_submit_urb(ctrl) failed: -1
hid-generic 0003:2833:0031.0006: timeout initializing reports
hid-generic 0003:2833:0031.0006: hiddev0,hidraw1: USB HID v1.10 Device 
[Oculus VR, Inc. Rift] on usb-:00:14.0-3.1/input1
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver

These timeouts disappear when setting HID_QUIRK_NO_INIT_REPORTS. The hidraw
devices appear quickly and are functional.

Signed-off-by: Philipp Zabel 
---
 drivers/hid/usbhid/hid-quirks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c
index 6316498b7812..7b2df4042167 100644
--- a/drivers/hid/usbhid/hid-quirks.c
+++ b/drivers/hid/usbhid/hid-quirks.c
@@ -111,6 +111,7 @@ static const struct hid_blacklist {
{ USB_VENDOR_ID_MSI, USB_DEVICE_ID_MSI_GT683R_LED_PANEL, 
HID_QUIRK_NO_INIT_REPORTS },
{ USB_VENDOR_ID_NEXIO, USB_DEVICE_ID_NEXIO_MULTITOUCH_PTI0750, 
HID_QUIRK_NO_INIT_REPORTS },
{ USB_VENDOR_ID_NOVATEK, USB_DEVICE_ID_NOVATEK_MOUSE, 
HID_QUIRK_NO_INIT_REPORTS },
+   { USB_VENDOR_ID_OCULUSVR, USB_DEVICE_ID_RIFT_CV1, 
HID_QUIRK_NO_INIT_REPORTS },
{ USB_VENDOR_ID_PENMOUNT, USB_DEVICE_ID_PENMOUNT_1610, HID_QUIRK_NOGET 
},
{ USB_VENDOR_ID_PENMOUNT, USB_DEVICE_ID_PENMOUNT_1640, HID_QUIRK_NOGET 
},
{ USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_USB_OPTICAL_MOUSE, 
HID_QUIRK_ALWAYS_POLL },
-- 
2.11.0

[PATCH 2/2] HID: usbhid: Add HID_QUIRK_NO_INIT_REPORTS for Oculus Rift CV1

2017-06-06 Thread Philipp Zabel

When plugging in an Oculus Rift CV1 HMD, it takes a long time until the hidraw
devices appear, specifically two control transfers time out querying the HID
report descriptors:

$ echo 1 > /sys/module/hid/parameters/debug

usb 1-3.1: new full-speed USB device number 11 using xhci_hcd
usb 1-3.1: New USB device found, idVendor=2833, idProduct=0031
usb 1-3.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-3.1: Product: Rift
usb 1-3.1: Manufacturer: Oculus VR, Inc.
drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 0
hid-generic 0003:2833:0031.0005: Kicking head 1 tail 0
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x0101 wIndex=0x wLength=62
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010b wIndex=0x wLength=64

[10 s delay]

drivers/hid/usbhid/hid-core.c: timeout waiting for ctrl or out queue to 
clear
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010c wIndex=0x wLength=64
hid-generic 0003:2833:0031.0005: usb_submit_urb(ctrl) failed: -1
hid-generic 0003:2833:0031.0005: timeout initializing reports
hid-generic 0003:2833:0031.0005: hiddev0,hidraw0: USB HID v1.10 Device 
[Oculus VR, Inc. Rift] on usb-:00:14.0-3.1/input0
drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 1
hid-generic 0003:2833:0031.0006: Kicking head 1 tail 0
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x0101 wIndex=0x0001 wLength=62
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010b wIndex=0x0001 wLength=64

[10 s delay]

drivers/hid/usbhid/hid-core.c: timeout waiting for ctrl or out queue to 
clear
drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report 
wValue=0x010c wIndex=0x0001 wLength=64
hid-generic 0003:2833:0031.0006: usb_submit_urb(ctrl) failed: -1
hid-generic 0003:2833:0031.0006: timeout initializing reports
hid-generic 0003:2833:0031.0006: hiddev0,hidraw1: USB HID v1.10 Device 
[Oculus VR, Inc. Rift] on usb-:00:14.0-3.1/input1
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver

These timeouts disappear when setting HID_QUIRK_NO_INIT_REPORTS. The hidraw
devices appear quickly and are functional.

Signed-off-by: Philipp Zabel 
---
 drivers/hid/usbhid/hid-quirks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c
index 6316498b7812..7b2df4042167 100644
--- a/drivers/hid/usbhid/hid-quirks.c
+++ b/drivers/hid/usbhid/hid-quirks.c
@@ -111,6 +111,7 @@ static const struct hid_blacklist {
{ USB_VENDOR_ID_MSI, USB_DEVICE_ID_MSI_GT683R_LED_PANEL, 
HID_QUIRK_NO_INIT_REPORTS },
{ USB_VENDOR_ID_NEXIO, USB_DEVICE_ID_NEXIO_MULTITOUCH_PTI0750, 
HID_QUIRK_NO_INIT_REPORTS },
{ USB_VENDOR_ID_NOVATEK, USB_DEVICE_ID_NOVATEK_MOUSE, 
HID_QUIRK_NO_INIT_REPORTS },
+   { USB_VENDOR_ID_OCULUSVR, USB_DEVICE_ID_RIFT_CV1, 
HID_QUIRK_NO_INIT_REPORTS },
{ USB_VENDOR_ID_PENMOUNT, USB_DEVICE_ID_PENMOUNT_1610, HID_QUIRK_NOGET 
},
{ USB_VENDOR_ID_PENMOUNT, USB_DEVICE_ID_PENMOUNT_1640, HID_QUIRK_NOGET 
},
{ USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_USB_OPTICAL_MOUSE, 
HID_QUIRK_ALWAYS_POLL },
-- 
2.11.0

Re: [PATCH] clk: sunxi-ng: Move all clock types to a library

2017-06-06 Thread Chen-Yu Tsai

On Tue, Jun 6, 2017 at 6:04 PM, Arnd Bergmann  wrote:
> On Mon, Jun 5, 2017 at 4:45 PM, Maxime Ripard
>  wrote:
>> Hi,
>>
>> On Sat, Jun 03, 2017 at 12:22:32PM +0800, kbuild test robot wrote:
>>> Hi Stephen,
>>>
>>> [auto build test ERROR on sunxi/sunxi/for-next]
>>> [also build test ERROR on next-20170602]
>>> [cannot apply to clk/clk-next v4.12-rc3]
>>> [if your patch is applied to the wrong git tree, please drop us a note to 
>>> help improve the system]
>>>
>>>sunxi_sid.c:(.data+0x1da8c): undefined reference to `ccu_gate_ops'
>>>sunxi_sid.c:(.data+0x1dac8): undefined reference to `ccu_gate_ops'
>>>sunxi_sid.c:(.data+0x1db20): undefined reference to `ccu_mux_ops'
>>>sunxi_sid.c:(.data+0x1db74): undefined reference to `ccu_mux_ops'
>>>sunxi_sid.c:(.data+0x1dbec): undefined reference to `ccu_mp_ops'
>>>sunxi_sid.c:(.data+0x1dc64): undefined reference to `ccu_mp_ops'
>>
>> It seems like even though the lib.a file is compiled properly, it is
>> never linked in. It looks like we would be supposed to add
>> drivers/clk/sunxi-ng/ to libs-y, but that doesn't seem to work for
>> Makefiles in drivers/*
>
> Ah, too bad. I see that only one directory under drivers/ uses something
> with lib.a, in drivers/firmware/efi/libstub/Makefile, but that seems to
> rely on being special-cased as well.
>
> However, this patch seems to fix it:
>
> diff --git a/drivers/clk/sunxi-ng/Makefile b/drivers/clk/sunxi-ng/Makefile
> index cbc8cb4f70e3..321d3da7cc6a 100644
> --- a/drivers/clk/sunxi-ng/Makefile
> +++ b/drivers/clk/sunxi-ng/Makefile
> @@ -18,16 +18,16 @@ lib-$(CONFIG_SUNXI_CCU) += ccu_nm.o
>  lib-$(CONFIG_SUNXI_CCU) += ccu_mp.o
>
>  # SoC support
> -obj-$(CONFIG_SUN50I_A64_CCU) += ccu-sun50i-a64.o
> -obj-$(CONFIG_SUN5I_CCU) += ccu-sun5i.o
> -obj-$(CONFIG_SUN6I_A31_CCU) += ccu-sun6i-a31.o
> -obj-$(CONFIG_SUN8I_A23_CCU) += ccu-sun8i-a23.o
> -obj-$(CONFIG_SUN8I_A33_CCU) += ccu-sun8i-a33.o
> -obj-$(CONFIG_SUN8I_A83T_CCU) += ccu-sun8i-a83t.o
> -obj-$(CONFIG_SUN8I_H3_CCU) += ccu-sun8i-h3.o
> -obj-$(CONFIG_SUN8I_V3S_CCU) += ccu-sun8i-v3s.o
> -obj-$(CONFIG_SUN8I_DE2_CCU) += ccu-sun8i-de2.o
> -obj-$(CONFIG_SUN8I_R_CCU) += ccu-sun8i-r.o
> -obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80.o
> -obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-de.o
> -obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-usb.o
> +obj-$(CONFIG_SUN50I_A64_CCU) += ccu-sun50i-a64.o lib.a
> +obj-$(CONFIG_SUN5I_CCU) += ccu-sun5i.o lib.a
> +obj-$(CONFIG_SUN6I_A31_CCU) += ccu-sun6i-a31.o lib.a
> +obj-$(CONFIG_SUN8I_A23_CCU) += ccu-sun8i-a23.o lib.a
> +obj-$(CONFIG_SUN8I_A33_CCU) += ccu-sun8i-a33.o lib.a
> +obj-$(CONFIG_SUN8I_A83T_CCU) += ccu-sun8i-a83t.o lib.a
> +obj-$(CONFIG_SUN8I_H3_CCU) += ccu-sun8i-h3.o lib.a
> +obj-$(CONFIG_SUN8I_V3S_CCU) += ccu-sun8i-v3s.o lib.a
> +obj-$(CONFIG_SUN8I_DE2_CCU) += ccu-sun8i-de2.o lib.a
> +obj-$(CONFIG_SUN8I_R_CCU) += ccu-sun8i-r.o lib.a
> +obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80.o lib.a
> +obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-de.o lib.a
> +obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-usb.o lib.a
>
> It's not documented behavior, but I think it's still good enough,
> and improves the current version, unless there is another bug
> after we add this to your patch.

I think having just

obj-$(CONFIG_SUNXI_CCU) += lib.a

at the bottom (with a comment) would be cleaner, and
we wouldn't need to modify all the existing lines.
AFAIK about Makefiles, that should work?

ChenYu

>
> Arnd

Re: [PATCH 10/16] powerpc: vio: use dev_groups and not dev_attrs for bus_type

2017-06-06 Thread Greg Kroah-Hartman

On Wed, Jun 07, 2017 at 09:04:41AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2017-06-06 at 21:30 +0200, Greg Kroah-Hartman wrote:
> > >   
> > >   static struct device_attribute vio_dev_attrs[] = {
> > >    __ATTR_RO(name),
> > > @@ -1573,6 +1576,13 @@ static struct device_attribute vio_dev_attrs[] = {
> > >    __ATTR_RO(modalias),
> > >    __ATTR_NULL
> > >   };
> > > +static struct attribute *vio_dev_attrs[] = {
> > 
> > Hm, this feels wrong, odd that 0-day passed it.  I should be deleting
> > the above vio_dev_attrs field as well.  Is powerpc really a dead
> > platform?  :)
> 
> Haha, not yet no, and the above is actually still quite actively
> in use as it's part of our hypervisor virtual IO infrastructure.

Ok, let me fix this, right after I emailed 0-day sent me the build error :)

Re: [PATCH] clk: sunxi-ng: Move all clock types to a library

2017-06-06 Thread Chen-Yu Tsai

On Tue, Jun 6, 2017 at 6:04 PM, Arnd Bergmann  wrote:
> On Mon, Jun 5, 2017 at 4:45 PM, Maxime Ripard
>  wrote:
>> Hi,
>>
>> On Sat, Jun 03, 2017 at 12:22:32PM +0800, kbuild test robot wrote:
>>> Hi Stephen,
>>>
>>> [auto build test ERROR on sunxi/sunxi/for-next]
>>> [also build test ERROR on next-20170602]
>>> [cannot apply to clk/clk-next v4.12-rc3]
>>> [if your patch is applied to the wrong git tree, please drop us a note to 
>>> help improve the system]
>>>
>>>sunxi_sid.c:(.data+0x1da8c): undefined reference to `ccu_gate_ops'
>>>sunxi_sid.c:(.data+0x1dac8): undefined reference to `ccu_gate_ops'
>>>sunxi_sid.c:(.data+0x1db20): undefined reference to `ccu_mux_ops'
>>>sunxi_sid.c:(.data+0x1db74): undefined reference to `ccu_mux_ops'
>>>sunxi_sid.c:(.data+0x1dbec): undefined reference to `ccu_mp_ops'
>>>sunxi_sid.c:(.data+0x1dc64): undefined reference to `ccu_mp_ops'
>>
>> It seems like even though the lib.a file is compiled properly, it is
>> never linked in. It looks like we would be supposed to add
>> drivers/clk/sunxi-ng/ to libs-y, but that doesn't seem to work for
>> Makefiles in drivers/*
>
> Ah, too bad. I see that only one directory under drivers/ uses something
> with lib.a, in drivers/firmware/efi/libstub/Makefile, but that seems to
> rely on being special-cased as well.
>
> However, this patch seems to fix it:
>
> diff --git a/drivers/clk/sunxi-ng/Makefile b/drivers/clk/sunxi-ng/Makefile
> index cbc8cb4f70e3..321d3da7cc6a 100644
> --- a/drivers/clk/sunxi-ng/Makefile
> +++ b/drivers/clk/sunxi-ng/Makefile
> @@ -18,16 +18,16 @@ lib-$(CONFIG_SUNXI_CCU) += ccu_nm.o
>  lib-$(CONFIG_SUNXI_CCU) += ccu_mp.o
>
>  # SoC support
> -obj-$(CONFIG_SUN50I_A64_CCU) += ccu-sun50i-a64.o
> -obj-$(CONFIG_SUN5I_CCU) += ccu-sun5i.o
> -obj-$(CONFIG_SUN6I_A31_CCU) += ccu-sun6i-a31.o
> -obj-$(CONFIG_SUN8I_A23_CCU) += ccu-sun8i-a23.o
> -obj-$(CONFIG_SUN8I_A33_CCU) += ccu-sun8i-a33.o
> -obj-$(CONFIG_SUN8I_A83T_CCU) += ccu-sun8i-a83t.o
> -obj-$(CONFIG_SUN8I_H3_CCU) += ccu-sun8i-h3.o
> -obj-$(CONFIG_SUN8I_V3S_CCU) += ccu-sun8i-v3s.o
> -obj-$(CONFIG_SUN8I_DE2_CCU) += ccu-sun8i-de2.o
> -obj-$(CONFIG_SUN8I_R_CCU) += ccu-sun8i-r.o
> -obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80.o
> -obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-de.o
> -obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-usb.o
> +obj-$(CONFIG_SUN50I_A64_CCU) += ccu-sun50i-a64.o lib.a
> +obj-$(CONFIG_SUN5I_CCU) += ccu-sun5i.o lib.a
> +obj-$(CONFIG_SUN6I_A31_CCU) += ccu-sun6i-a31.o lib.a
> +obj-$(CONFIG_SUN8I_A23_CCU) += ccu-sun8i-a23.o lib.a
> +obj-$(CONFIG_SUN8I_A33_CCU) += ccu-sun8i-a33.o lib.a
> +obj-$(CONFIG_SUN8I_A83T_CCU) += ccu-sun8i-a83t.o lib.a
> +obj-$(CONFIG_SUN8I_H3_CCU) += ccu-sun8i-h3.o lib.a
> +obj-$(CONFIG_SUN8I_V3S_CCU) += ccu-sun8i-v3s.o lib.a
> +obj-$(CONFIG_SUN8I_DE2_CCU) += ccu-sun8i-de2.o lib.a
> +obj-$(CONFIG_SUN8I_R_CCU) += ccu-sun8i-r.o lib.a
> +obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80.o lib.a
> +obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-de.o lib.a
> +obj-$(CONFIG_SUN9I_A80_CCU) += ccu-sun9i-a80-usb.o lib.a
>
> It's not documented behavior, but I think it's still good enough,
> and improves the current version, unless there is another bug
> after we add this to your patch.

I think having just

obj-$(CONFIG_SUNXI_CCU) += lib.a

at the bottom (with a comment) would be cleaner, and
we wouldn't need to modify all the existing lines.
AFAIK about Makefiles, that should work?

ChenYu

>
> Arnd

Re: [PATCH 10/16] powerpc: vio: use dev_groups and not dev_attrs for bus_type

2017-06-06 Thread Greg Kroah-Hartman

On Wed, Jun 07, 2017 at 09:04:41AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2017-06-06 at 21:30 +0200, Greg Kroah-Hartman wrote:
> > >   
> > >   static struct device_attribute vio_dev_attrs[] = {
> > >    __ATTR_RO(name),
> > > @@ -1573,6 +1576,13 @@ static struct device_attribute vio_dev_attrs[] = {
> > >    __ATTR_RO(modalias),
> > >    __ATTR_NULL
> > >   };
> > > +static struct attribute *vio_dev_attrs[] = {
> > 
> > Hm, this feels wrong, odd that 0-day passed it.  I should be deleting
> > the above vio_dev_attrs field as well.  Is powerpc really a dead
> > platform?  :)
> 
> Haha, not yet no, and the above is actually still quite actively
> in use as it's part of our hypervisor virtual IO infrastructure.

Ok, let me fix this, right after I emailed 0-day sent me the build error :)

Re: [PATCH v9 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging

2017-06-06 Thread Anju T Sudhakar


Hi Thomas,

On Tuesday 06 June 2017 03:39 PM, Thomas Gleixner wrote:

On Mon, 5 Jun 2017, Anju T Sudhakar wrote:

+static void cleanup_all_core_imc_memory(struct imc_pmu *pmu_ptr)
+{
+   struct imc_mem_info *ptr = pmu_ptr->mem_info;
+
+   if (!ptr)
+   return;

That's pointless.


No, it is not.  We may end up here from imc_mem_init() when the memory 
allocation for
pmu_ptr->mem_info fails. So in that case we can just return from here, 
and kfree wont be

called with a NULL pointer.


+   for (; ptr; ptr++) {

for (ptr = pmu_ptr->mem_info; ptr; ptr++) {

will do the right thing.


+   if (ptr->vbase[0] != 0)
+   free_pages(ptr->vbase[0], 0);
+   }

and kfree can be called with a NULL pointer.


+   kfree(pmu_ptr->mem_info);
+}
+
+/* Enabling of Core Engine needs a scom operation */
+static void core_imc_control_enable(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+/*
+ * Disabling of IMC Core Engine needs a scom operation
+ */
+static void core_imc_control_disable(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+int core_imc_control(int operation)
+{
+   int cpu, *cpus_opal_rc;
+
+   /* Memory for OPAL call return value. */
+   cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+   if (!cpus_opal_rc)
+   return -ENOMEM;
+
+   /*
+* Enable or disable the core IMC PMU on each core using the
+* core_imc_cpumask.
+*/
+   switch (operation) {
+
+   case IMC_COUNTER_DISABLE:
+   on_each_cpu_mask(_imc_cpumask,
+   (smp_call_func_t)core_imc_control_disable,
+   cpus_opal_rc, 1);
+   break;
+   case IMC_COUNTER_ENABLE:
+   on_each_cpu_mask(_imc_cpumask,
+   (smp_call_func_t)core_imc_control_enable,
+   cpus_opal_rc, 1);
+   break;
+   default:
+   goto fail;
+   }
+   /* Check return value array for any OPAL call failure */
+   for_each_cpu(cpu, _imc_cpumask) {
+   if (cpus_opal_rc[cpu])
+   goto fail;
+   }
+   return 0;
+fail:
+   if (cpus_opal_rc)
+   kfree(cpus_opal_rc);
+   return -EINVAL;
+}

Interesting. You copied the other function and then implemented it
differently.

But, what's worse is that this is just duplicated code for no value. Both
the nest and core control functions call

 opal_imc_counters_(WHICH_COUNTER);

So the obvious thing to do is:

static struct cpumask imc_result_mask;
static DEFINE_MUTEX(imc_control_mutex);

static void opal_imc_XXX(void *which)
{
if (opal_imc_counters_XXX((unsigned long) which))
cpumask_set_cpu(smp_processor_id(), _result_mask);
}

static int imc_control(unsigned long which, bool start)
{
smp_call_func_t *fn;
int res;

mutex_lock(_control_mutex);
memset(_result_mask, 0, sizeof(imc_result_mask));

switch (which) {
case OPAL_IMC_COUNTERS_CORE:
case OPAL_IMC_COUNTERS_NEST:
break;
default:
res = -EINVAL;
goto out;
}

fn = start ? opal_imc_start : opal_imc_stop;

on_each_cpu_mask(_imc_cpumask, fn, (void *) which, 1);

res = cpumask_empty(_imc_result) ? 0 : -ENODEV;
out:
mutex_unlock(_control_mutex);
return res;

That would allow sharing of too much code, right?


Yes. This looks really good. Thanks!
I will rework the code.


+/*
+ * core_imc_mem_init : Initializes memory for the current core.
+ *
+ * Uses alloc_pages_exact_nid() and uses the returned address as an argument to
+ * an opal call to configure the pdbar. The address sent as an argument is
+ * converted to physical address before the opal call is made. This is the
+ * base address at which the core imc counters are populated.
+ */
+static int  __meminit core_imc_mem_init(int cpu, int size)
+{
+   int phys_id, rc = 0, core_id = (cpu / threads_per_core);
+   struct imc_mem_info *mem_info = NULL;

What's that NULL initialization for?


+
+   phys_id = topology_physical_package_id(cpu);
+   /*
+* alloc_pages_exact_nid() will allocate memory for core in the
+* local node only.
+*/
+   mem_info = _imc_pmu->mem_info[core_id];
+   mem_info->id = core_id;
+   mem_info->vbase[0] = (u64) alloc_pages_exact_nid(phys_id,
+   (size_t)size, GFP_KERNEL | __GFP_ZERO);

That allocation is guaranteed not to fail?


Nice catch. We need a check here. Will fix this.


Thanks and Regards,

Anju

Re: [PATCH v9 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging

2017-06-06 Thread Anju T Sudhakar


Hi Thomas,

On Tuesday 06 June 2017 03:39 PM, Thomas Gleixner wrote:

On Mon, 5 Jun 2017, Anju T Sudhakar wrote:

+static void cleanup_all_core_imc_memory(struct imc_pmu *pmu_ptr)
+{
+   struct imc_mem_info *ptr = pmu_ptr->mem_info;
+
+   if (!ptr)
+   return;

That's pointless.


No, it is not.  We may end up here from imc_mem_init() when the memory 
allocation for
pmu_ptr->mem_info fails. So in that case we can just return from here, 
and kfree wont be

called with a NULL pointer.


+   for (; ptr; ptr++) {

for (ptr = pmu_ptr->mem_info; ptr; ptr++) {

will do the right thing.


+   if (ptr->vbase[0] != 0)
+   free_pages(ptr->vbase[0], 0);
+   }

and kfree can be called with a NULL pointer.


+   kfree(pmu_ptr->mem_info);
+}
+
+/* Enabling of Core Engine needs a scom operation */
+static void core_imc_control_enable(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+/*
+ * Disabling of IMC Core Engine needs a scom operation
+ */
+static void core_imc_control_disable(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+int core_imc_control(int operation)
+{
+   int cpu, *cpus_opal_rc;
+
+   /* Memory for OPAL call return value. */
+   cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+   if (!cpus_opal_rc)
+   return -ENOMEM;
+
+   /*
+* Enable or disable the core IMC PMU on each core using the
+* core_imc_cpumask.
+*/
+   switch (operation) {
+
+   case IMC_COUNTER_DISABLE:
+   on_each_cpu_mask(_imc_cpumask,
+   (smp_call_func_t)core_imc_control_disable,
+   cpus_opal_rc, 1);
+   break;
+   case IMC_COUNTER_ENABLE:
+   on_each_cpu_mask(_imc_cpumask,
+   (smp_call_func_t)core_imc_control_enable,
+   cpus_opal_rc, 1);
+   break;
+   default:
+   goto fail;
+   }
+   /* Check return value array for any OPAL call failure */
+   for_each_cpu(cpu, _imc_cpumask) {
+   if (cpus_opal_rc[cpu])
+   goto fail;
+   }
+   return 0;
+fail:
+   if (cpus_opal_rc)
+   kfree(cpus_opal_rc);
+   return -EINVAL;
+}

Interesting. You copied the other function and then implemented it
differently.

But, what's worse is that this is just duplicated code for no value. Both
the nest and core control functions call

 opal_imc_counters_(WHICH_COUNTER);

So the obvious thing to do is:

static struct cpumask imc_result_mask;
static DEFINE_MUTEX(imc_control_mutex);

static void opal_imc_XXX(void *which)
{
if (opal_imc_counters_XXX((unsigned long) which))
cpumask_set_cpu(smp_processor_id(), _result_mask);
}

static int imc_control(unsigned long which, bool start)
{
smp_call_func_t *fn;
int res;

mutex_lock(_control_mutex);
memset(_result_mask, 0, sizeof(imc_result_mask));

switch (which) {
case OPAL_IMC_COUNTERS_CORE:
case OPAL_IMC_COUNTERS_NEST:
break;
default:
res = -EINVAL;
goto out;
}

fn = start ? opal_imc_start : opal_imc_stop;

on_each_cpu_mask(_imc_cpumask, fn, (void *) which, 1);

res = cpumask_empty(_imc_result) ? 0 : -ENODEV;
out:
mutex_unlock(_control_mutex);
return res;

That would allow sharing of too much code, right?


Yes. This looks really good. Thanks!
I will rework the code.


+/*
+ * core_imc_mem_init : Initializes memory for the current core.
+ *
+ * Uses alloc_pages_exact_nid() and uses the returned address as an argument to
+ * an opal call to configure the pdbar. The address sent as an argument is
+ * converted to physical address before the opal call is made. This is the
+ * base address at which the core imc counters are populated.
+ */
+static int  __meminit core_imc_mem_init(int cpu, int size)
+{
+   int phys_id, rc = 0, core_id = (cpu / threads_per_core);
+   struct imc_mem_info *mem_info = NULL;

What's that NULL initialization for?


+
+   phys_id = topology_physical_package_id(cpu);
+   /*
+* alloc_pages_exact_nid() will allocate memory for core in the
+* local node only.
+*/
+   mem_info = _imc_pmu->mem_info[core_id];
+   mem_info->id = core_id;
+   mem_info->vbase[0] = (u64) alloc_pages_exact_nid(phys_id,
+   (size_t)size, GFP_KERNEL | __GFP_ZERO);

That allocation is guaranteed not to fail?


Nice catch. We need a check here. Will fix this.


Thanks and Regards,

Anju

Re: [PATCH v3 1/3] EDAC: mv64x60: check driver registration success

2017-06-06 Thread Chris Packham

Hi Borislav,

On 30/05/17 09:21, Chris Packham wrote:
> Check the return status of platform_driver_register() in
> mv64x60_edac_init(). Only output messages and initialise the
> edac_op_state if the registration is successful.
> 
> Signed-off-by: Chris Packham 
> ---
> Changes in v3:
> - catch the retval of platform_register_drivers and return early on failure
>(thanks Borislav).
> 
>   drivers/edac/mv64x60_edac.c | 8 ++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
> index 14b7e7b71eaa..172081551a70 100644
> --- a/drivers/edac/mv64x60_edac.c
> +++ b/drivers/edac/mv64x60_edac.c
> @@ -853,7 +853,11 @@ static struct platform_driver * const drivers[] = {
>   
>   static int __init mv64x60_edac_init(void)
>   {
> - int ret = 0;
> + int ret;
> +
> + ret = platform_register_drivers(drivers, ARRAY_SIZE(drivers));
> + if (ret)
> + return ret;

I actually think this was wrong. The edac_op_state is set as a module 
param and affects the behaviour of the driver probe function which on 
success could be called before we sanity check it below.

I'm back to thinking that my original v1 of just removing the unused ret 
was appropriate. If we still consider the driver too noisy we could just 
remove the printks.

What do you think?

>   
>   printk(KERN_INFO "Marvell MV64x60 EDAC driver " MV64x60_REVISION "\n");
>   printk(KERN_INFO "\t(C) 2006-2007 MontaVista Software\n");
> @@ -867,7 +871,7 @@ static int __init mv64x60_edac_init(void)
>   break;
>   }
>   
> - return platform_register_drivers(drivers, ARRAY_SIZE(drivers));
> + return 0;
>   }
>   module_init(mv64x60_edac_init);
>   
>

Re: [PATCH v3 1/3] EDAC: mv64x60: check driver registration success

2017-06-06 Thread Chris Packham

Hi Borislav,

On 30/05/17 09:21, Chris Packham wrote:
> Check the return status of platform_driver_register() in
> mv64x60_edac_init(). Only output messages and initialise the
> edac_op_state if the registration is successful.
> 
> Signed-off-by: Chris Packham 
> ---
> Changes in v3:
> - catch the retval of platform_register_drivers and return early on failure
>(thanks Borislav).
> 
>   drivers/edac/mv64x60_edac.c | 8 ++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
> index 14b7e7b71eaa..172081551a70 100644
> --- a/drivers/edac/mv64x60_edac.c
> +++ b/drivers/edac/mv64x60_edac.c
> @@ -853,7 +853,11 @@ static struct platform_driver * const drivers[] = {
>   
>   static int __init mv64x60_edac_init(void)
>   {
> - int ret = 0;
> + int ret;
> +
> + ret = platform_register_drivers(drivers, ARRAY_SIZE(drivers));
> + if (ret)
> + return ret;

I actually think this was wrong. The edac_op_state is set as a module 
param and affects the behaviour of the driver probe function which on 
success could be called before we sanity check it below.

I'm back to thinking that my original v1 of just removing the unused ret 
was appropriate. If we still consider the driver too noisy we could just 
remove the printks.

What do you think?

>   
>   printk(KERN_INFO "Marvell MV64x60 EDAC driver " MV64x60_REVISION "\n");
>   printk(KERN_INFO "\t(C) 2006-2007 MontaVista Software\n");
> @@ -867,7 +871,7 @@ static int __init mv64x60_edac_init(void)
>   break;
>   }
>   
> - return platform_register_drivers(drivers, ARRAY_SIZE(drivers));
> + return 0;
>   }
>   module_init(mv64x60_edac_init);
>   
>

Re: [PATCH v9 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-06-06 Thread Anju T Sudhakar


Hi Thomas,

Thank you for reviewing the patch.


On Tuesday 06 June 2017 02:09 PM, Thomas Gleixner wrote:

On Mon, 5 Jun 2017, Anju T Sudhakar wrote:

+/*
+ * nest_imc_stop : Does OPAL call to stop nest engine.
+ */
+static void nest_imc_stop(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;

What's wrong with just assigning the result directly?



yes. We can assign it directly. Will do this.




+/* nest_imc_start: Does the OPAL call to enable nest counters. */
+static void nest_imc_start(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+/*
+ * nest_imc_control: Function to start and stop nest imc engine.
+ *  The function is primarily called from event init
+ *  and event destroy.

As I don't see other call sites, what's the secondary usage?



This function is called from event init and event destroy only.
Will fix the comment here.




+ */
+static int nest_imc_control(int operation)
+{
+   int *cpus_opal_rc, cpu;
+
+   /*
+* Memory for OPAL call return value.
+*/
+   cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+   if (!cpus_opal_rc)
+   return -ENOMEM;
+   switch (operation) {
+   case IMC_COUNTER_ENABLE:
+   /* Initialize Nest PMUs in each node using designated 
cpus */
+   on_each_cpu_mask(_imc_cpumask, 
(smp_call_func_t)nest_imc_start,
+   cpus_opal_rc, 1);

That's one indentation level too deep.


My bad. Will fix this.


+   break;
+   case IMC_COUNTER_DISABLE:
+   /* Disable the counters */
+   on_each_cpu_mask(_imc_cpumask, 
(smp_call_func_t)nest_imc_stop,
+   cpus_opal_rc, 1);
+   break;
+   default:
+   kfree(cpus_opal_rc);
+   return -EINVAL;
+
+   }
+
+   /* Check return value array for any OPAL call failure */
+   for_each_cpu(cpu, _imc_cpumask) {
+   if (cpus_opal_rc[cpu]) {
+   kfree(cpus_opal_rc);
+   return -ENODEV;
+   }
+   }
+   kfree(cpus_opal_rc);

So you have THREE places where you do kfree(cpus_opal_rc). Sigh.

You can spare that hole alloc/free dance by simply having

static struct cpumask nest_imc_result;

mutex_lock(_nest_reserve);
memset(_imc_result, 0, sizeof(nest_imc_result));

switch (op) {
case IMC_COUNTER_ENABLE:
on_each_cpu_mask(_imc_cpumask, nest_imc_start, NULL, 1);
break;

}

res = cpumask_empty(_imc_result) ? 0 : -ENODEV;
mutex_unlock(_nest_reserve);
return res;

And in the start/stop functions:

if (opal_imc_counters_xxx(OPAL_IMC_COUNTERS_NEST))
cpumask_set_cpu(smp_processor_id(), _imc_result);


Ok.


+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   struct imc_pmu **pn = per_nest_pmu_arr;
+   int i;
+
+   if (old_cpu < 0 || new_cpu < 0)
+   return;
+
+   for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
+   perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);
+
+}
+
+static int ppc_nest_imc_cpu_offline(unsigned int cpu)
+{
+   int nid, target = -1;
+   const struct cpumask *l_cpumask;
+
+   /*
+* Check in the designated list for this cpu. Dont bother
+* if not one of them.
+*/
+   if (!cpumask_test_and_clear_cpu(cpu, _imc_cpumask))
+   return 0;
+
+   /*
+* Now that this cpu is one of the designated,
+* find a next cpu a) which is online and b) in same chip.
+*/
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+   target = cpumask_next(cpu, l_cpumask);

So if the outgoing CPU is the highest numbered CPU on the node, then
cpumask_next() will not find a CPU, while there still might be other lower
numbered CPUs online in the node.

target = cpumask_any_but(l_cpumask, cpu);

is what you want.


In the previous revisions we designated the smallest cpu in the mask. 
And then we re wrote the

code to pick next available cpu. But this is a nice catch.  :-) Thanks!
I will fix this.


+
+   /*
+* Update the cpumask with the target cpu and
+* migrate the context if needed
+*/
+   if (target >= 0 && target < nr_cpu_ids) {
+   cpumask_set_cpu(target, _imc_cpumask);
+   nest_change_cpu_context(cpu, target);
+   } else {
+   target = -1;
+   opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
+

Re: [PATCH v9 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-06-06 Thread Anju T Sudhakar


Hi Thomas,

Thank you for reviewing the patch.


On Tuesday 06 June 2017 02:09 PM, Thomas Gleixner wrote:

On Mon, 5 Jun 2017, Anju T Sudhakar wrote:

+/*
+ * nest_imc_stop : Does OPAL call to stop nest engine.
+ */
+static void nest_imc_stop(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;

What's wrong with just assigning the result directly?



yes. We can assign it directly. Will do this.




+/* nest_imc_start: Does the OPAL call to enable nest counters. */
+static void nest_imc_start(int *cpu_opal_rc)
+{
+   int rc;
+
+   rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+/*
+ * nest_imc_control: Function to start and stop nest imc engine.
+ *  The function is primarily called from event init
+ *  and event destroy.

As I don't see other call sites, what's the secondary usage?



This function is called from event init and event destroy only.
Will fix the comment here.




+ */
+static int nest_imc_control(int operation)
+{
+   int *cpus_opal_rc, cpu;
+
+   /*
+* Memory for OPAL call return value.
+*/
+   cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+   if (!cpus_opal_rc)
+   return -ENOMEM;
+   switch (operation) {
+   case IMC_COUNTER_ENABLE:
+   /* Initialize Nest PMUs in each node using designated 
cpus */
+   on_each_cpu_mask(_imc_cpumask, 
(smp_call_func_t)nest_imc_start,
+   cpus_opal_rc, 1);

That's one indentation level too deep.


My bad. Will fix this.


+   break;
+   case IMC_COUNTER_DISABLE:
+   /* Disable the counters */
+   on_each_cpu_mask(_imc_cpumask, 
(smp_call_func_t)nest_imc_stop,
+   cpus_opal_rc, 1);
+   break;
+   default:
+   kfree(cpus_opal_rc);
+   return -EINVAL;
+
+   }
+
+   /* Check return value array for any OPAL call failure */
+   for_each_cpu(cpu, _imc_cpumask) {
+   if (cpus_opal_rc[cpu]) {
+   kfree(cpus_opal_rc);
+   return -ENODEV;
+   }
+   }
+   kfree(cpus_opal_rc);

So you have THREE places where you do kfree(cpus_opal_rc). Sigh.

You can spare that hole alloc/free dance by simply having

static struct cpumask nest_imc_result;

mutex_lock(_nest_reserve);
memset(_imc_result, 0, sizeof(nest_imc_result));

switch (op) {
case IMC_COUNTER_ENABLE:
on_each_cpu_mask(_imc_cpumask, nest_imc_start, NULL, 1);
break;

}

res = cpumask_empty(_imc_result) ? 0 : -ENODEV;
mutex_unlock(_nest_reserve);
return res;

And in the start/stop functions:

if (opal_imc_counters_xxx(OPAL_IMC_COUNTERS_NEST))
cpumask_set_cpu(smp_processor_id(), _imc_result);


Ok.


+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   struct imc_pmu **pn = per_nest_pmu_arr;
+   int i;
+
+   if (old_cpu < 0 || new_cpu < 0)
+   return;
+
+   for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
+   perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);
+
+}
+
+static int ppc_nest_imc_cpu_offline(unsigned int cpu)
+{
+   int nid, target = -1;
+   const struct cpumask *l_cpumask;
+
+   /*
+* Check in the designated list for this cpu. Dont bother
+* if not one of them.
+*/
+   if (!cpumask_test_and_clear_cpu(cpu, _imc_cpumask))
+   return 0;
+
+   /*
+* Now that this cpu is one of the designated,
+* find a next cpu a) which is online and b) in same chip.
+*/
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+   target = cpumask_next(cpu, l_cpumask);

So if the outgoing CPU is the highest numbered CPU on the node, then
cpumask_next() will not find a CPU, while there still might be other lower
numbered CPUs online in the node.

target = cpumask_any_but(l_cpumask, cpu);

is what you want.


In the previous revisions we designated the smallest cpu in the mask. 
And then we re wrote the

code to pick next available cpu. But this is a nice catch.  :-) Thanks!
I will fix this.


+
+   /*
+* Update the cpumask with the target cpu and
+* migrate the context if needed
+*/
+   if (target >= 0 && target < nr_cpu_ids) {
+   cpumask_set_cpu(target, _imc_cpumask);
+   nest_change_cpu_context(cpu, target);
+   } else {
+   target = -1;
+   opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
+

page table questions

2017-06-06 Thread meng chen

hello ,how to know wheather a pmd maps a 2M physical page or not??
if(pmd_trans_huge(*pmd)){
if(pte_present(*pmd)){
/*page is in ram?*/
  }
}

can I judge it according the above codes??
thanks a lot!!

page table questions

2017-06-06 Thread meng chen

hello ,how to know wheather a pmd maps a 2M physical page or not??
if(pmd_trans_huge(*pmd)){
if(pte_present(*pmd)){
/*page is in ram?*/
  }
}

can I judge it according the above codes??
thanks a lot!!

Re: [PATCH 0/3] target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support

2017-06-06 Thread Madhani, Himanshu


> On Jun 3, 2017, at 3:10 PM, Nicholas A. Bellinger  
> wrote:
> 
> From: Nicholas Bellinger 
> 
> Hi Himanshu + Quinn,
> 
> Here is a small series to introduce proper percpu se_lun->lun_ref
> counting for TMR, and add common code in target_submit_tmr() to
> do tag lookup for unpacked_lun in order to drop the original
> driver specific lookup within __qlt_24xx_handle_abts().
> 
> It's rather straight-forward, so review and test as a v4.13 item.
> 
> Thanks!
> 
> Nicholas Bellinger (3):
>  target: Add support for TMR percpu reference counting
>  target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK
>  qla2xxx: Convert QLA_TGT_ABTS to TARGET_SCF_LOOKUP_LUN_FROM_TAG
> 
> drivers/scsi/qla2xxx/qla_target.c  | 39 ++-
> drivers/scsi/qla2xxx/tcm_qla2xxx.c |  4 ++-
> drivers/target/target_core_device.c| 14 ++---
> drivers/target/target_core_transport.c | 56 --
> include/target/target_core_base.h  |  3 +-
> 5 files changed, 71 insertions(+), 45 deletions(-)
> 
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Series looks good. 

Acked-by: Himanshu Madhani 

Thanks,
- Himanshu

Re: [PATCH 0/3] target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support

2017-06-06 Thread Madhani, Himanshu


> On Jun 3, 2017, at 3:10 PM, Nicholas A. Bellinger  
> wrote:
> 
> From: Nicholas Bellinger 
> 
> Hi Himanshu + Quinn,
> 
> Here is a small series to introduce proper percpu se_lun->lun_ref
> counting for TMR, and add common code in target_submit_tmr() to
> do tag lookup for unpacked_lun in order to drop the original
> driver specific lookup within __qlt_24xx_handle_abts().
> 
> It's rather straight-forward, so review and test as a v4.13 item.
> 
> Thanks!
> 
> Nicholas Bellinger (3):
>  target: Add support for TMR percpu reference counting
>  target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK
>  qla2xxx: Convert QLA_TGT_ABTS to TARGET_SCF_LOOKUP_LUN_FROM_TAG
> 
> drivers/scsi/qla2xxx/qla_target.c  | 39 ++-
> drivers/scsi/qla2xxx/tcm_qla2xxx.c |  4 ++-
> drivers/target/target_core_device.c| 14 ++---
> drivers/target/target_core_transport.c | 56 --
> include/target/target_core_base.h  |  3 +-
> 5 files changed, 71 insertions(+), 45 deletions(-)
> 
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Series looks good. 

Acked-by: Himanshu Madhani 

Thanks,
- Himanshu

Re: [kernel-hardening] Re: [PATCH v3 04/13] crypto/rng: ensure that the RNG is ready before using

2017-06-06 Thread Stephan Müller

Am Mittwoch, 7. Juni 2017, 00:19:10 CEST schrieb Henrique de Moraes Holschuh:

Hi Henrique,

> On that same idea, one could add an early_initramfs handler for entropy
> data.

Any data that comes from outside during the boot process, be it some NVRAM 
location, the /var/lib...seed file for /dev/random or other approaches are 
viewed by a number of folks to have zero bits of entropy.

I.e. this data is nice for stirring the pool, but is not considered to help 
our entropy problem.

Ciao
Stephan

Re: [kernel-hardening] Re: [PATCH v3 04/13] crypto/rng: ensure that the RNG is ready before using

2017-06-06 Thread Stephan Müller

Am Mittwoch, 7. Juni 2017, 00:19:10 CEST schrieb Henrique de Moraes Holschuh:

Hi Henrique,

> On that same idea, one could add an early_initramfs handler for entropy
> data.

Any data that comes from outside during the boot process, be it some NVRAM 
location, the /var/lib...seed file for /dev/random or other approaches are 
viewed by a number of folks to have zero bits of entropy.

I.e. this data is nice for stirring the pool, but is not considered to help 
our entropy problem.

Ciao
Stephan

Re: [PATCH] Revert "mm: vmpressure: fix sending wrong events on underflow"

2017-06-06 Thread zhong jiang

On 2017/6/7 11:55, Minchan Kim wrote:
> On Wed, Jun 07, 2017 at 11:08:37AM +0800, zhongjiang wrote:
>> This reverts commit e1587a4945408faa58d0485002c110eb2454740c.
>>
>> THP lru page is reclaimed , THP is split to normal page and loop again.
>> reclaimed pages should not be bigger than nr_scan.  because of each
>> loop will increase nr_scan counter.
> Unfortunately, there is still underflow issue caused by slab pages as
> Vinayak reported in description of e1587a4945408 so we cannot revert.
> Please correct comment instead of removing the logic.
>
> Thanks.
  we calculate the vmpressue based on the Lru page, exclude the slab pages by 
previous
  discussion.is it not this?

  Thanks
 zhongjiang
>> Signed-off-by: zhongjiang 
>> ---
>>  mm/vmpressure.c | 10 +-
>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>> index 6063581..149fdf6 100644
>> --- a/mm/vmpressure.c
>> +++ b/mm/vmpressure.c
>> @@ -112,16 +112,9 @@ static enum vmpressure_levels 
>> vmpressure_calc_level(unsigned long scanned,
>>  unsigned long reclaimed)
>>  {
>>  unsigned long scale = scanned + reclaimed;
>> -unsigned long pressure = 0;
>> +unsigned long pressure;
>>  
>>  /*
>> - * reclaimed can be greater than scanned in cases
>> - * like THP, where the scanned is 1 and reclaimed
>> - * could be 512
>> - */
>> -if (reclaimed >= scanned)
>> -goto out;
>> -/*
>>   * We calculate the ratio (in percents) of how many pages were
>>   * scanned vs. reclaimed in a given time frame (window). Note that
>>   * time is in VM reclaimer's "ticks", i.e. number of pages
>> @@ -131,7 +124,6 @@ static enum vmpressure_levels 
>> vmpressure_calc_level(unsigned long scanned,
>>  pressure = scale - (reclaimed * scale / scanned);
>>  pressure = pressure * 100 / scale;
>>  
>> -out:
>>  pr_debug("%s: %3lu  (s: %lu  r: %lu)\n", __func__, pressure,
>>   scanned, reclaimed);
>>  
>> -- 
>> 1.7.12.4
>>
> .
>

Re: [PATCH] Revert "mm: vmpressure: fix sending wrong events on underflow"

2017-06-06 Thread zhong jiang

On 2017/6/7 11:55, Minchan Kim wrote:
> On Wed, Jun 07, 2017 at 11:08:37AM +0800, zhongjiang wrote:
>> This reverts commit e1587a4945408faa58d0485002c110eb2454740c.
>>
>> THP lru page is reclaimed , THP is split to normal page and loop again.
>> reclaimed pages should not be bigger than nr_scan.  because of each
>> loop will increase nr_scan counter.
> Unfortunately, there is still underflow issue caused by slab pages as
> Vinayak reported in description of e1587a4945408 so we cannot revert.
> Please correct comment instead of removing the logic.
>
> Thanks.
  we calculate the vmpressue based on the Lru page, exclude the slab pages by 
previous
  discussion.is it not this?

  Thanks
 zhongjiang
>> Signed-off-by: zhongjiang 
>> ---
>>  mm/vmpressure.c | 10 +-
>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>> index 6063581..149fdf6 100644
>> --- a/mm/vmpressure.c
>> +++ b/mm/vmpressure.c
>> @@ -112,16 +112,9 @@ static enum vmpressure_levels 
>> vmpressure_calc_level(unsigned long scanned,
>>  unsigned long reclaimed)
>>  {
>>  unsigned long scale = scanned + reclaimed;
>> -unsigned long pressure = 0;
>> +unsigned long pressure;
>>  
>>  /*
>> - * reclaimed can be greater than scanned in cases
>> - * like THP, where the scanned is 1 and reclaimed
>> - * could be 512
>> - */
>> -if (reclaimed >= scanned)
>> -goto out;
>> -/*
>>   * We calculate the ratio (in percents) of how many pages were
>>   * scanned vs. reclaimed in a given time frame (window). Note that
>>   * time is in VM reclaimer's "ticks", i.e. number of pages
>> @@ -131,7 +124,6 @@ static enum vmpressure_levels 
>> vmpressure_calc_level(unsigned long scanned,
>>  pressure = scale - (reclaimed * scale / scanned);
>>  pressure = pressure * 100 / scale;
>>  
>> -out:
>>  pr_debug("%s: %3lu  (s: %lu  r: %lu)\n", __func__, pressure,
>>   scanned, reclaimed);
>>  
>> -- 
>> 1.7.12.4
>>
> .
>

Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

2017-06-06 Thread Andy Lutomirski

On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez  wrote:
> On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>
> We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> or -ERESTARTNOHAND if we see fit for some future functionality / need ?

I think that has essentially nothing to do with swait.  User code does
some syscall.  That syscall triggers a firmware load.  The caller gets
a signal.  If you're going to let firmware load get interrupted, you
need to consider what the syscall is.

Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

2017-06-06 Thread Andy Lutomirski

On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez  wrote:
> On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>
> We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> or -ERESTARTNOHAND if we see fit for some future functionality / need ?

I think that has essentially nothing to do with swait.  User code does
some syscall.  That syscall triggers a firmware load.  The caller gets
a signal.  If you're going to let firmware load get interrupted, you
need to consider what the syscall is.

[PATCH v5] ACPICA: Tables: Add mechanism to allow to balance late stage acpi_get_table() independently

2017-06-06 Thread Lv Zheng

Considering this case:
1. A program opens a sysfs table file 65535 times, it can increase
   validation_count and first increment cause the table to be mapped:
validation_count = 65535
2. AML execution causes "Load" to be executed on the same table, this time
   it cannot increase validation_count, so validation_count remains:
validation_count = 65535
3. The program closes sysfs table file 65535 times, it can decrease
   validation_count and the last decrement cause the table to be unmapped:
validation_count = 0
4. AML code still accessing the loaded table, kernel crash can be observed.

This is because orginally ACPICA doesn't support unmapping tables during
OS late stage. So the current code only allows unmapping tables during OS
early stage, and for late stage, no acpi_put_table() clones should be
invoked, especially cases that can trigger frequent invocations of
acpi_get_table()/acpi_put_table() are forbidden:
1. sysfs table accesses
2. dynamic Load/Unload opcode executions
3. acpi_load_table()
4. etc.
Such frequent acpi_put_table() balance changes have to be done altogether.

This philosophy is not convenient for Linux driver writers. Since the API
is just there, developers will start to use acpi_put_table() during late
stage. So we need to consider a better mechanism to allow them to safely
invoke acpi_put_table().

This patch provides such a mechanism by adding a validation_count
threashold. When it is reached, the validation_count can no longer be
incremented/decremented to invalidate the table descriptor (means
preventing table unmappings) so that acpi_put_table() balance changes can be
done independently to each others.

Note: code added in acpi_tb_put_table() is actually a no-op but changes the
warning message into a warning once message. Lv Zheng.

Signed-off-by: Lv Zheng 
---
 drivers/acpi/acpica/tbutils.c | 36 +++-
 include/acpi/actbl.h  | 13 +
 2 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index cd96026..4048523 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -416,9 +416,19 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
}
}
 
-   table_desc->validation_count++;
-   if (table_desc->validation_count == 0) {
-   table_desc->validation_count--;
+   if (table_desc->validation_count < ACPI_MAX_TABLE_VALIDATIONS) {
+   table_desc->validation_count++;
+
+   /*
+* Ensure the warning message can only be displayed once. The
+* warning message occurs when the "get" operations are 
performed
+* more than "put" operations, causing count overflow.
+*/
+   if (table_desc->validation_count >= ACPI_MAX_TABLE_VALIDATIONS) 
{
+   ACPI_WARNING((AE_INFO,
+ "Table %p, Validation count overflows\n",
+ table_desc));
+   }
}
 
*out_table = table_desc->pointer;
@@ -445,13 +455,21 @@ void acpi_tb_put_table(struct acpi_table_desc *table_desc)
 
ACPI_FUNCTION_TRACE(acpi_tb_put_table);
 
-   if (table_desc->validation_count == 0) {
-   ACPI_WARNING((AE_INFO,
- "Table %p, Validation count is zero before 
decrement\n",
- table_desc));
-   return_VOID;
+   if (table_desc->validation_count < ACPI_MAX_TABLE_VALIDATIONS) {
+   table_desc->validation_count--;
+
+   /*
+* Ensure the warning message can only be displayed once. The
+* warning message occurs when the "put" operations are 
performed
+* more than "get" operations, causing count underflow.
+*/
+   if (table_desc->validation_count >= ACPI_MAX_TABLE_VALIDATIONS) 
{
+   ACPI_WARNING((AE_INFO,
+ "Table %p, Validation count underflows\n",
+ table_desc));
+   return_VOID;
+   }
}
-   table_desc->validation_count--;
 
if (table_desc->validation_count == 0) {
 
diff --git a/include/acpi/actbl.h b/include/acpi/actbl.h
index d92543f..f42e6d5 100644
--- a/include/acpi/actbl.h
+++ b/include/acpi/actbl.h
@@ -374,6 +374,19 @@ struct acpi_table_desc {
u16 validation_count;
 };
 
+/*
+ * Maximum validation count, when it is reached, validation count can no
+ * longer be changed. Which means, the table can no longer be invalidated.
+ * This mechanism is implemented for backward compatibility, where in OS
+ * late stage, old drivers are not facilitated with paired validations and
+ * invalidations.
+ * The maximum validation count can be defined to any value, but should be
+ *

[PATCH v5] ACPICA: Tables: Add mechanism to allow to balance late stage acpi_get_table() independently

2017-06-06 Thread Lv Zheng

Considering this case:
1. A program opens a sysfs table file 65535 times, it can increase
   validation_count and first increment cause the table to be mapped:
validation_count = 65535
2. AML execution causes "Load" to be executed on the same table, this time
   it cannot increase validation_count, so validation_count remains:
validation_count = 65535
3. The program closes sysfs table file 65535 times, it can decrease
   validation_count and the last decrement cause the table to be unmapped:
validation_count = 0
4. AML code still accessing the loaded table, kernel crash can be observed.

This is because orginally ACPICA doesn't support unmapping tables during
OS late stage. So the current code only allows unmapping tables during OS
early stage, and for late stage, no acpi_put_table() clones should be
invoked, especially cases that can trigger frequent invocations of
acpi_get_table()/acpi_put_table() are forbidden:
1. sysfs table accesses
2. dynamic Load/Unload opcode executions
3. acpi_load_table()
4. etc.
Such frequent acpi_put_table() balance changes have to be done altogether.

This philosophy is not convenient for Linux driver writers. Since the API
is just there, developers will start to use acpi_put_table() during late
stage. So we need to consider a better mechanism to allow them to safely
invoke acpi_put_table().

This patch provides such a mechanism by adding a validation_count
threashold. When it is reached, the validation_count can no longer be
incremented/decremented to invalidate the table descriptor (means
preventing table unmappings) so that acpi_put_table() balance changes can be
done independently to each others.

Note: code added in acpi_tb_put_table() is actually a no-op but changes the
warning message into a warning once message. Lv Zheng.

Signed-off-by: Lv Zheng 
---
 drivers/acpi/acpica/tbutils.c | 36 +++-
 include/acpi/actbl.h  | 13 +
 2 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index cd96026..4048523 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -416,9 +416,19 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
}
}
 
-   table_desc->validation_count++;
-   if (table_desc->validation_count == 0) {
-   table_desc->validation_count--;
+   if (table_desc->validation_count < ACPI_MAX_TABLE_VALIDATIONS) {
+   table_desc->validation_count++;
+
+   /*
+* Ensure the warning message can only be displayed once. The
+* warning message occurs when the "get" operations are 
performed
+* more than "put" operations, causing count overflow.
+*/
+   if (table_desc->validation_count >= ACPI_MAX_TABLE_VALIDATIONS) 
{
+   ACPI_WARNING((AE_INFO,
+ "Table %p, Validation count overflows\n",
+ table_desc));
+   }
}
 
*out_table = table_desc->pointer;
@@ -445,13 +455,21 @@ void acpi_tb_put_table(struct acpi_table_desc *table_desc)
 
ACPI_FUNCTION_TRACE(acpi_tb_put_table);
 
-   if (table_desc->validation_count == 0) {
-   ACPI_WARNING((AE_INFO,
- "Table %p, Validation count is zero before 
decrement\n",
- table_desc));
-   return_VOID;
+   if (table_desc->validation_count < ACPI_MAX_TABLE_VALIDATIONS) {
+   table_desc->validation_count--;
+
+   /*
+* Ensure the warning message can only be displayed once. The
+* warning message occurs when the "put" operations are 
performed
+* more than "get" operations, causing count underflow.
+*/
+   if (table_desc->validation_count >= ACPI_MAX_TABLE_VALIDATIONS) 
{
+   ACPI_WARNING((AE_INFO,
+ "Table %p, Validation count underflows\n",
+ table_desc));
+   return_VOID;
+   }
}
-   table_desc->validation_count--;
 
if (table_desc->validation_count == 0) {
 
diff --git a/include/acpi/actbl.h b/include/acpi/actbl.h
index d92543f..f42e6d5 100644
--- a/include/acpi/actbl.h
+++ b/include/acpi/actbl.h
@@ -374,6 +374,19 @@ struct acpi_table_desc {
u16 validation_count;
 };
 
+/*
+ * Maximum validation count, when it is reached, validation count can no
+ * longer be changed. Which means, the table can no longer be invalidated.
+ * This mechanism is implemented for backward compatibility, where in OS
+ * late stage, old drivers are not facilitated with paired validations and
+ * invalidations.
+ * The maximum validation count can be defined to any value, but should be
+ * greater than the

Re: [PATCH v5 9/9] nvmet: allow overriding the NVMe VS via configfs

2017-06-06 Thread Guan Junxiong

Hi,Johannes

On 2017/6/6 22:36, Johannes Thumshirn wrote:
> Allow overriding the announced NVMe Version of a via configfs.

> +static ssize_t nvmet_subsys_version_store(struct config_item *item,
> +const char *page, size_t count)
> +{
> + struct nvmet_subsys *subsys = to_subsys(item);
> + int major, minor, tertiary;
> + int ret;
> +
> +
> + ret = sscanf(page, "%d.%d.%d\n", , , );
> + if (ret != 2 && ret != 3)
> + return -EINVAL;
> +
> + down_write(_config_sem);
> + subsys->ver = NVME_VS(major, minor, tertiary);

tertiary variable is not initialized, which could induce garbage info if ret = 2

> +++ b/include/linux/nvme.h
> @@ -1070,4 +1070,8 @@ struct nvme_completion {
>  #define NVME_VS(major, minor, tertiary) \
>   (((major) << 16) | ((minor) << 8) | (tertiary))
>  
> +#define NVME_MAJOR(ver)  ((ver) >> 16)
> +#define NVME_MINOR(ver)  (((ver) >> 8) & 0xff)
> +#define NVME_TERTIARY(ver)   ((ver) & 0xff)the above line could be aligned 
> with the preceding two lines

Except for those above, others looks good for me.

Reviewed-by: Guan Junxiong

Re: [PATCH v5 9/9] nvmet: allow overriding the NVMe VS via configfs

2017-06-06 Thread Guan Junxiong

Hi,Johannes

On 2017/6/6 22:36, Johannes Thumshirn wrote:
> Allow overriding the announced NVMe Version of a via configfs.

> +static ssize_t nvmet_subsys_version_store(struct config_item *item,
> +const char *page, size_t count)
> +{
> + struct nvmet_subsys *subsys = to_subsys(item);
> + int major, minor, tertiary;
> + int ret;
> +
> +
> + ret = sscanf(page, "%d.%d.%d\n", , , );
> + if (ret != 2 && ret != 3)
> + return -EINVAL;
> +
> + down_write(_config_sem);
> + subsys->ver = NVME_VS(major, minor, tertiary);

tertiary variable is not initialized, which could induce garbage info if ret = 2

> +++ b/include/linux/nvme.h
> @@ -1070,4 +1070,8 @@ struct nvme_completion {
>  #define NVME_VS(major, minor, tertiary) \
>   (((major) << 16) | ((minor) << 8) | (tertiary))
>  
> +#define NVME_MAJOR(ver)  ((ver) >> 16)
> +#define NVME_MINOR(ver)  (((ver) >> 8) & 0xff)
> +#define NVME_TERTIARY(ver)   ((ver) & 0xff)the above line could be aligned 
> with the preceding two lines

Except for those above, others looks good for me.

Reviewed-by: Guan Junxiong

Re: [PATCH 3/5] mfd: tps65217: Add a dependency on OF

2017-06-06 Thread Keerthy



On Tuesday 06 June 2017 08:34 PM, Enric Balletbo Serra wrote:
> Hi Keerthy,
> 
> By change I was looking at this. Some comments below that I think can
> be applied to all patches in this series
> 
> 2017-06-06 16:45 GMT+02:00 Keerthy :
>> Currently the driver boots only via device tree hence add a
>> dependency on OF.
>>
>> Signed-off-by: Keerthy 
>> ---
>>  drivers/mfd/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
>> index 75b59f1..2d1425d 100644
>> --- a/drivers/mfd/Kconfig
>> +++ b/drivers/mfd/Kconfig
>> @@ -1297,7 +1297,7 @@ config MFD_TPS65090
>>
>>  config MFD_TPS65217
>> tristate "TI TPS65217 Power Management / White LED chips"
>> -   depends on I2C
>> +   depends on I2C && OF
> 
> Shouldn't you add || COMPILE_TEST here ?

Sure.

> 
>> select MFD_CORE
>> select REGMAP_I2C
>> select IRQ_DOMAIN
>> --
>> 1.9.1
>>
> 
> I think you can remove the of_match_device checks in some drivers too
> 
> i.e:
> 
> http://elixir.free-electrons.com/linux/latest/source/drivers/mfd/tps65217.c#L330

Yes that and removal of unused i2c_device_id. I will follow it up once
this OF dependency is in.

> 
> Regards,
>  Enric
>

Re: [PATCH 3/5] mfd: tps65217: Add a dependency on OF

2017-06-06 Thread Keerthy



On Tuesday 06 June 2017 08:34 PM, Enric Balletbo Serra wrote:
> Hi Keerthy,
> 
> By change I was looking at this. Some comments below that I think can
> be applied to all patches in this series
> 
> 2017-06-06 16:45 GMT+02:00 Keerthy :
>> Currently the driver boots only via device tree hence add a
>> dependency on OF.
>>
>> Signed-off-by: Keerthy 
>> ---
>>  drivers/mfd/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
>> index 75b59f1..2d1425d 100644
>> --- a/drivers/mfd/Kconfig
>> +++ b/drivers/mfd/Kconfig
>> @@ -1297,7 +1297,7 @@ config MFD_TPS65090
>>
>>  config MFD_TPS65217
>> tristate "TI TPS65217 Power Management / White LED chips"
>> -   depends on I2C
>> +   depends on I2C && OF
> 
> Shouldn't you add || COMPILE_TEST here ?

Sure.

> 
>> select MFD_CORE
>> select REGMAP_I2C
>> select IRQ_DOMAIN
>> --
>> 1.9.1
>>
> 
> I think you can remove the of_match_device checks in some drivers too
> 
> i.e:
> 
> http://elixir.free-electrons.com/linux/latest/source/drivers/mfd/tps65217.c#L330

Yes that and removal of unused i2c_device_id. I will follow it up once
this OF dependency is in.

> 
> Regards,
>  Enric
>

linux-next: manual merge of the kselftest tree with the jc_docs tree

2017-06-06 Thread Stephen Rothwell

Hi Shuah,

Today's linux-next merge of the kselftest tree got a conflict in:

  MAINTAINERS

between commit:

  c061f33f35be ("doc: ReSTify seccomp_filter.txt")

from the jc_docs tree and commit:

  11fe3c3f54f9 ("selftests: Make test_harness.h more generally available")

from the kselftest tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc MAINTAINERS
index d0cda330adbb,ef292b8c771d..
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@@ -11599,7 -11492,7 +11599,8 @@@ F:   kernel/seccomp.
  F:include/uapi/linux/seccomp.h
  F:include/linux/seccomp.h
  F:tools/testing/selftests/seccomp/*
+ F:tools/testing/selftests/kselftest_harness.h
 +F:Documentation/userspace-api/seccomp_filter.rst
  K:\bsecure_computing
  K:\bTIF_SECCOMP\b

linux-next: manual merge of the kselftest tree with the jc_docs tree

2017-06-06 Thread Stephen Rothwell

Hi Shuah,

Today's linux-next merge of the kselftest tree got a conflict in:

  MAINTAINERS

between commit:

  c061f33f35be ("doc: ReSTify seccomp_filter.txt")

from the jc_docs tree and commit:

  11fe3c3f54f9 ("selftests: Make test_harness.h more generally available")

from the kselftest tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc MAINTAINERS
index d0cda330adbb,ef292b8c771d..
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@@ -11599,7 -11492,7 +11599,8 @@@ F:   kernel/seccomp.
  F:include/uapi/linux/seccomp.h
  F:include/linux/seccomp.h
  F:tools/testing/selftests/seccomp/*
+ F:tools/testing/selftests/kselftest_harness.h
 +F:Documentation/userspace-api/seccomp_filter.rst
  K:\bsecure_computing
  K:\bTIF_SECCOMP\b

Re: Regression on ARMs in next-20170531

2017-06-06 Thread Tony Lindgren

* Andrew Morton <a...@linux-foundation.org> [170606 08:07]:
> On Tue, 6 Jun 2017 10:36:48 -0400 Johannes Weiner <han...@cmpxchg.org> wrote:
> 
> > On Tue, Jun 06, 2017 at 05:30:10AM -0700, Tony Lindgren wrote:
> > > PC is at __mod_node_page_state+0x2c/0xc8
> > > LR is at __per_cpu_offset+0x0/0x8
> > > pc : []lr : []psr: 21d3
> > > sp : c0d01eec  ip :   fp : 0001
> > > r10: c0c7cf68  r9 : 8000  r8 : 
> > > r7 : 0001  r6 : 0006  r5 : 2ea2d000  r4 : 0007
> > > r3 : 0007  r2 : 0001  r1 : 0006  r0 : c0dc1fc0
> > > Flags: nzCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
> > > Control: 10c5387d  Table: 8000404a  DAC: 0051
> > > Process swapper (pid: 0, stack limit = 0xc0d00218)
> > > Stack: (0xc0d01eec to 0xc0d02000)
> > > 1ee0:41d3 c0dc1fc0 c028018c 0001 
> > > c1599440
> > > 1f00: c0d58834 efd83000  c02af214 0100 c157a890 2000 
> > > 8000
> > > 1f20: 0001 0001 8000 c02aeb4c  8000 c0d58834 
> > > 8000
> > > 1f40: 01008000 c0c23a88 c0d58834 c1580034 41d3 c02afa9c  
> > > c086b230
> > > 1f60: c0d58834 00c0 0100 c157a78c c0abe0fc 0080 2000 
> > > c0dd4000
> > > 1f80: efffec40 c0c55a48  c0c23a88 c157a78c c0c5be48 c0c5bde8 
> > > c157a890
> > > 1fa0: c0dd4000 c0c25a9c   c0dd4000 c0d07940 c0dd4000 
> > > c0c00abc
> > > 1fc0:    c0c006a0  c0c55a48 c0dd4214 
> > > c0d07958
> > > 1fe0: c0c55a44 c0d0cae4 8000406a 411fc093  8000807c  
> > > 
> > > [] (__mod_node_page_state) from [] 
> > > (mod_node_page_state+0x2c/0x4c)
> > > [] (mod_node_page_state) from [] 
> > > (cache_alloc_refill+0x654/0x898)
> > > [] (cache_alloc_refill) from [] 
> > > (kmem_cache_alloc+0x2d4/0x364)
> > > [] (kmem_cache_alloc) from [] 
> > > (create_kmalloc_cache+0x20/0x8c)
> > > [] (create_kmalloc_cache) from [] 
> > > (kmem_cache_init+0xac/0x11c)
> > > [] (kmem_cache_init) from [] 
> > > (start_kernel+0x1b8/0x3d8)
> > 
> > That's the one Russell analyzed and I misinterpreted. We put a fix
> > into -next to initialize pgdat->per_cpu_nodestats in time for slab
> > initialization during boot.

OK

> > Is today's -next working again?

Yes just tested that next-20170606 is working again thanks!

> I'll be even less than usually functional for the next week, so please
> cc Stephen on any -next hotfixes.

OK good to know if new issues show up.

Regards,

Tony

Re: Regression on ARMs in next-20170531

2017-06-06 Thread Tony Lindgren

* Andrew Morton  [170606 08:07]:
> On Tue, 6 Jun 2017 10:36:48 -0400 Johannes Weiner  wrote:
> 
> > On Tue, Jun 06, 2017 at 05:30:10AM -0700, Tony Lindgren wrote:
> > > PC is at __mod_node_page_state+0x2c/0xc8
> > > LR is at __per_cpu_offset+0x0/0x8
> > > pc : []lr : []psr: 21d3
> > > sp : c0d01eec  ip :   fp : 0001
> > > r10: c0c7cf68  r9 : 8000  r8 : 
> > > r7 : 0001  r6 : 0006  r5 : 2ea2d000  r4 : 0007
> > > r3 : 0007  r2 : 0001  r1 : 0006  r0 : c0dc1fc0
> > > Flags: nzCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
> > > Control: 10c5387d  Table: 8000404a  DAC: 0051
> > > Process swapper (pid: 0, stack limit = 0xc0d00218)
> > > Stack: (0xc0d01eec to 0xc0d02000)
> > > 1ee0:41d3 c0dc1fc0 c028018c 0001 
> > > c1599440
> > > 1f00: c0d58834 efd83000  c02af214 0100 c157a890 2000 
> > > 8000
> > > 1f20: 0001 0001 8000 c02aeb4c  8000 c0d58834 
> > > 8000
> > > 1f40: 01008000 c0c23a88 c0d58834 c1580034 41d3 c02afa9c  
> > > c086b230
> > > 1f60: c0d58834 00c0 0100 c157a78c c0abe0fc 0080 2000 
> > > c0dd4000
> > > 1f80: efffec40 c0c55a48  c0c23a88 c157a78c c0c5be48 c0c5bde8 
> > > c157a890
> > > 1fa0: c0dd4000 c0c25a9c   c0dd4000 c0d07940 c0dd4000 
> > > c0c00abc
> > > 1fc0:    c0c006a0  c0c55a48 c0dd4214 
> > > c0d07958
> > > 1fe0: c0c55a44 c0d0cae4 8000406a 411fc093  8000807c  
> > > 
> > > [] (__mod_node_page_state) from [] 
> > > (mod_node_page_state+0x2c/0x4c)
> > > [] (mod_node_page_state) from [] 
> > > (cache_alloc_refill+0x654/0x898)
> > > [] (cache_alloc_refill) from [] 
> > > (kmem_cache_alloc+0x2d4/0x364)
> > > [] (kmem_cache_alloc) from [] 
> > > (create_kmalloc_cache+0x20/0x8c)
> > > [] (create_kmalloc_cache) from [] 
> > > (kmem_cache_init+0xac/0x11c)
> > > [] (kmem_cache_init) from [] 
> > > (start_kernel+0x1b8/0x3d8)
> > 
> > That's the one Russell analyzed and I misinterpreted. We put a fix
> > into -next to initialize pgdat->per_cpu_nodestats in time for slab
> > initialization during boot.

OK

> > Is today's -next working again?

Yes just tested that next-20170606 is working again thanks!

> I'll be even less than usually functional for the next week, so please
> cc Stephen on any -next hotfixes.

OK good to know if new issues show up.

Regards,

Tony

linux-next: manual merge of the kselftest tree with the jc_docs tree

2017-06-06 Thread Stephen Rothwell

Hi Shuah,

Today's linux-next merge of the kselftest tree got a conflict in:

  Documentation/dev-tools/index.rst

between commit:

  7fb2e8a49037 ("docs-rst: convert kgdb DocBook to ReST")

from the jc_docs tree and commit:

  5714b6531b49 ("Documentation/dev-tools: Use reStructuredText markups for 
kselftest")

from the kselftest tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc Documentation/dev-tools/index.rst
index 4ac991dbddb7,e50054c6aeaa..
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@@ -23,7 -23,7 +23,8 @@@ whole; patches welcome
 kmemleak
 kmemcheck
 gdb-kernel-debugging
 +   kgdb
+kselftest
  
  
  .. only::  subproject and html

linux-next: manual merge of the kselftest tree with the jc_docs tree

2017-06-06 Thread Stephen Rothwell

Hi Shuah,

Today's linux-next merge of the kselftest tree got a conflict in:

  Documentation/dev-tools/index.rst

between commit:

  7fb2e8a49037 ("docs-rst: convert kgdb DocBook to ReST")

from the jc_docs tree and commit:

  5714b6531b49 ("Documentation/dev-tools: Use reStructuredText markups for 
kselftest")

from the kselftest tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc Documentation/dev-tools/index.rst
index 4ac991dbddb7,e50054c6aeaa..
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@@ -23,7 -23,7 +23,8 @@@ whole; patches welcome
 kmemleak
 kmemcheck
 gdb-kernel-debugging
 +   kgdb
+kselftest
  
  
  .. only::  subproject and html

Re: [PATCH v2 0/2] x86/mm/KASLR: Do not adapt size of the direct mapping section for SGI UV system

2017-06-06 Thread Baoquan He

Hi all,

PING!

Is there any further comment or suggetion about this patchset?

Thanks
Baoquan

On 05/20/17 at 08:02pm, Baoquan He wrote:
> This is v2 post.
> 
> This patchset is trying to fix a bug that SGI UV system casually hang
> during boot with KASLR enabled. The root cause is that mm KASLR adapts
> size of the direct mapping section only based on the system RAM size.
> Then later when map SGI UV MMIOH region into the direct mapping during
> rest_init() invocation, it might go beyond of the directing mapping
> section and step into VMALLOC or VMEMMAP area, then BUG_ON triggered.
> 
> The fix is adding a helper function is_early_uv_system to check UV system
> earlier, then call the helper function in kernel_randomize_memory() to
> check if it's a SGI UV system, if yes, we keep the size of direct mapping
> section to be 64TB just as nokslr.
> 
> With this fix, SGI UV system can have 64TB direct mapping size always,
> and the starting address of direct mapping/vmalloc/vmemmap and the padding
> between them can still be randomized to enhance the system security.
> 
> v1->v2:
> 1. Mike suggested making is_early_uv_system() an inline function and be
> put in include/asm/uv/uv.h so that they can adjust them easier in the
> future.
> 
> 2. Split the v1 code into uv part and mm KASLR part as Mike suggested.
> 
> Baoquan He (2):
>   x86/UV: Introduce a helper function to check UV system at earlier
> stage
>   x86/mm/KASLR: Do not adapt the size of the direct mapping section for
> SGI UV system
> 
>  arch/x86/include/asm/uv/uv.h | 6 ++
>  arch/x86/mm/kaslr.c  | 3 ++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> -- 
> 2.5.5
>

Re: [PATCH v2 0/2] x86/mm/KASLR: Do not adapt size of the direct mapping section for SGI UV system

2017-06-06 Thread Baoquan He

Hi all,

PING!

Is there any further comment or suggetion about this patchset?

Thanks
Baoquan

On 05/20/17 at 08:02pm, Baoquan He wrote:
> This is v2 post.
> 
> This patchset is trying to fix a bug that SGI UV system casually hang
> during boot with KASLR enabled. The root cause is that mm KASLR adapts
> size of the direct mapping section only based on the system RAM size.
> Then later when map SGI UV MMIOH region into the direct mapping during
> rest_init() invocation, it might go beyond of the directing mapping
> section and step into VMALLOC or VMEMMAP area, then BUG_ON triggered.
> 
> The fix is adding a helper function is_early_uv_system to check UV system
> earlier, then call the helper function in kernel_randomize_memory() to
> check if it's a SGI UV system, if yes, we keep the size of direct mapping
> section to be 64TB just as nokslr.
> 
> With this fix, SGI UV system can have 64TB direct mapping size always,
> and the starting address of direct mapping/vmalloc/vmemmap and the padding
> between them can still be randomized to enhance the system security.
> 
> v1->v2:
> 1. Mike suggested making is_early_uv_system() an inline function and be
> put in include/asm/uv/uv.h so that they can adjust them easier in the
> future.
> 
> 2. Split the v1 code into uv part and mm KASLR part as Mike suggested.
> 
> Baoquan He (2):
>   x86/UV: Introduce a helper function to check UV system at earlier
> stage
>   x86/mm/KASLR: Do not adapt the size of the direct mapping section for
> SGI UV system
> 
>  arch/x86/include/asm/uv/uv.h | 6 ++
>  arch/x86/mm/kaslr.c  | 3 ++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> -- 
> 2.5.5
>

Re: [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit

2017-06-06 Thread Jintack Lim

Hi Bandan,

On Tue, Jun 6, 2017 at 4:16 PM, Bandan Das  wrote:
> Jintack Lim  writes:
>
>> From: Christoffer Dall 
>>
>> When running in virtual EL2 we use the shadow EL1 systerm register array
>> for the save/restore process, so that hardware and especially the memory
>> subsystem behaves as code written for EL2 expects while really running
>> in EL1.
>>
>> This works great for EL1 system register accesses that we trap, because
>> these accesses will be written into the virtual state for the EL1 system
>> registers used when eventually switching the VCPU mode to EL1.
>>
>> However, there was a collection of EL1 system registers which we do not
>> trap, and as a consequence all save/restore operations of these
>> registers were happening locally in the shadow array, with no benefit to
>> software actually running in virtual EL1 at all.
>>
>> To fix this, simply synchronize the shadow and real EL1 state for these
>> registers on entry/exit to/from virtual EL2 state.
>>
>> Signed-off-by: Christoffer Dall 
>> Signed-off-by: Jintack Lim 
>> ---
>>  arch/arm64/kvm/context.c | 47 
>> +++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
>> index 2e9e386..0025dd9 100644
>> --- a/arch/arm64/kvm/context.c
>> +++ b/arch/arm64/kvm/context.c
>> @@ -88,6 +88,51 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu 
>> *vcpu)
>>   s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
>>  }
>>
>> +/*
>> + * List of EL1 registers which we allow the virtual EL2 mode to access
>> + * directly without trapping and which haven't been paravirtualized.
>> + *
>> + * Probably CNTKCTL_EL1 should not be copied but be accessed via trap. 
>> Because,
>> + * the guest hypervisor running in EL1 can be affected by event streams
>> + * configured via CNTKCTL_EL1, which it does not expect. We don't have a
>> + * mechanism to trap on CNTKCTL_EL1 as of now (v8.3), keep it in here 
>> instead.
>> + */
>> +static const int el1_non_trap_regs[] = {
>> + CNTKCTL_EL1,
>> + CSSELR_EL1,
>> + PAR_EL1,
>> + TPIDR_EL0,
>> + TPIDR_EL1,
>> + TPIDRRO_EL0
>> +};
>> +
>
> Do we trap on all register accesses in the non-nested case +
> all accesses to the memory access registers ? I am trying to
> understand how we decide what registers to trap on. For example,
> shouldn't accesses to CSSELR_EL1, the cache size selection register
> be trapped ?

So, the principle is that we trap on EL1 register accesses from the
virtual EL2 if those EL1 register accesses without trap make the guest
hypervisor's execution different from what it should be if it really
runs in EL2. (e.g. A write operation to TTBR0_EL1 from the guest
hypervisor without trap will change the guest hypervisor's page table
base. However, if a hypervisor runs in EL2, this operation only
affects the software running in EL1, not itself.)

For non-nested case, this patch does not introduce any additional
traps since there's no virtual EL2 state.

For CSSELR_EL1 case, this register can be used at any exception level
other than EL0 and its behavior is the same whether it is executed in
EL1 or EL2. In other words, the guest hypervisor can interact with
this register in EL1 just the way a non-nesting hypervisor would do in
EL2.

Thanks,
Jintack

>
> Bandan
>
>
>> +/**
>> + * sync_shadow_el1_state - Going to/from the virtual EL2 state, sync state
>> + * @vcpu:The VCPU pointer
>> + * @setup:   True, if on the way to the guest (called from setup)
>> + *   False, if returning form the guet (calld from restore)
>> + *
>> + * Some EL1 registers are accessed directly by the virtual EL2 mode because
>> + * they in no way affect execution state in virtual EL2.   However, we must
>> + * still ensure that virtual EL2 observes the same state of the EL1 
>> registers
>> + * as the normal VM's EL1 mode, so copy this state as needed on 
>> setup/restore.
>> + */
>> +static void sync_shadow_el1_state(struct kvm_vcpu *vcpu, bool setup)
>> +{
>> + u64 *sys_regs = vcpu->arch.ctxt.sys_regs;
>> + u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
>> + int i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
>> + const int sr = el1_non_trap_regs[i];
>> +
>> + if (setup)
>> + s_sys_regs[sr] = sys_regs[sr];
>> + else
>> + sys_regs[sr] = s_sys_regs[sr];
>> + }
>> +}
>> +
>>  /**
>>   * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
>>   * @vcpu: The VCPU pointer
>> @@ -107,6 +152,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>>   else
>>   ctxt->hw_pstate |= PSR_MODE_EL1t;
>>
>> + sync_shadow_el1_state(vcpu, true);
>>

Re: [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit

2017-06-06 Thread Jintack Lim

Hi Bandan,

On Tue, Jun 6, 2017 at 4:16 PM, Bandan Das  wrote:
> Jintack Lim  writes:
>
>> From: Christoffer Dall 
>>
>> When running in virtual EL2 we use the shadow EL1 systerm register array
>> for the save/restore process, so that hardware and especially the memory
>> subsystem behaves as code written for EL2 expects while really running
>> in EL1.
>>
>> This works great for EL1 system register accesses that we trap, because
>> these accesses will be written into the virtual state for the EL1 system
>> registers used when eventually switching the VCPU mode to EL1.
>>
>> However, there was a collection of EL1 system registers which we do not
>> trap, and as a consequence all save/restore operations of these
>> registers were happening locally in the shadow array, with no benefit to
>> software actually running in virtual EL1 at all.
>>
>> To fix this, simply synchronize the shadow and real EL1 state for these
>> registers on entry/exit to/from virtual EL2 state.
>>
>> Signed-off-by: Christoffer Dall 
>> Signed-off-by: Jintack Lim 
>> ---
>>  arch/arm64/kvm/context.c | 47 
>> +++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
>> index 2e9e386..0025dd9 100644
>> --- a/arch/arm64/kvm/context.c
>> +++ b/arch/arm64/kvm/context.c
>> @@ -88,6 +88,51 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu 
>> *vcpu)
>>   s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
>>  }
>>
>> +/*
>> + * List of EL1 registers which we allow the virtual EL2 mode to access
>> + * directly without trapping and which haven't been paravirtualized.
>> + *
>> + * Probably CNTKCTL_EL1 should not be copied but be accessed via trap. 
>> Because,
>> + * the guest hypervisor running in EL1 can be affected by event streams
>> + * configured via CNTKCTL_EL1, which it does not expect. We don't have a
>> + * mechanism to trap on CNTKCTL_EL1 as of now (v8.3), keep it in here 
>> instead.
>> + */
>> +static const int el1_non_trap_regs[] = {
>> + CNTKCTL_EL1,
>> + CSSELR_EL1,
>> + PAR_EL1,
>> + TPIDR_EL0,
>> + TPIDR_EL1,
>> + TPIDRRO_EL0
>> +};
>> +
>
> Do we trap on all register accesses in the non-nested case +
> all accesses to the memory access registers ? I am trying to
> understand how we decide what registers to trap on. For example,
> shouldn't accesses to CSSELR_EL1, the cache size selection register
> be trapped ?

So, the principle is that we trap on EL1 register accesses from the
virtual EL2 if those EL1 register accesses without trap make the guest
hypervisor's execution different from what it should be if it really
runs in EL2. (e.g. A write operation to TTBR0_EL1 from the guest
hypervisor without trap will change the guest hypervisor's page table
base. However, if a hypervisor runs in EL2, this operation only
affects the software running in EL1, not itself.)

For non-nested case, this patch does not introduce any additional
traps since there's no virtual EL2 state.

For CSSELR_EL1 case, this register can be used at any exception level
other than EL0 and its behavior is the same whether it is executed in
EL1 or EL2. In other words, the guest hypervisor can interact with
this register in EL1 just the way a non-nesting hypervisor would do in
EL2.

Thanks,
Jintack

>
> Bandan
>
>
>> +/**
>> + * sync_shadow_el1_state - Going to/from the virtual EL2 state, sync state
>> + * @vcpu:The VCPU pointer
>> + * @setup:   True, if on the way to the guest (called from setup)
>> + *   False, if returning form the guet (calld from restore)
>> + *
>> + * Some EL1 registers are accessed directly by the virtual EL2 mode because
>> + * they in no way affect execution state in virtual EL2.   However, we must
>> + * still ensure that virtual EL2 observes the same state of the EL1 
>> registers
>> + * as the normal VM's EL1 mode, so copy this state as needed on 
>> setup/restore.
>> + */
>> +static void sync_shadow_el1_state(struct kvm_vcpu *vcpu, bool setup)
>> +{
>> + u64 *sys_regs = vcpu->arch.ctxt.sys_regs;
>> + u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
>> + int i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
>> + const int sr = el1_non_trap_regs[i];
>> +
>> + if (setup)
>> + s_sys_regs[sr] = sys_regs[sr];
>> + else
>> + sys_regs[sr] = s_sys_regs[sr];
>> + }
>> +}
>> +
>>  /**
>>   * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
>>   * @vcpu: The VCPU pointer
>> @@ -107,6 +152,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>>   else
>>   ctxt->hw_pstate |= PSR_MODE_EL1t;
>>
>> + sync_shadow_el1_state(vcpu, true);
>>   create_shadow_el1_sysregs(vcpu);
>>   ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
>>   ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
>>

[PATCH] Add driver for GOODiX GTx5 series touchsereen

2017-06-06 Thread Wang Yafei

This driver is for GOODiX GTx5 series touchscreen controllers
such as GT8589, GT7589. This driver designed with hierarchial structure,
for that can be modified to support subsequent controllers easily.
Some zones of the touchscreen can be set to buttons(according to the
hardware). That is why it handles button and multitouch events.

A brief description of driver structure
- Core Layer: This layer responsible for basic input events report,
  GPIO pinctrl, Interrupt, Power resources manager and submodules
  manager.
- Hardware Layer: This layer responsible for controllers initialization,
  irq handle as well as bus read/write.
- External Module Layer: This layer used for support more features
  such as firmware update, debug tools and gesture wakeup.

Signed-off-by: Wang Yafei 

---
 .../input/touchscreen/goodix-ts-sunrise/Kconfig|   36 +
 .../input/touchscreen/goodix-ts-sunrise/Makefile   |6 +
 .../goodix-ts-sunrise/goodix_gtx5_i2c.c|  939 
 .../goodix-ts-sunrise/goodix_gtx5_update.c | 1453 ++
 .../touchscreen/goodix-ts-sunrise/goodix_ts_core.c | 1618 
 .../touchscreen/goodix-ts-sunrise/goodix_ts_core.h |  585 +++
 .../goodix-ts-sunrise/goodix_ts_tools.c|  552 +++
 7 files changed, 5189 insertions(+)
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/Kconfig
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/Makefile
 create mode 100755 
drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c
 create mode 100755 
drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_update.c
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/goodix_ts_core.c
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/goodix_ts_core.h
 create mode 100755 
drivers/input/touchscreen/goodix-ts-sunrise/goodix_ts_tools.c

diff --git a/drivers/input/touchscreen/goodix-ts-sunrise/Kconfig 
b/drivers/input/touchscreen/goodix-ts-sunrise/Kconfig
new file mode 100755
index 000..8e16595
--- /dev/null
+++ b/drivers/input/touchscreen/goodix-ts-sunrise/Kconfig
@@ -0,0 +1,36 @@
+#
+# Goodix touchscreen driver configuration
+#
+menuconfig TOUCHSCREEN_GOODIX_GTX5
+   bool "Goodix GTx5 touchscreen"
+   depends on I2C
+   default y
+   help
+ Say Y here if you have a Goodix GTx5xx touchscreen connected
+ to your system.
+
+ If unsure, say N.
+
+if TOUCHSCREEN_GOODIX_GTX5
+
+config TOUCHSCREEN_GOODIX_GTX5_UPDATE
+   tristate "Goodix GTx5xx firmware update module"
+   default y
+   help
+ Say Y here to enable support for doing firmware update.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here.
+
+config TOUCHSCREEN_GOODIX_GTX5_TOOLS
+   tristate "Goodix touch tools support"
+   default n
+   help
+ Say Y here to enable debug tools.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here.
+
+endif
diff --git a/drivers/input/touchscreen/goodix-ts-sunrise/Makefile 
b/drivers/input/touchscreen/goodix-ts-sunrise/Makefile
new file mode 100755
index 000..2d5b38d
--- /dev/null
+++ b/drivers/input/touchscreen/goodix-ts-sunrise/Makefile
@@ -0,0 +1,6 @@
+# Goodix Touchscreen Makefile
+
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5)   += goodix_ts_core.o
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5_TOOLS)+= goodix_ts_tools.o
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5)   += goodix_gtx5_i2c.o
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5_UPDATE)+= goodix_gtx5_update.o
diff --git a/drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c 
b/drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c
new file mode 100755
index 000..c2e2dee
--- /dev/null
+++ b/drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c
@@ -0,0 +1,939 @@
+/*
+ * Goodix GTx5 I2C Dirver
+ * Hardware interface layer of touchdriver architecture.
+ *
+ * Copyright (C) 2015 - 2016 Goodix, Inc.
+ * Authors:  Wang Yafei 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be a reference
+ * to you, when you are integrating the GOODiX's CTP IC into your system,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "goodix_ts_core.h"
+
+#define TS_DT_COMPATIBLE   "goodix,gtx5"
+#define TS_DRIVER_NAME "goodix_i2c"
+#define I2C_MAX_TRANSFER_SIZE  256
+#define TS_ADDR_LENGTH 2
+
+#define TS_REG_COORDS_BASE 0x824E

[PATCH] Add driver for GOODiX GTx5 series touchsereen

2017-06-06 Thread Wang Yafei

This driver is for GOODiX GTx5 series touchscreen controllers
such as GT8589, GT7589. This driver designed with hierarchial structure,
for that can be modified to support subsequent controllers easily.
Some zones of the touchscreen can be set to buttons(according to the
hardware). That is why it handles button and multitouch events.

A brief description of driver structure
- Core Layer: This layer responsible for basic input events report,
  GPIO pinctrl, Interrupt, Power resources manager and submodules
  manager.
- Hardware Layer: This layer responsible for controllers initialization,
  irq handle as well as bus read/write.
- External Module Layer: This layer used for support more features
  such as firmware update, debug tools and gesture wakeup.

Signed-off-by: Wang Yafei 

---
 .../input/touchscreen/goodix-ts-sunrise/Kconfig|   36 +
 .../input/touchscreen/goodix-ts-sunrise/Makefile   |6 +
 .../goodix-ts-sunrise/goodix_gtx5_i2c.c|  939 
 .../goodix-ts-sunrise/goodix_gtx5_update.c | 1453 ++
 .../touchscreen/goodix-ts-sunrise/goodix_ts_core.c | 1618 
 .../touchscreen/goodix-ts-sunrise/goodix_ts_core.h |  585 +++
 .../goodix-ts-sunrise/goodix_ts_tools.c|  552 +++
 7 files changed, 5189 insertions(+)
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/Kconfig
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/Makefile
 create mode 100755 
drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c
 create mode 100755 
drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_update.c
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/goodix_ts_core.c
 create mode 100755 drivers/input/touchscreen/goodix-ts-sunrise/goodix_ts_core.h
 create mode 100755 
drivers/input/touchscreen/goodix-ts-sunrise/goodix_ts_tools.c

diff --git a/drivers/input/touchscreen/goodix-ts-sunrise/Kconfig 
b/drivers/input/touchscreen/goodix-ts-sunrise/Kconfig
new file mode 100755
index 000..8e16595
--- /dev/null
+++ b/drivers/input/touchscreen/goodix-ts-sunrise/Kconfig
@@ -0,0 +1,36 @@
+#
+# Goodix touchscreen driver configuration
+#
+menuconfig TOUCHSCREEN_GOODIX_GTX5
+   bool "Goodix GTx5 touchscreen"
+   depends on I2C
+   default y
+   help
+ Say Y here if you have a Goodix GTx5xx touchscreen connected
+ to your system.
+
+ If unsure, say N.
+
+if TOUCHSCREEN_GOODIX_GTX5
+
+config TOUCHSCREEN_GOODIX_GTX5_UPDATE
+   tristate "Goodix GTx5xx firmware update module"
+   default y
+   help
+ Say Y here to enable support for doing firmware update.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here.
+
+config TOUCHSCREEN_GOODIX_GTX5_TOOLS
+   tristate "Goodix touch tools support"
+   default n
+   help
+ Say Y here to enable debug tools.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here.
+
+endif
diff --git a/drivers/input/touchscreen/goodix-ts-sunrise/Makefile 
b/drivers/input/touchscreen/goodix-ts-sunrise/Makefile
new file mode 100755
index 000..2d5b38d
--- /dev/null
+++ b/drivers/input/touchscreen/goodix-ts-sunrise/Makefile
@@ -0,0 +1,6 @@
+# Goodix Touchscreen Makefile
+
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5)   += goodix_ts_core.o
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5_TOOLS)+= goodix_ts_tools.o
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5)   += goodix_gtx5_i2c.o
+obj-$(CONFIG_TOUCHSCREEN_GOODIX_GTX5_UPDATE)+= goodix_gtx5_update.o
diff --git a/drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c 
b/drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c
new file mode 100755
index 000..c2e2dee
--- /dev/null
+++ b/drivers/input/touchscreen/goodix-ts-sunrise/goodix_gtx5_i2c.c
@@ -0,0 +1,939 @@
+/*
+ * Goodix GTx5 I2C Dirver
+ * Hardware interface layer of touchdriver architecture.
+ *
+ * Copyright (C) 2015 - 2016 Goodix, Inc.
+ * Authors:  Wang Yafei 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be a reference
+ * to you, when you are integrating the GOODiX's CTP IC into your system,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "goodix_ts_core.h"
+
+#define TS_DT_COMPATIBLE   "goodix,gtx5"
+#define TS_DRIVER_NAME "goodix_i2c"
+#define I2C_MAX_TRANSFER_SIZE  256
+#define TS_ADDR_LENGTH 2
+
+#define TS_REG_COORDS_BASE 0x824E
+#define TS_REG_CMD 0x8040
+#define

Re: [PATCH v2] security/keys: rewrite all of big_key crypto

2017-06-06 Thread Eric Biggers

Hi Jason,

On Tue, Jun 06, 2017 at 11:51:29PM +0200, Jason A. Donenfeld wrote:
> issue now. And, some error paths forgot to zero out sensitive material, so
> this patch changes a kfree into a kzfree.

There are other places in big_key.c that should be doing kzfree() instead of
kfree().  Sorry, I actually recently did a patchset that fixed this for major
parts of the keyrings API, but I skipped the big_key type because it was one of
the more obscure key types (and frankly I have no idea what, if anything,
actually uses it).  Probably the switch to kzfree() should be its own patch
since it's a separate logical change.

>  {
>   int ret = -EINVAL;
>   struct scatterlist sgio;
> - SKCIPHER_REQUEST_ON_STACK(req, big_key_skcipher);
> -
> - if (crypto_skcipher_setkey(big_key_skcipher, key, ENC_KEY_SIZE)) {
> + u8 req_on_stack[sizeof(struct aead_request) +
> + crypto_aead_reqsize(big_key_aead)];
> + struct aead_request *aead_req = (struct aead_request *)req_on_stack;
> +

The crypto API with CONFIG_VMAP_STACK=y and CONFIG_DEBUG_SG=y is unhappy with
using an aead_request on the stack, because it can't create a scatterlist from
it.  It will need to be on the heap instead.  (Or else the crypto API fixed.)

# cat .vimrc | keyctl padd big_key fred @s
[   43.687347] [ cut here ]
[   43.692592] kernel BUG at ./include/linux/scatterlist.h:140!
[   43.697527] invalid opcode:  [#1] SMP
[   43.700843] CPU: 0 PID: 219 Comm: keyctl Not tainted 
4.12.0-rc4-next-20170606-xfstests-3-gc6e36366c198 #120
[   43.707361] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-20170228_101828-anatol 04/01/2014
[   43.712167] task: 9ff6f97d8340 task.stack: bd460049c000
[   43.714259] RIP: 0010:crypto_gcm_init_common+0x234/0x260
[   43.715786] RSP: 0018:bd460049fac8 EFLAGS: 00010246
[   43.717031] RAX:  RBX: bd460049fba8 RCX: 0028
[   43.718688] RDX: 1d4f4049fbb8 RSI: 001d RDI: bd468049fbb8
[   43.720245] RBP: bd460049fb00 R08: bd460049fbd8 R09: bd460049fbd8
[   43.721473] R10:  R11:  R12: bd460049fbb8
[   43.722704] R13: 092f R14: bd460049fb58 R15: bd460049fba8
[   43.723927] FS:  7f04df2f8700() GS:9ff6fe20() 
knlGS:
[   43.725262] CS:  0010 DS:  ES:  CR0: 80050033
[   43.726065] CR2: 7f04dec1a4c0 CR3: 3957e000 CR4: 003406f0
[   43.727059] Call Trace:
[   43.727414]  crypto_gcm_encrypt+0x37/0xe0
[   43.727896]  big_key_crypt+0x106/0x160
[   43.728324]  big_key_preparse+0xc0/0x1d0
[   43.728784]  ? down_read+0x68/0xa0
[   43.729183]  key_create_or_update+0x13f/0x440
[   43.729688]  SyS_add_key+0xa1/0x1d0
[   43.730260]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   43.730851] RIP: 0033:0x7f04dec22f19
[   43.731253] RSP: 002b:7ffd622a0c88 EFLAGS: 0206 ORIG_RAX: 
00f8
[   43.732321] RAX: ffda RBX:  RCX: 7f04dec22f19
[   43.733274] RDX: 00606260 RSI: 7ffd622a2866 RDI: 7ffd622a285e
[   43.734192] RBP:  R08: fffd R09: 001e
[   43.735026] R10: 092f R11: 0206 R12: 0010
[   43.735855] R13: 7ffd622a0ca8 R14: 7ffd622a0df8 R15: 00404606
[   43.736690] Code: c8 01 48 89 83 d8 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 
5e 41 5f 5d c3 48 c7 c0 00 00 00 80 48 2b 05 69 36 74 00 e9 43 ff ff ff <0f> 0b 
0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 48 c7 45 c8 01 
[   43.738740] RIP: crypto_gcm_init_common+0x234/0x260 RSP: bd460049fac8
[   43.739402] ---[ end trace 6df3f493a6e39d76 ]---
Segmentation fault

Re: [PATCH v2] security/keys: rewrite all of big_key crypto

2017-06-06 Thread Eric Biggers

Hi Jason,

On Tue, Jun 06, 2017 at 11:51:29PM +0200, Jason A. Donenfeld wrote:
> issue now. And, some error paths forgot to zero out sensitive material, so
> this patch changes a kfree into a kzfree.

There are other places in big_key.c that should be doing kzfree() instead of
kfree().  Sorry, I actually recently did a patchset that fixed this for major
parts of the keyrings API, but I skipped the big_key type because it was one of
the more obscure key types (and frankly I have no idea what, if anything,
actually uses it).  Probably the switch to kzfree() should be its own patch
since it's a separate logical change.

>  {
>   int ret = -EINVAL;
>   struct scatterlist sgio;
> - SKCIPHER_REQUEST_ON_STACK(req, big_key_skcipher);
> -
> - if (crypto_skcipher_setkey(big_key_skcipher, key, ENC_KEY_SIZE)) {
> + u8 req_on_stack[sizeof(struct aead_request) +
> + crypto_aead_reqsize(big_key_aead)];
> + struct aead_request *aead_req = (struct aead_request *)req_on_stack;
> +

The crypto API with CONFIG_VMAP_STACK=y and CONFIG_DEBUG_SG=y is unhappy with
using an aead_request on the stack, because it can't create a scatterlist from
it.  It will need to be on the heap instead.  (Or else the crypto API fixed.)

# cat .vimrc | keyctl padd big_key fred @s
[   43.687347] [ cut here ]
[   43.692592] kernel BUG at ./include/linux/scatterlist.h:140!
[   43.697527] invalid opcode:  [#1] SMP
[   43.700843] CPU: 0 PID: 219 Comm: keyctl Not tainted 
4.12.0-rc4-next-20170606-xfstests-3-gc6e36366c198 #120
[   43.707361] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-20170228_101828-anatol 04/01/2014
[   43.712167] task: 9ff6f97d8340 task.stack: bd460049c000
[   43.714259] RIP: 0010:crypto_gcm_init_common+0x234/0x260
[   43.715786] RSP: 0018:bd460049fac8 EFLAGS: 00010246
[   43.717031] RAX:  RBX: bd460049fba8 RCX: 0028
[   43.718688] RDX: 1d4f4049fbb8 RSI: 001d RDI: bd468049fbb8
[   43.720245] RBP: bd460049fb00 R08: bd460049fbd8 R09: bd460049fbd8
[   43.721473] R10:  R11:  R12: bd460049fbb8
[   43.722704] R13: 092f R14: bd460049fb58 R15: bd460049fba8
[   43.723927] FS:  7f04df2f8700() GS:9ff6fe20() 
knlGS:
[   43.725262] CS:  0010 DS:  ES:  CR0: 80050033
[   43.726065] CR2: 7f04dec1a4c0 CR3: 3957e000 CR4: 003406f0
[   43.727059] Call Trace:
[   43.727414]  crypto_gcm_encrypt+0x37/0xe0
[   43.727896]  big_key_crypt+0x106/0x160
[   43.728324]  big_key_preparse+0xc0/0x1d0
[   43.728784]  ? down_read+0x68/0xa0
[   43.729183]  key_create_or_update+0x13f/0x440
[   43.729688]  SyS_add_key+0xa1/0x1d0
[   43.730260]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   43.730851] RIP: 0033:0x7f04dec22f19
[   43.731253] RSP: 002b:7ffd622a0c88 EFLAGS: 0206 ORIG_RAX: 
00f8
[   43.732321] RAX: ffda RBX:  RCX: 7f04dec22f19
[   43.733274] RDX: 00606260 RSI: 7ffd622a2866 RDI: 7ffd622a285e
[   43.734192] RBP:  R08: fffd R09: 001e
[   43.735026] R10: 092f R11: 0206 R12: 0010
[   43.735855] R13: 7ffd622a0ca8 R14: 7ffd622a0df8 R15: 00404606
[   43.736690] Code: c8 01 48 89 83 d8 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 
5e 41 5f 5d c3 48 c7 c0 00 00 00 80 48 2b 05 69 36 74 00 e9 43 ff ff ff <0f> 0b 
0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 48 c7 45 c8 01 
[   43.738740] RIP: crypto_gcm_init_common+0x234/0x260 RSP: bd460049fac8
[   43.739402] ---[ end trace 6df3f493a6e39d76 ]---
Segmentation fault

[GIT PULL] ARM: SoC fixes

2017-06-06 Thread Olof Johansson

Hi Linus,

The following changes since commit 08332893e37af6ae779367e78e444f8f9571511d:

  Linux 4.12-rc2 (2017-05-21 19:30:23 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git armsoc-fixes

for you to fetch changes up to 151d1d752bb681f29898c68c966f6e17b446456c:

  MAINTAINERS: EP93XX: Update maintainership (2017-06-06 21:07:37 -0700)


ARM: SoC fixes

Been sitting on these for a couple of weeks waiting on some larger batches
to come in but it's been pretty quiet.

Just your garden variety fixes here:

 - A few maintainers updates (ep93xx, Exynos, TI, Marvell)
 - Some PM fixes for Atmel/at91 and Marvell
 - A few DT fixes for Marvell, Versatile, TI Keystone, bcm283x
 - A reset driver patch to set module license for symbol access


Alexander Sverdlin (1):
  MAINTAINERS: EP93XX: Update maintainership

Antoine Tenart (1):
  arm64: marvell: dts: fix interrupts in 7k/8k crypto nodes

Arnd Bergmann (2):
  memory: atmel-ebi: mark PM ops as __maybe_unused
  ARM: at91: select CONFIG_ARM_CPU_SUSPEND

Christophe JAILLET (2):
  ARM: davinci: PM: Free resources in error handling path in 
'davinci_pm_init'
  ARM: davinci: PM: Do not free useful resources in normal path in 
'davinci_pm_init'

Florian Fainelli (1):
  Merge tag 'bcm2835-dt-fixes-2017-05-16' into devicetree/fixes

Heiko Stuebner (1):
  arm64: defconfig: enable some core options for 64bit Rockchip socs

Javier Martinez Canillas (1):
  MAINTAINERS: Remove Javier Martinez Canillas as reviewer for Exynos

Jeremy Linton (1):
  reset: hi6220: Set module license so that it can be loaded

Masahiro Yamada (1):
  ARM: dts: versatile: use #include "..." to include local DT

Murali Karicheri (1):
  ARM: dts: keystone-k2l: fix broken Ethernet due to disabled OSR

Olof Johansson (8):
  Merge tag 'davinci-fixes-for-v4.12' of 
git://git.kernel.org/.../nsekhar/linux-davinci into fixes
  Merge tag 'arm-soc/for-4.12/devicetree-fixes-2' of 
http://github.com/Broadcom/stblinux into fixes
  Merge tag 'samsung-fixes-4.12' of git://git.kernel.org/.../krzk/linux 
into fixes
  Merge tag 'at91-4.12-fixes' of git://git.kernel.org/.../abelloni/linux 
into fixes
  Merge tag 'reset-fixes-for-4.12' of 
git://git.pengutronix.de/git/pza/linux into fixes
  Merge tag 'mvebu-fixes-4.12-1' of git://git.infradead.org/linux-mvebu 
into fixes
  Merge tag 'mvebu-fixes-non-critical-4.12-1' of 
git://git.infradead.org/linux-mvebu into fixes
  Merge tag 'davinci-fixes-for-v4.12-p2' of 
git://git.kernel.org/.../nsekhar/linux-davinci into fixes

Patrice Chotard (1):
  MAINTAINERS: remove ker...@stlinux.com obsolete mailing list

Phil Elwell (1):
  ARM: dts: bcm283x: Reserve first page for firmware

Sekhar Nori (1):
  MAINTAINERS: add device-tree files to TI DaVinci entry

Thomas Petazzoni (2):
  MAINTAINERS: sort F entries for Marvell EBU maintainers
  MAINTAINERS: add irqchip related drivers to Marvell EBU maintainers

 MAINTAINERS  | 14 +++---
 arch/arm/boot/dts/bcm283x.dtsi   |  5 +
 arch/arm/boot/dts/keystone-k2l-netcp.dtsi|  4 ++--
 arch/arm/boot/dts/keystone-k2l.dtsi  |  8 
 arch/arm/boot/dts/versatile-pb.dts   |  2 +-
 arch/arm/mach-at91/Kconfig   |  1 +
 arch/arm/mach-davinci/pm.c   |  7 ++-
 arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi |  3 +--
 arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi  |  3 +--
 arch/arm64/configs/defconfig | 10 ++
 drivers/memory/atmel-ebi.c   |  2 +-
 drivers/reset/hisilicon/hi6220_reset.c   |  2 ++
 12 files changed, 45 insertions(+), 16 deletions(-)

[GIT PULL] ARM: SoC fixes

2017-06-06 Thread Olof Johansson

Hi Linus,

The following changes since commit 08332893e37af6ae779367e78e444f8f9571511d:

  Linux 4.12-rc2 (2017-05-21 19:30:23 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git armsoc-fixes

for you to fetch changes up to 151d1d752bb681f29898c68c966f6e17b446456c:

  MAINTAINERS: EP93XX: Update maintainership (2017-06-06 21:07:37 -0700)


ARM: SoC fixes

Been sitting on these for a couple of weeks waiting on some larger batches
to come in but it's been pretty quiet.

Just your garden variety fixes here:

 - A few maintainers updates (ep93xx, Exynos, TI, Marvell)
 - Some PM fixes for Atmel/at91 and Marvell
 - A few DT fixes for Marvell, Versatile, TI Keystone, bcm283x
 - A reset driver patch to set module license for symbol access


Alexander Sverdlin (1):
  MAINTAINERS: EP93XX: Update maintainership

Antoine Tenart (1):
  arm64: marvell: dts: fix interrupts in 7k/8k crypto nodes

Arnd Bergmann (2):
  memory: atmel-ebi: mark PM ops as __maybe_unused
  ARM: at91: select CONFIG_ARM_CPU_SUSPEND

Christophe JAILLET (2):
  ARM: davinci: PM: Free resources in error handling path in 
'davinci_pm_init'
  ARM: davinci: PM: Do not free useful resources in normal path in 
'davinci_pm_init'

Florian Fainelli (1):
  Merge tag 'bcm2835-dt-fixes-2017-05-16' into devicetree/fixes

Heiko Stuebner (1):
  arm64: defconfig: enable some core options for 64bit Rockchip socs

Javier Martinez Canillas (1):
  MAINTAINERS: Remove Javier Martinez Canillas as reviewer for Exynos

Jeremy Linton (1):
  reset: hi6220: Set module license so that it can be loaded

Masahiro Yamada (1):
  ARM: dts: versatile: use #include "..." to include local DT

Murali Karicheri (1):
  ARM: dts: keystone-k2l: fix broken Ethernet due to disabled OSR

Olof Johansson (8):
  Merge tag 'davinci-fixes-for-v4.12' of 
git://git.kernel.org/.../nsekhar/linux-davinci into fixes
  Merge tag 'arm-soc/for-4.12/devicetree-fixes-2' of 
http://github.com/Broadcom/stblinux into fixes
  Merge tag 'samsung-fixes-4.12' of git://git.kernel.org/.../krzk/linux 
into fixes
  Merge tag 'at91-4.12-fixes' of git://git.kernel.org/.../abelloni/linux 
into fixes
  Merge tag 'reset-fixes-for-4.12' of 
git://git.pengutronix.de/git/pza/linux into fixes
  Merge tag 'mvebu-fixes-4.12-1' of git://git.infradead.org/linux-mvebu 
into fixes
  Merge tag 'mvebu-fixes-non-critical-4.12-1' of 
git://git.infradead.org/linux-mvebu into fixes
  Merge tag 'davinci-fixes-for-v4.12-p2' of 
git://git.kernel.org/.../nsekhar/linux-davinci into fixes

Patrice Chotard (1):
  MAINTAINERS: remove ker...@stlinux.com obsolete mailing list

Phil Elwell (1):
  ARM: dts: bcm283x: Reserve first page for firmware

Sekhar Nori (1):
  MAINTAINERS: add device-tree files to TI DaVinci entry

Thomas Petazzoni (2):
  MAINTAINERS: sort F entries for Marvell EBU maintainers
  MAINTAINERS: add irqchip related drivers to Marvell EBU maintainers

 MAINTAINERS  | 14 +++---
 arch/arm/boot/dts/bcm283x.dtsi   |  5 +
 arch/arm/boot/dts/keystone-k2l-netcp.dtsi|  4 ++--
 arch/arm/boot/dts/keystone-k2l.dtsi  |  8 
 arch/arm/boot/dts/versatile-pb.dts   |  2 +-
 arch/arm/mach-at91/Kconfig   |  1 +
 arch/arm/mach-davinci/pm.c   |  7 ++-
 arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi |  3 +--
 arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi  |  3 +--
 arch/arm64/configs/defconfig | 10 ++
 drivers/memory/atmel-ebi.c   |  2 +-
 drivers/reset/hisilicon/hi6220_reset.c   |  2 ++
 12 files changed, 45 insertions(+), 16 deletions(-)

Re: [PATCH 2/2] tick: Make sure tick timer is active when bypassing reprogramming

2017-06-06 Thread Levin, Alexander (Sasha Levin)

On Tue, Jun 06, 2017 at 04:52:29PM +0200, Frederic Weisbecker wrote:
> On Sat, Jun 03, 2017 at 01:00:53PM +, Levin, Alexander (Sasha Levin) 
> wrote:
> > On Sat, Jun 03, 2017 at 02:42:43PM +0200, Frederic Weisbecker wrote:
> > > On Sat, Jun 03, 2017 at 08:06:41AM +, Levin, Alexander (Sasha Levin) 
> > > wrote:
> > > > On Fri, Apr 21, 2017 at 04:00:55PM +0200, Frederic Weisbecker wrote:
> > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > > > index 502b320..be7ca4d 100644
> > > > > --- a/kernel/time/tick-sched.c
> > > > > +++ b/kernel/time/tick-sched.c
> > > > > @@ -783,8 +783,13 @@ static ktime_t tick_nohz_stop_sched_tick(struct 
> > > > > tick_sched *ts,
> > > > >   tick = expires;
> > > > >  
> > > > >   /* Skip reprogram of event if its not changed */
> > > > > - if (ts->tick_stopped && (expires == ts->next_tick))
> > > > > - goto out;
> > > > > + if (ts->tick_stopped && (expires == ts->next_tick)) {
> > > > > + /* Sanity check: make sure clockevent is actually 
> > > > > programmed */
> > > > > + if (likely(dev->next_event <= ts->next_tick))
> > > > > + goto out;
> > > > > +
> > > > > + WARN_ON_ONCE(1);
> > > > > + }
> > > > 
> > > > I seem to be hitting that in a KVM vm, even without load (sometimes
> > > > right after boot):
> > > 
> > > Ah, can you tell me which tree you were using? Is it tip/master?
> > 
> > Its next-20170601: 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_next_linux-2Dnext.git_commit_-3Fh-3Dnext-2D20170601-26id-3D3ab334ebe84e0dfd1cc3ea2fe77f5ce4406f7370=DwIBAg=udBTRvFvXC5Dhqg7UHpJlPps3mZ3LRxpb6__0PomBTQ=bUtaaC9mlBij4OjEG_D-KMy3t3Ka3bY06suGz7ewY7g=0ex_DoQxODGZtsXpFBMXf2fPbFxv2ogjY9fVReKqbpk=t0F0_37WnCEpxAZR6WD3d_Q4n0Lp_2HCNOx3b_iOGtI=
> >  
> > 
> > > Can you give me its HEAD and your config file?
> > 
> > Attached config.
> 
> Thanks Sasha!
> 
> I couldn't reproduce it, that config boots fine on my kvm.
> Would you have the time to dump some traces for me?
> 
> I'd need you to add this boot option: 
> trace_event=hrtimer_cancel,hrtimer_start,hrtimer_expire_entry
> And boot your kernel with the below patch. This will dump the timer traces to 
> your console.
> I would be very interested in the resulting console dump file.

Attached. Let me know if you need anything else.

-- 

Thanks,
Sasha


log.txt.gz
Description: log.txt.gz

Re: [PATCH 2/2] tick: Make sure tick timer is active when bypassing reprogramming

2017-06-06 Thread Levin, Alexander (Sasha Levin)

On Tue, Jun 06, 2017 at 04:52:29PM +0200, Frederic Weisbecker wrote:
> On Sat, Jun 03, 2017 at 01:00:53PM +, Levin, Alexander (Sasha Levin) 
> wrote:
> > On Sat, Jun 03, 2017 at 02:42:43PM +0200, Frederic Weisbecker wrote:
> > > On Sat, Jun 03, 2017 at 08:06:41AM +, Levin, Alexander (Sasha Levin) 
> > > wrote:
> > > > On Fri, Apr 21, 2017 at 04:00:55PM +0200, Frederic Weisbecker wrote:
> > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > > > index 502b320..be7ca4d 100644
> > > > > --- a/kernel/time/tick-sched.c
> > > > > +++ b/kernel/time/tick-sched.c
> > > > > @@ -783,8 +783,13 @@ static ktime_t tick_nohz_stop_sched_tick(struct 
> > > > > tick_sched *ts,
> > > > >   tick = expires;
> > > > >  
> > > > >   /* Skip reprogram of event if its not changed */
> > > > > - if (ts->tick_stopped && (expires == ts->next_tick))
> > > > > - goto out;
> > > > > + if (ts->tick_stopped && (expires == ts->next_tick)) {
> > > > > + /* Sanity check: make sure clockevent is actually 
> > > > > programmed */
> > > > > + if (likely(dev->next_event <= ts->next_tick))
> > > > > + goto out;
> > > > > +
> > > > > + WARN_ON_ONCE(1);
> > > > > + }
> > > > 
> > > > I seem to be hitting that in a KVM vm, even without load (sometimes
> > > > right after boot):
> > > 
> > > Ah, can you tell me which tree you were using? Is it tip/master?
> > 
> > Its next-20170601: 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_next_linux-2Dnext.git_commit_-3Fh-3Dnext-2D20170601-26id-3D3ab334ebe84e0dfd1cc3ea2fe77f5ce4406f7370=DwIBAg=udBTRvFvXC5Dhqg7UHpJlPps3mZ3LRxpb6__0PomBTQ=bUtaaC9mlBij4OjEG_D-KMy3t3Ka3bY06suGz7ewY7g=0ex_DoQxODGZtsXpFBMXf2fPbFxv2ogjY9fVReKqbpk=t0F0_37WnCEpxAZR6WD3d_Q4n0Lp_2HCNOx3b_iOGtI=
> >  
> > 
> > > Can you give me its HEAD and your config file?
> > 
> > Attached config.
> 
> Thanks Sasha!
> 
> I couldn't reproduce it, that config boots fine on my kvm.
> Would you have the time to dump some traces for me?
> 
> I'd need you to add this boot option: 
> trace_event=hrtimer_cancel,hrtimer_start,hrtimer_expire_entry
> And boot your kernel with the below patch. This will dump the timer traces to 
> your console.
> I would be very interested in the resulting console dump file.

Attached. Let me know if you need anything else.

-- 

Thanks,
Sasha


log.txt.gz
Description: log.txt.gz

Re: [PATCH 2/2] tcp: md5: add fields to the tcp_md5sig struct to set a key address prefix

2017-06-06 Thread Eric Dumazet

On Tue, 2017-06-06 at 17:54 -0700, Ivan Delalande wrote:
> Replace padding in the socket option structure tcp_md5sig with a new
> flag field and address prefix length so it can be specified when
> configuring a new key with the TCP_MD5SIG socket option.
> 
> Signed-off-by: Bob Gilligan 
> Signed-off-by: Eric Mowat 
> Signed-off-by: Ivan Delalande 
> ---
>  include/uapi/linux/tcp.h |  6 +-
>  net/ipv4/tcp_ipv4.c  | 13 +++--
>  net/ipv6/tcp_ipv6.c  | 20 +++-
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 38a2b07afdff..52ac30aa0652 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -234,9 +234,13 @@ enum {
>  /* for TCP_MD5SIG socket option */
>  #define TCP_MD5SIG_MAXKEYLEN 80
>  
> +/* tcp_md5sig flags */
> +#define TCP_MD5SIG_FLAG_PREFIX   1   /* address prefix 
> length */
> +
>  struct tcp_md5sig {
>   struct __kernel_sockaddr_storage tcpm_addr; /* address associated */
> - __u16   __tcpm_pad1;/* zero */
> + __u8tcpm_flags; /* flags */
> + __u8tcpm_prefixlen; /* address prefix */
>   __u16   tcpm_keylen;/* key length */
>   __u32   __tcpm_pad2;/* zero */
>   __u8tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 51ca3bd5a8a3..2b1bb67b3388 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1069,6 +1069,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char 
> __user *optval,
>  {
>   struct tcp_md5sig cmd;
>   struct sockaddr_in *sin = (struct sockaddr_in *)_addr;
> + u8 prefixlen;
>  
>   if (optlen < sizeof(cmd))
>   return -EINVAL;
> @@ -1079,15 +1080,23 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, 
> char __user *optval,
>   if (sin->sin_family != AF_INET)
>   return -EINVAL;
>  
> + if (cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
> + prefixlen = cmd.tcpm_prefixlen;
> + if (prefixlen > 32)
> + return -EINVAL;
> + } else {
> + prefixlen = 32;
> + }

This will break some applications that maybe did not clear the
__tcpm_pad1 field ?


You need to find another way to maintain compatibility with old
applications.

Re: [PATCH 2/2] tcp: md5: add fields to the tcp_md5sig struct to set a key address prefix

2017-06-06 Thread Eric Dumazet

On Tue, 2017-06-06 at 17:54 -0700, Ivan Delalande wrote:
> Replace padding in the socket option structure tcp_md5sig with a new
> flag field and address prefix length so it can be specified when
> configuring a new key with the TCP_MD5SIG socket option.
> 
> Signed-off-by: Bob Gilligan 
> Signed-off-by: Eric Mowat 
> Signed-off-by: Ivan Delalande 
> ---
>  include/uapi/linux/tcp.h |  6 +-
>  net/ipv4/tcp_ipv4.c  | 13 +++--
>  net/ipv6/tcp_ipv6.c  | 20 +++-
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 38a2b07afdff..52ac30aa0652 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -234,9 +234,13 @@ enum {
>  /* for TCP_MD5SIG socket option */
>  #define TCP_MD5SIG_MAXKEYLEN 80
>  
> +/* tcp_md5sig flags */
> +#define TCP_MD5SIG_FLAG_PREFIX   1   /* address prefix 
> length */
> +
>  struct tcp_md5sig {
>   struct __kernel_sockaddr_storage tcpm_addr; /* address associated */
> - __u16   __tcpm_pad1;/* zero */
> + __u8tcpm_flags; /* flags */
> + __u8tcpm_prefixlen; /* address prefix */
>   __u16   tcpm_keylen;/* key length */
>   __u32   __tcpm_pad2;/* zero */
>   __u8tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 51ca3bd5a8a3..2b1bb67b3388 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1069,6 +1069,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char 
> __user *optval,
>  {
>   struct tcp_md5sig cmd;
>   struct sockaddr_in *sin = (struct sockaddr_in *)_addr;
> + u8 prefixlen;
>  
>   if (optlen < sizeof(cmd))
>   return -EINVAL;
> @@ -1079,15 +1080,23 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, 
> char __user *optval,
>   if (sin->sin_family != AF_INET)
>   return -EINVAL;
>  
> + if (cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
> + prefixlen = cmd.tcpm_prefixlen;
> + if (prefixlen > 32)
> + return -EINVAL;
> + } else {
> + prefixlen = 32;
> + }

This will break some applications that maybe did not clear the
__tcpm_pad1 field ?


You need to find another way to maintain compatibility with old
applications.

Re: [PATCH] cpufreq: Find transition latency dynamically

2017-06-06 Thread Viresh Kumar

On 06-06-17, 18:48, Leonard Crestez wrote:
> I remember checking if transition latency is correct for imx6q-cpufreq
> and it does not appear to be. Maybe because i2c latency of regulator
> adjustments is not counted in?

+ software latency + other stuff based on platforms.

And that's why I am trying to introduce dynamic latencies.

> It seems to me it would be much nicer to have a special flag for this
> instead of overriding CPUFREQ_ETERNAL.

Well, CPUFREQ_ETERNAL is specially there for drivers which don't know their
latencies and so we aren't really overriding it here. We are just trying to do
something sensible for them.

Over that I intend to move as many of the current drivers as possible to specify
CPUFREQ_ETERNAL instead of passing a static transition latency. The field
transition_latency will stay for platforms which don't want kernel to find it
dynamically though.

The only case which is left from above is where a platform sets its latency to
CPUFREQ_ETERNAL and it doesn't want the core to find the latency dynamically. I
would like to learn about the reasons on why do we need that and that is still
doable by setting the latency to something high enough which disallows us to use
ondemand/conservative governors.

> Also, wouldn't it be possible to update this dynamically? Just measure
> the duration every time it happens and do an update like latency =
> (latency * 7 + latest_latency) / 8.

I had that in mind but I want to know of the cases where platforms don't get the
latencies right in the first go (at boot). Because if we update latencies then
there are other things we need to do as well, for example update rate_limit_us
of schedutil governor if it isn't overridden by the user yet. So its not that
straight forward.

Plus, we don't have to get the averages here like you suggested, we are
interested in the max rather (worst case latency).

Thanks for your reviews :)

-- 
viresh

Re: [PATCH] cpufreq: Find transition latency dynamically

2017-06-06 Thread Viresh Kumar

On 06-06-17, 18:48, Leonard Crestez wrote:
> I remember checking if transition latency is correct for imx6q-cpufreq
> and it does not appear to be. Maybe because i2c latency of regulator
> adjustments is not counted in?

+ software latency + other stuff based on platforms.

And that's why I am trying to introduce dynamic latencies.

> It seems to me it would be much nicer to have a special flag for this
> instead of overriding CPUFREQ_ETERNAL.

Well, CPUFREQ_ETERNAL is specially there for drivers which don't know their
latencies and so we aren't really overriding it here. We are just trying to do
something sensible for them.

Over that I intend to move as many of the current drivers as possible to specify
CPUFREQ_ETERNAL instead of passing a static transition latency. The field
transition_latency will stay for platforms which don't want kernel to find it
dynamically though.

The only case which is left from above is where a platform sets its latency to
CPUFREQ_ETERNAL and it doesn't want the core to find the latency dynamically. I
would like to learn about the reasons on why do we need that and that is still
doable by setting the latency to something high enough which disallows us to use
ondemand/conservative governors.

> Also, wouldn't it be possible to update this dynamically? Just measure
> the duration every time it happens and do an update like latency =
> (latency * 7 + latest_latency) / 8.

I had that in mind but I want to know of the cases where platforms don't get the
latencies right in the first go (at boot). Because if we update latencies then
there are other things we need to do as well, for example update rate_limit_us
of schedutil governor if it isn't overridden by the user yet. So its not that
straight forward.

Plus, we don't have to get the averages here like you suggested, we are
interested in the max rather (worst case latency).

Thanks for your reviews :)

-- 
viresh

Re: [PATCH] Revert "mm: vmpressure: fix sending wrong events on underflow"

2017-06-06 Thread Minchan Kim

On Wed, Jun 07, 2017 at 11:08:37AM +0800, zhongjiang wrote:
> This reverts commit e1587a4945408faa58d0485002c110eb2454740c.
> 
> THP lru page is reclaimed , THP is split to normal page and loop again.
> reclaimed pages should not be bigger than nr_scan.  because of each
> loop will increase nr_scan counter.

Unfortunately, there is still underflow issue caused by slab pages as
Vinayak reported in description of e1587a4945408 so we cannot revert.
Please correct comment instead of removing the logic.

Thanks.

> 
> Signed-off-by: zhongjiang 
> ---
>  mm/vmpressure.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 6063581..149fdf6 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -112,16 +112,9 @@ static enum vmpressure_levels 
> vmpressure_calc_level(unsigned long scanned,
>   unsigned long reclaimed)
>  {
>   unsigned long scale = scanned + reclaimed;
> - unsigned long pressure = 0;
> + unsigned long pressure;
>  
>   /*
> -  * reclaimed can be greater than scanned in cases
> -  * like THP, where the scanned is 1 and reclaimed
> -  * could be 512
> -  */
> - if (reclaimed >= scanned)
> - goto out;
> - /*
>* We calculate the ratio (in percents) of how many pages were
>* scanned vs. reclaimed in a given time frame (window). Note that
>* time is in VM reclaimer's "ticks", i.e. number of pages
> @@ -131,7 +124,6 @@ static enum vmpressure_levels 
> vmpressure_calc_level(unsigned long scanned,
>   pressure = scale - (reclaimed * scale / scanned);
>   pressure = pressure * 100 / scale;
>  
> -out:
>   pr_debug("%s: %3lu  (s: %lu  r: %lu)\n", __func__, pressure,
>scanned, reclaimed);
>  
> -- 
> 1.7.12.4
>

Re: [PATCH] Revert "mm: vmpressure: fix sending wrong events on underflow"

2017-06-06 Thread Minchan Kim

On Wed, Jun 07, 2017 at 11:08:37AM +0800, zhongjiang wrote:
> This reverts commit e1587a4945408faa58d0485002c110eb2454740c.
> 
> THP lru page is reclaimed , THP is split to normal page and loop again.
> reclaimed pages should not be bigger than nr_scan.  because of each
> loop will increase nr_scan counter.

Unfortunately, there is still underflow issue caused by slab pages as
Vinayak reported in description of e1587a4945408 so we cannot revert.
Please correct comment instead of removing the logic.

Thanks.

> 
> Signed-off-by: zhongjiang 
> ---
>  mm/vmpressure.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 6063581..149fdf6 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -112,16 +112,9 @@ static enum vmpressure_levels 
> vmpressure_calc_level(unsigned long scanned,
>   unsigned long reclaimed)
>  {
>   unsigned long scale = scanned + reclaimed;
> - unsigned long pressure = 0;
> + unsigned long pressure;
>  
>   /*
> -  * reclaimed can be greater than scanned in cases
> -  * like THP, where the scanned is 1 and reclaimed
> -  * could be 512
> -  */
> - if (reclaimed >= scanned)
> - goto out;
> - /*
>* We calculate the ratio (in percents) of how many pages were
>* scanned vs. reclaimed in a given time frame (window). Note that
>* time is in VM reclaimer's "ticks", i.e. number of pages
> @@ -131,7 +124,6 @@ static enum vmpressure_levels 
> vmpressure_calc_level(unsigned long scanned,
>   pressure = scale - (reclaimed * scale / scanned);
>   pressure = pressure * 100 / scale;
>  
> -out:
>   pr_debug("%s: %3lu  (s: %lu  r: %lu)\n", __func__, pressure,
>scanned, reclaimed);
>  
> -- 
> 1.7.12.4
>

Re: [PATCH 3/4] watchdog: Split up config options

2017-06-06 Thread Nicholas Piggin

On Tue, 6 Jun 2017 12:49:58 -0400
Don Zickus  wrote:

> On Sat, Jun 03, 2017 at 04:10:05PM +1000, Nicholas Piggin wrote:

> > > >  config HARDLOCKUP_DETECTOR
> > > > -   def_bool y
> > > > -   depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > > > -   depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > > > +   bool "Detect Hard Lockups"
> > > > +   depends on LOCKUP_DETECTOR
> > > > +   depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && 
> > > > HAVE_PERF_EVENTS_NMI)
> > > > +   select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> > > 
> > > Here is my confusion with HAVE_NMI_WATCHDOG
> > > 
> > > It seems like you can only select HARDLOCKUP_DETECTOR if 
> > > !HAVE_NMI_DETECTOR
> > > which would break sparc, I think.
> > > 
> > > And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> > > dependency on !HAVE_NMI_WATCHDOG???  
> > 
> > I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.  
> 
> Well yes, but your patch changes the definition of HARDLOCKUP_DETECTOR to
> HARDLOCKUP_DETECTOR_PERF and then recreates HARDLOCKUP_DETECTOR.
> 
> For example look at kernel/watchdog.c::watchdog_enabled (line 38)
> 
> sparc has HAVE_NMI_WATCHDOG set, but that will disable HARDLOCKUP_DETECTOR
> which I believes means sparc's nmi_watchdog is disabled on boot and has to
> be manually enabled?

Ahh okay because sparc is using watchdog_nmi_enable/disable from the
softlockup watchdog, which checks watchdog_enabled.

> 
> 
> I _think_ having
> 
> depends on LOCKUP_DETECTOR
> depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> 
> will work because your new definition of HARDLOCKUP_DETECTOR is a
> combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
> 
> Did I get that right?

Well in some ways, except that most of the NMI watchdogs do not seem to
heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.

NMI_WATCHDOG by itself was supposed to be for an arch that wnats to do
completely it own thing.

sparc is somewhere in the middle. It uses some of the HLD stuff, but
not all. That makes it a bit tricky.

> I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> HARDLOCKUP_DETECTOR.  Would that make the config options slightly less
> confusing?

This would probably be the right direction to go in, but it will take
slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
meaning that an arch has its own watchdog and does not want any HLD
stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.

While transitioning, we could add a new option instead,

HAVE_ARCH_HARDLOCKUP_DETECTOR

I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
HLD. Possibly you could just change the name to be a bit more regular,
HAVE_PERF_NMI_HARDLOCKUP_DETECTOR

Thanks,
Nick

Re: [PATCH 3/4] watchdog: Split up config options

2017-06-06 Thread Nicholas Piggin

On Tue, 6 Jun 2017 12:49:58 -0400
Don Zickus  wrote:

> On Sat, Jun 03, 2017 at 04:10:05PM +1000, Nicholas Piggin wrote:

> > > >  config HARDLOCKUP_DETECTOR
> > > > -   def_bool y
> > > > -   depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > > > -   depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > > > +   bool "Detect Hard Lockups"
> > > > +   depends on LOCKUP_DETECTOR
> > > > +   depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && 
> > > > HAVE_PERF_EVENTS_NMI)
> > > > +   select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> > > 
> > > Here is my confusion with HAVE_NMI_WATCHDOG
> > > 
> > > It seems like you can only select HARDLOCKUP_DETECTOR if 
> > > !HAVE_NMI_DETECTOR
> > > which would break sparc, I think.
> > > 
> > > And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> > > dependency on !HAVE_NMI_WATCHDOG???  
> > 
> > I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.  
> 
> Well yes, but your patch changes the definition of HARDLOCKUP_DETECTOR to
> HARDLOCKUP_DETECTOR_PERF and then recreates HARDLOCKUP_DETECTOR.
> 
> For example look at kernel/watchdog.c::watchdog_enabled (line 38)
> 
> sparc has HAVE_NMI_WATCHDOG set, but that will disable HARDLOCKUP_DETECTOR
> which I believes means sparc's nmi_watchdog is disabled on boot and has to
> be manually enabled?

Ahh okay because sparc is using watchdog_nmi_enable/disable from the
softlockup watchdog, which checks watchdog_enabled.

> 
> 
> I _think_ having
> 
> depends on LOCKUP_DETECTOR
> depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> 
> will work because your new definition of HARDLOCKUP_DETECTOR is a
> combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
> 
> Did I get that right?

Well in some ways, except that most of the NMI watchdogs do not seem to
heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.

NMI_WATCHDOG by itself was supposed to be for an arch that wnats to do
completely it own thing.

sparc is somewhere in the middle. It uses some of the HLD stuff, but
not all. That makes it a bit tricky.

> I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> HARDLOCKUP_DETECTOR.  Would that make the config options slightly less
> confusing?

This would probably be the right direction to go in, but it will take
slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
meaning that an arch has its own watchdog and does not want any HLD
stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.

While transitioning, we could add a new option instead,

HAVE_ARCH_HARDLOCKUP_DETECTOR

I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
HLD. Possibly you could just change the name to be a bit more regular,
HAVE_PERF_NMI_HARDLOCKUP_DETECTOR

Thanks,
Nick

Re: [PATCH] modpost: abort if a module name is too long

2017-06-06 Thread Jessica Yu


+++ Wanlong Gao [06/06/17 09:07 +0800]:



On 2017/6/5 10:09, Jessica Yu wrote:

+++ Wanlong Gao [02/06/17 11:04 +0800]:



On 2017/6/2 7:23, Jessica Yu wrote:

+++ Wanlong Gao [31/05/17 11:48 +0800]:



On 2017/5/31 11:30, Jessica Yu wrote:

+++ Wanlong Gao [31/05/17 10:23 +0800]:

Hi Jessica,

On 2017/5/29 17:10, Jessica Yu wrote:

+++ Xie XiuQi [20/05/17 15:46 +0800]:

From: Wanlong Gao 

Module name has a limited length, but currently the build system
allows the build finishing even if the module name is too long.

 CC  
/root/kprobe_example/abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz.mod.o
/root/kprobe_example/abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz.mod.c:9:2:
warning: initializer-string for array of chars is too long [enabled by default]
 .name = KBUILD_MODNAME,
 ^

but it's merely a warning.

This patch adds the check of the module name length in modpost and stops
the build properly.

Signed-off-by: Wanlong Gao 
Signed-off-by: Xie XiuQi 
---
scripts/mod/modpost.c | 11 +++
1 file changed, 11 insertions(+)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 30d752a..db11c57 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -2166,6 +2166,17 @@ static int add_versions(struct buffer *b, struct module 
*mod)
{
struct symbol *s, *exp;
int err = 0;
+const char *mod_name;
+
+mod_name = strrchr(mod->name, '/');
+if (mod_name == NULL)
+mod_name = mod->name;
+else
+mod_name++;
+if (strlen(mod_name) >= MODULE_NAME_LEN) {
+merror("module name is too long [%s.ko]\n", mod->name);
+return 1;
+}


Hi Xie,

This check shouldn't be in add_versions() (which does something else entirely),
it should probably be put in a separate helper function called from main. But
I'm not a big fan of the extra string manipulation to do something this simple.

I think this check can be vastly simplified, how about something like the
following?


This looks better, would you apply your following patch?

Reviewed-by: Wanlong Gao 
Tested-by: Wanlong Gao 


Sure, thanks for testing. I'll go ahead and format this into a proper
patch and resend.


Please wait, I just found that this patch makes the built module can't
be inserted by the following error:

# insmod abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc.ko
insmod: ERROR: could not insert module 
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc.ko: Invalid parameters

# dmesg
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc: Unknown symbol 
__fentry__ (err -22)


Hm, I am unable to reproduce this. It looks like __fentry__ is missing
from your kernel, you may have a mismatch between the kernel config
that you're running and the config you are using to build the module.
In other words, it seems like you might have built the module with
CONFIG_FTRACE but built the kernel without.

Few questions -

What is the output of running `grep __fentry__ /proc/kallsyms`?



Sure it has.


Does your module correspond to the running kernel version?


Sure.



Do you have CONFIG_FTRACE/FUNCTION_TRACER enabled in your running
kernel?



Sure.



Is that the full dmesg output (are there any other error messages)?


Even when I compiled the kernel with your patch, the kernel module load
failed at the boot time with the following error:

[1.656708] libcrc32c: no symbol version for __fentry__
[1.656709] libcrc32c: Unknown symbol __fentry__ (err -22)

But my above patch in add_versions() doesn't have such problem, I've no
idea why. Maybe your patch breaks some sections?


Hm, I am still unable to reproduce this on my system with modversions
enabled and the -rc2 kernel. But judging by the errno (-22) it looks
like this is failing in check_version()/resolve_symbol() for you,
which leads me to think that this is somehow messing with the
__versions table generated by modpost (not sure why).

Does the  versions[] array in the generated *.mod.c file for your
test module look different with and without the patch? Also: what
version of gcc and binutils are you using, and what kernel version are
you testing on?


The *.mod.c file are same except the added __modname_test section, the gcc
,binutils and kernel are all from centos 7.2 (3.10.0-327).


Thanks for the additional info. Just FYI, I'm going to be out this
week and part of next week due to travelling, but I'll be able to take
another look at this next Thurs/Fri. If we can't resolve the issue, we
can just work on your original patch.

Thanks!

Jessica

Re: [PATCH] modpost: abort if a module name is too long

2017-06-06 Thread Jessica Yu


+++ Wanlong Gao [06/06/17 09:07 +0800]:



On 2017/6/5 10:09, Jessica Yu wrote:

+++ Wanlong Gao [02/06/17 11:04 +0800]:



On 2017/6/2 7:23, Jessica Yu wrote:

+++ Wanlong Gao [31/05/17 11:48 +0800]:



On 2017/5/31 11:30, Jessica Yu wrote:

+++ Wanlong Gao [31/05/17 10:23 +0800]:

Hi Jessica,

On 2017/5/29 17:10, Jessica Yu wrote:

+++ Xie XiuQi [20/05/17 15:46 +0800]:

From: Wanlong Gao 

Module name has a limited length, but currently the build system
allows the build finishing even if the module name is too long.

 CC  
/root/kprobe_example/abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz.mod.o
/root/kprobe_example/abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz.mod.c:9:2:
warning: initializer-string for array of chars is too long [enabled by default]
 .name = KBUILD_MODNAME,
 ^

but it's merely a warning.

This patch adds the check of the module name length in modpost and stops
the build properly.

Signed-off-by: Wanlong Gao 
Signed-off-by: Xie XiuQi 
---
scripts/mod/modpost.c | 11 +++
1 file changed, 11 insertions(+)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 30d752a..db11c57 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -2166,6 +2166,17 @@ static int add_versions(struct buffer *b, struct module 
*mod)
{
struct symbol *s, *exp;
int err = 0;
+const char *mod_name;
+
+mod_name = strrchr(mod->name, '/');
+if (mod_name == NULL)
+mod_name = mod->name;
+else
+mod_name++;
+if (strlen(mod_name) >= MODULE_NAME_LEN) {
+merror("module name is too long [%s.ko]\n", mod->name);
+return 1;
+}


Hi Xie,

This check shouldn't be in add_versions() (which does something else entirely),
it should probably be put in a separate helper function called from main. But
I'm not a big fan of the extra string manipulation to do something this simple.

I think this check can be vastly simplified, how about something like the
following?


This looks better, would you apply your following patch?

Reviewed-by: Wanlong Gao 
Tested-by: Wanlong Gao 


Sure, thanks for testing. I'll go ahead and format this into a proper
patch and resend.


Please wait, I just found that this patch makes the built module can't
be inserted by the following error:

# insmod abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc.ko
insmod: ERROR: could not insert module 
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc.ko: Invalid parameters

# dmesg
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc: Unknown symbol 
__fentry__ (err -22)


Hm, I am unable to reproduce this. It looks like __fentry__ is missing
from your kernel, you may have a mismatch between the kernel config
that you're running and the config you are using to build the module.
In other words, it seems like you might have built the module with
CONFIG_FTRACE but built the kernel without.

Few questions -

What is the output of running `grep __fentry__ /proc/kallsyms`?



Sure it has.


Does your module correspond to the running kernel version?


Sure.



Do you have CONFIG_FTRACE/FUNCTION_TRACER enabled in your running
kernel?



Sure.



Is that the full dmesg output (are there any other error messages)?


Even when I compiled the kernel with your patch, the kernel module load
failed at the boot time with the following error:

[1.656708] libcrc32c: no symbol version for __fentry__
[1.656709] libcrc32c: Unknown symbol __fentry__ (err -22)

But my above patch in add_versions() doesn't have such problem, I've no
idea why. Maybe your patch breaks some sections?


Hm, I am still unable to reproduce this on my system with modversions
enabled and the -rc2 kernel. But judging by the errno (-22) it looks
like this is failing in check_version()/resolve_symbol() for you,
which leads me to think that this is somehow messing with the
__versions table generated by modpost (not sure why).

Does the  versions[] array in the generated *.mod.c file for your
test module look different with and without the patch? Also: what
version of gcc and binutils are you using, and what kernel version are
you testing on?


The *.mod.c file are same except the added __modname_test section, the gcc
,binutils and kernel are all from centos 7.2 (3.10.0-327).


Thanks for the additional info. Just FYI, I'm going to be out this
week and part of next week due to travelling, but I'll be able to take
another look at this next Thurs/Fri. If we can't resolve the issue, we
can just work on your original patch.

Thanks!

Jessica

Re: [PATCH v2 0/2] ARM: imx6ull: Make suspend/resume work like on 6ul

2017-06-06 Thread Shawn Guo

On Tue, Jun 06, 2017 at 08:50:41PM +0300, Leonard Crestez wrote:
> Leonard Crestez (2):
>   ARM: imx: Add MXC_CPU_IMX6ULL and cpu_is_imx6ull
>   ARM: imx6ull: Make suspend/resume work like on 6ul

Applied both, thanks.

Re: [PATCH v2 0/2] ARM: imx6ull: Make suspend/resume work like on 6ul

2017-06-06 Thread Shawn Guo

On Tue, Jun 06, 2017 at 08:50:41PM +0300, Leonard Crestez wrote:
> Leonard Crestez (2):
>   ARM: imx: Add MXC_CPU_IMX6ULL and cpu_is_imx6ull
>   ARM: imx6ull: Make suspend/resume work like on 6ul

Applied both, thanks.

Re: [PATCH 2/2] acpi, gicv3-its, numa: Adding numa node mapping for gic-its units

2017-06-06 Thread Ganapatrao Kulkarni

Adding  Lorenzo and  Hanzun.

On Tue, Jun 6, 2017 at 6:26 PM, Ganapatrao Kulkarni
 wrote:
> Add code to parse SRAT ITS Affinity sub table as defined in ACPI 6.2
> Later in driver probe, its devices are mapped to numa node
> using its id to proximity domain mapping.
>
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  arch/arm64/include/asm/acpi.h|  2 ++
>  arch/arm64/kernel/acpi_numa.c| 59 
> 
>  drivers/acpi/numa.c  | 32 +-
>  drivers/irqchip/irq-gic-v3-its.c |  3 +-
>  include/linux/acpi.h |  3 ++
>  5 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index 0e99978..60ea7d1 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -143,9 +143,11 @@ static inline void arch_apei_flush_tlb_one(unsigned long 
> addr)
>  #ifdef CONFIG_ACPI_NUMA
>  int arm64_acpi_numa_init(void);
>  int acpi_numa_get_nid(unsigned int cpu, u64 hwid);
> +int acpi_numa_get_its_nid(u32 its_id);
>  #else
>  static inline int arm64_acpi_numa_init(void) { return -ENOSYS; }
>  static inline int acpi_numa_get_nid(unsigned int cpu, u64 hwid) { return 
> NUMA_NO_NODE; }
> +static inline int acpi_numa_get_its_nid(u32 its_id) { return NUMA_NO_NODE; }
>  #endif /* CONFIG_ACPI_NUMA */
>
>  #define ACPI_TABLE_UPGRADE_MAX_PHYS MEMBLOCK_ALLOC_ACCESSIBLE
> diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
> index f01fab6..f7b2ea6 100644
> --- a/arch/arm64/kernel/acpi_numa.c
> +++ b/arch/arm64/kernel/acpi_numa.c
> @@ -29,15 +29,24 @@
>  #include 
>
>  static int cpus_in_srat;
> +static int its_in_srat;
>
>  struct __node_cpu_hwid {
> u32 node_id;/* logical node containing this CPU */
> u64 cpu_hwid;   /* MPIDR for this CPU */
>  };
>
> +struct __node_its_id {
> +   u32 node_id;/* numa node id */
> +   u32 its_id ;   /* GIC ITS ID */
> +};
> +
>  static struct __node_cpu_hwid early_node_cpu_hwid[NR_CPUS] = {
>  [0 ... NR_CPUS - 1] = {NUMA_NO_NODE, PHYS_CPUID_INVALID} };
>
> +static struct __node_its_id early_node_its_id[MAX_NUMNODES] = {
> +[0 ... MAX_NUMNODES - 1] = {NUMA_NO_NODE, UINT_MAX} };
> +
>  int acpi_numa_get_nid(unsigned int cpu, u64 hwid)
>  {
> int i;
> @@ -50,6 +59,18 @@ int acpi_numa_get_nid(unsigned int cpu, u64 hwid)
> return NUMA_NO_NODE;
>  }
>
> +int acpi_numa_get_its_nid(u32 its_id)
> +{
> +   int i;
> +
> +   for (i = 0; i < its_in_srat; i++) {
> +   if (its_id == early_node_its_id[i].its_id)
> +   return early_node_its_id[i].node_id;
> +   }
> +
> +   return NUMA_NO_NODE;
> +}
> +
>  /* Callback for Proximity Domain -> ACPI processor UID mapping */
>  void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
>  {
> @@ -100,6 +121,44 @@ void __init acpi_numa_gicc_affinity_init(struct 
> acpi_srat_gicc_affinity *pa)
> pxm, mpidr, node);
>  }
>
> +/* Callback for ITS ACPI Proximity Domain mapping */
> +void __init acpi_numa_its_affinity_init(struct acpi_srat_its_affinity *pa)
> +{
> +   int pxm, node;
> +
> +   if (srat_disabled())
> +   return;
> +
> +   if (pa->header.length < sizeof(struct acpi_srat_its_affinity)) {
> +   pr_err("SRAT: Invalid SRAT header length: %d\n",
> +   pa->header.length);
> +   bad_srat();
> +   return;
> +   }
> +
> +   if (its_in_srat >= MAX_NUMNODES) {
> +   pr_warn_once("SRAT: its count exceeding max count[%d]\n",
> +MAX_NUMNODES);
> +   return;
> +   }
> +
> +   pxm = pa->proximity_domain;
> +   node = acpi_map_pxm_to_node(pxm);
> +
> +   if (node == NUMA_NO_NODE || node >= MAX_NUMNODES) {
> +   pr_err("SRAT: Too many proximity domains %d\n", pxm);
> +   bad_srat();
> +   return;
> +   }
> +
> +   early_node_its_id[its_in_srat].node_id = node;
> +   early_node_its_id[its_in_srat].its_id =  its_in_srat;
> +   node_set(node, numa_nodes_parsed);
> +   pr_info("SRAT: PXM %d -> ITS %d -> Node %d\n",
> +   pxm, its_in_srat, node);
> +   its_in_srat++;
> +}
> +
>  int __init arm64_acpi_numa_init(void)
>  {
> int ret;
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index edb0c79..9c04dba 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -182,6 +182,16 @@ int acpi_map_pxm_to_online_node(int pxm)
> }
> break;
>
> +   case ACPI_SRAT_TYPE_GIC_ITS_AFFINITY:
> +   {
> +   struct acpi_srat_its_affinity *p =
> +   (struct acpi_srat_its_affinity *)header;
> +   pr_debug("SRAT ITS [0x%04x]) in proximity domain 
> %d\n",
> +

Re: [PATCH 2/2] acpi, gicv3-its, numa: Adding numa node mapping for gic-its units

2017-06-06 Thread Ganapatrao Kulkarni

Adding  Lorenzo and  Hanzun.

On Tue, Jun 6, 2017 at 6:26 PM, Ganapatrao Kulkarni
 wrote:
> Add code to parse SRAT ITS Affinity sub table as defined in ACPI 6.2
> Later in driver probe, its devices are mapped to numa node
> using its id to proximity domain mapping.
>
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  arch/arm64/include/asm/acpi.h|  2 ++
>  arch/arm64/kernel/acpi_numa.c| 59 
> 
>  drivers/acpi/numa.c  | 32 +-
>  drivers/irqchip/irq-gic-v3-its.c |  3 +-
>  include/linux/acpi.h |  3 ++
>  5 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index 0e99978..60ea7d1 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -143,9 +143,11 @@ static inline void arch_apei_flush_tlb_one(unsigned long 
> addr)
>  #ifdef CONFIG_ACPI_NUMA
>  int arm64_acpi_numa_init(void);
>  int acpi_numa_get_nid(unsigned int cpu, u64 hwid);
> +int acpi_numa_get_its_nid(u32 its_id);
>  #else
>  static inline int arm64_acpi_numa_init(void) { return -ENOSYS; }
>  static inline int acpi_numa_get_nid(unsigned int cpu, u64 hwid) { return 
> NUMA_NO_NODE; }
> +static inline int acpi_numa_get_its_nid(u32 its_id) { return NUMA_NO_NODE; }
>  #endif /* CONFIG_ACPI_NUMA */
>
>  #define ACPI_TABLE_UPGRADE_MAX_PHYS MEMBLOCK_ALLOC_ACCESSIBLE
> diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
> index f01fab6..f7b2ea6 100644
> --- a/arch/arm64/kernel/acpi_numa.c
> +++ b/arch/arm64/kernel/acpi_numa.c
> @@ -29,15 +29,24 @@
>  #include 
>
>  static int cpus_in_srat;
> +static int its_in_srat;
>
>  struct __node_cpu_hwid {
> u32 node_id;/* logical node containing this CPU */
> u64 cpu_hwid;   /* MPIDR for this CPU */
>  };
>
> +struct __node_its_id {
> +   u32 node_id;/* numa node id */
> +   u32 its_id ;   /* GIC ITS ID */
> +};
> +
>  static struct __node_cpu_hwid early_node_cpu_hwid[NR_CPUS] = {
>  [0 ... NR_CPUS - 1] = {NUMA_NO_NODE, PHYS_CPUID_INVALID} };
>
> +static struct __node_its_id early_node_its_id[MAX_NUMNODES] = {
> +[0 ... MAX_NUMNODES - 1] = {NUMA_NO_NODE, UINT_MAX} };
> +
>  int acpi_numa_get_nid(unsigned int cpu, u64 hwid)
>  {
> int i;
> @@ -50,6 +59,18 @@ int acpi_numa_get_nid(unsigned int cpu, u64 hwid)
> return NUMA_NO_NODE;
>  }
>
> +int acpi_numa_get_its_nid(u32 its_id)
> +{
> +   int i;
> +
> +   for (i = 0; i < its_in_srat; i++) {
> +   if (its_id == early_node_its_id[i].its_id)
> +   return early_node_its_id[i].node_id;
> +   }
> +
> +   return NUMA_NO_NODE;
> +}
> +
>  /* Callback for Proximity Domain -> ACPI processor UID mapping */
>  void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
>  {
> @@ -100,6 +121,44 @@ void __init acpi_numa_gicc_affinity_init(struct 
> acpi_srat_gicc_affinity *pa)
> pxm, mpidr, node);
>  }
>
> +/* Callback for ITS ACPI Proximity Domain mapping */
> +void __init acpi_numa_its_affinity_init(struct acpi_srat_its_affinity *pa)
> +{
> +   int pxm, node;
> +
> +   if (srat_disabled())
> +   return;
> +
> +   if (pa->header.length < sizeof(struct acpi_srat_its_affinity)) {
> +   pr_err("SRAT: Invalid SRAT header length: %d\n",
> +   pa->header.length);
> +   bad_srat();
> +   return;
> +   }
> +
> +   if (its_in_srat >= MAX_NUMNODES) {
> +   pr_warn_once("SRAT: its count exceeding max count[%d]\n",
> +MAX_NUMNODES);
> +   return;
> +   }
> +
> +   pxm = pa->proximity_domain;
> +   node = acpi_map_pxm_to_node(pxm);
> +
> +   if (node == NUMA_NO_NODE || node >= MAX_NUMNODES) {
> +   pr_err("SRAT: Too many proximity domains %d\n", pxm);
> +   bad_srat();
> +   return;
> +   }
> +
> +   early_node_its_id[its_in_srat].node_id = node;
> +   early_node_its_id[its_in_srat].its_id =  its_in_srat;
> +   node_set(node, numa_nodes_parsed);
> +   pr_info("SRAT: PXM %d -> ITS %d -> Node %d\n",
> +   pxm, its_in_srat, node);
> +   its_in_srat++;
> +}
> +
>  int __init arm64_acpi_numa_init(void)
>  {
> int ret;
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index edb0c79..9c04dba 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -182,6 +182,16 @@ int acpi_map_pxm_to_online_node(int pxm)
> }
> break;
>
> +   case ACPI_SRAT_TYPE_GIC_ITS_AFFINITY:
> +   {
> +   struct acpi_srat_its_affinity *p =
> +   (struct acpi_srat_its_affinity *)header;
> +   pr_debug("SRAT ITS [0x%04x]) in proximity domain 
> %d\n",
> +p->its_id,
> +

RE: [PATCH] ARM: imx6ull: Make suspend/resume work like on 6ul

2017-06-06 Thread Anson Huang



Best Regards!
Anson Huang



> -Original Message-
> From: Shawn Guo [mailto:shawn...@kernel.org]
> Sent: 2017-06-07 11:21 AM
> To: Anson Huang 
> Cc: Leonard Crestez ; Peter Chen
> ; linux-kernel@vger.kernel.org; Fabio Estevam
> ; linux-arm-ker...@lists.infradead.org; Lucas Stach
> 
> Subject: Re: [PATCH] ARM: imx6ull: Make suspend/resume work like on 6ul
> 
> On Tue, Jun 06, 2017 at 01:51:53PM +0300, Leonard Crestez wrote:
> > On Mon, 2017-06-05 at 13:37 +0800, Shawn Guo wrote:
> > > On Tue, May 30, 2017 at 07:11:19PM +0300, Leonard Crestez wrote:
> > > >
> > > > Suspend and resume on imx6ull is currenty not working because of
> > > > some missed checks where behavior should match imx6ul.
> > > >
> > > > Signed-off-by: Leonard Crestez 
> > > > ---
> > > >  arch/arm/mach-imx/mxc.h | 6 ++
> > > >  arch/arm/mach-imx/pm-imx6.c | 6 --
> > > >  2 files changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/mach-imx/mxc.h b/arch/arm/mach-imx/mxc.h
> > > > index 34f2ff6..e00d626 100644
> > > > --- a/arch/arm/mach-imx/mxc.h
> > > > +++ b/arch/arm/mach-imx/mxc.h
> > > > @@ -39,6 +39,7 @@
> > > >  #define MXC_CPU_IMX6SX 0x62
> > > >  #define MXC_CPU_IMX6Q  0x63
> > > >  #define MXC_CPU_IMX6UL 0x64
> > > > +#define MXC_CPU_IMX6ULL0x65
> > > Since you are adding a new CPU type, you should probably patch
> > > imx_soc_device_init() for it as well.
> >
> > Ok, I will resend as a 2-patch series.
> >
> > BTW, it actually seems to me that setting
> BM_CLPCR_BYP_MMDC_CH0_LPM_HS
> > on imx6sl/sx/ul/ull is not actually needed. That bit (19) is
> > documented as "reserved" in the Reference Manual and likely ignored by
> hardware.
> >
> > As far as I understand the MMDC on imx6qdl has two channels and unless
> > 2-channel mode is enabled (not currently supported) the handshake with
> > CH1 needs to be disabled. Other reduced chips only have one MMDC
> > channel and that is CH1 (CH0 was removed) and nothing needs to be done
> > from them. The only important thing is to avoid setting
> > BM_CLPCR_BYP_MMDC_CH1_LPM_HS.
> >
> > However perhaps what I am saying is wrong for some early chip versions?
> > Because this behavior of setting BM_CLPCR_BYP_MMDC_CH0_LPM_HS has
> been
> > in the kernel for a long time.

As far as I know, since i.MX6SX, we only use MMDC_CH1, but MMDC_CH0 is still
there and we need to bypass its handshake when entering low power mode.

The bit 19 in DOC is incorrect, I remembered I ever tried it and discuss with 
design
time, they request DOC team to update DOC,
but I think doc team forgot to do it. You can try removing this bit 19 setting
and see if standby/mem suspend can still work? And try to modify this bit 19 
value
to see if it can be modified.

Anson.


> 
> @Anson, you might be the right person to comment here?
> 
> Shawn

RE: [PATCH] ARM: imx6ull: Make suspend/resume work like on 6ul

2017-06-06 Thread Anson Huang



Best Regards!
Anson Huang



> -Original Message-
> From: Shawn Guo [mailto:shawn...@kernel.org]
> Sent: 2017-06-07 11:21 AM
> To: Anson Huang 
> Cc: Leonard Crestez ; Peter Chen
> ; linux-kernel@vger.kernel.org; Fabio Estevam
> ; linux-arm-ker...@lists.infradead.org; Lucas Stach
> 
> Subject: Re: [PATCH] ARM: imx6ull: Make suspend/resume work like on 6ul
> 
> On Tue, Jun 06, 2017 at 01:51:53PM +0300, Leonard Crestez wrote:
> > On Mon, 2017-06-05 at 13:37 +0800, Shawn Guo wrote:
> > > On Tue, May 30, 2017 at 07:11:19PM +0300, Leonard Crestez wrote:
> > > >
> > > > Suspend and resume on imx6ull is currenty not working because of
> > > > some missed checks where behavior should match imx6ul.
> > > >
> > > > Signed-off-by: Leonard Crestez 
> > > > ---
> > > >  arch/arm/mach-imx/mxc.h | 6 ++
> > > >  arch/arm/mach-imx/pm-imx6.c | 6 --
> > > >  2 files changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/mach-imx/mxc.h b/arch/arm/mach-imx/mxc.h
> > > > index 34f2ff6..e00d626 100644
> > > > --- a/arch/arm/mach-imx/mxc.h
> > > > +++ b/arch/arm/mach-imx/mxc.h
> > > > @@ -39,6 +39,7 @@
> > > >  #define MXC_CPU_IMX6SX 0x62
> > > >  #define MXC_CPU_IMX6Q  0x63
> > > >  #define MXC_CPU_IMX6UL 0x64
> > > > +#define MXC_CPU_IMX6ULL0x65
> > > Since you are adding a new CPU type, you should probably patch
> > > imx_soc_device_init() for it as well.
> >
> > Ok, I will resend as a 2-patch series.
> >
> > BTW, it actually seems to me that setting
> BM_CLPCR_BYP_MMDC_CH0_LPM_HS
> > on imx6sl/sx/ul/ull is not actually needed. That bit (19) is
> > documented as "reserved" in the Reference Manual and likely ignored by
> hardware.
> >
> > As far as I understand the MMDC on imx6qdl has two channels and unless
> > 2-channel mode is enabled (not currently supported) the handshake with
> > CH1 needs to be disabled. Other reduced chips only have one MMDC
> > channel and that is CH1 (CH0 was removed) and nothing needs to be done
> > from them. The only important thing is to avoid setting
> > BM_CLPCR_BYP_MMDC_CH1_LPM_HS.
> >
> > However perhaps what I am saying is wrong for some early chip versions?
> > Because this behavior of setting BM_CLPCR_BYP_MMDC_CH0_LPM_HS has
> been
> > in the kernel for a long time.

As far as I know, since i.MX6SX, we only use MMDC_CH1, but MMDC_CH0 is still
there and we need to bypass its handshake when entering low power mode.

The bit 19 in DOC is incorrect, I remembered I ever tried it and discuss with 
design
time, they request DOC team to update DOC,
but I think doc team forgot to do it. You can try removing this bit 19 setting
and see if standby/mem suspend can still work? And try to modify this bit 19 
value
to see if it can be modified.

Anson.


> 
> @Anson, you might be the right person to comment here?
> 
> Shawn

Re: [LTP] [BUG] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c53df

2017-06-06 Thread Eryu Guan

On Tue, Jun 06, 2017 at 06:00:34PM +0800, Li Wang wrote:
> Hi,
> 
> ltp/access04 always panic the latest mainstream kernel-4.12-rc4 on
> ppc64le. From the calltrace
> I guess the reason is probably that the tests mount ext2 file system
> using ext4 driver.
> 
> A simple way to reproduce:
> 
> # dd of=wangli if=/dev/zero count=1024 bs=1024
> # mkfs -t ext2 wangli
> # mount -t ext4 wangli /mnt/

I can't reproduce this crash either by your reproducer nor by ltp
access04 test on ppc64le host.

> 
> 
> Are there any new changes in ext4 (on kernel-4.12-rc4) recently?

I don't think it's an ext4 bug, I've seen similar crashes twice in
4.12-rc4 kernel testings, once testing XFS running fstests, and once
running ltp on ext3. But it seems not related to filesystem code.

[  828.119270] run fstests generic/034 at 2017-06-06 19:16:10 
[  828.720341] XFS (sda5): Unmounting Filesystem 
[  828.814003] device-mapper: uevent: version 1.0.3 
[  828.814096] Unable to handle kernel paging request for unaligned access at 
address 0xc001c52c5e7f 
[  828.814103] Faulting instruction address: 0xc04d214c 
[  828.814109] Oops: Kernel access of bad area, sig: 7 [#1] 
[  828.814113] SMP NR_CPUS=2048  
[  828.814114] NUMA  
[  828.814117] pSeries 
[  828.814122] Modules linked in: dm_mod(+) sg pseries_rng ghash_generic 
gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables 
xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp 
[  828.814150] CPU: 10 PID: 137772 Comm: modprobe Not tainted 4.12.0-rc4 #1 
[  828.814155] task: c003fe13c800 task.stack: c0046ec68000 
[  828.814163] NIP: c04d214c LR: c011c884 CTR: c0130900 
[  828.814168] REGS: c0046ec6b3d0 TRAP: 0600   Not tainted  (4.12.0-rc4) 
[  828.814173] MSR: 80010280b033  
[  828.814184]   CR: 28228244  XER: 0005 
[  828.814191] CFAR: c011c880 DAR: c001c52c5e7f DSISR:  
SOFTE: 0  
[  828.814191] GPR00: c011c848 c0046ec6b650 c1049100 
c003f3b77020  
[  828.814191] GPR04: c003f3b77020 c001c52c5e7f  
0001  
[  828.814191] GPR08: 0008f92d89943c42 002448b7 0008 
  
[  828.814191] GPR12: c0130900 cfac6900 d7dd3908 
d7dd3908  
[  828.814191] GPR16: c0046ec6bdec c0046ec6bda0 ff20 
  
[  828.814191] GPR20: 52f8  4000 
c0cc5780  
[  828.814191] GPR24: 0001c45ffc5f  0001c45ffc5f 
c107dd00  
[  828.814191] GPR28: c003f3b77834 0004 0800 
c003f3b77000  
[  828.814257] NIP [c04d214c] llist_add_batch+0xc/0x40 
[  828.814263] LR [c011c884] try_to_wake_up+0x4a4/0x5b0 
[  828.814268] Call Trace: 
[  828.814273] [c0046ec6b650] [c011c848] try_to_wake_up+0x468/0x5b0 
(unreliable) 
[  828.814282] [c0046ec6b6d0] [c0102828] create_worker+0x148/0x250 
[  828.814290] [c0046ec6b770] [c01059dc] 
alloc_unbound_pwq+0x3bc/0x4c0 
[  828.814296] [c0046ec6b7d0] [c010601c] 
apply_wqattrs_prepare+0x2ac/0x320 
[  828.814304] [c0046ec6b840] [c01060cc] 
apply_workqueue_attrs_locked+0x3c/0xa0 
[  828.814313] [c0046ec6b870] [c010662c] 
apply_workqueue_attrs+0x4c/0x80 
[  828.814322] [c0046ec6b8b0] [c01081cc] 
__alloc_workqueue_key+0x16c/0x4e0 
[  828.814343] [c0046ec6b970] [d7e04748] local_init+0xdc/0x1a4 
[dm_mod] 
[  828.814362] [c0046ec6b9f0] [d7e04854] dm_init+0x44/0xc4 [dm_mod] 
[  828.814375] [c0046ec6ba30] [c000ccf0] do_one_initcall+0x60/0x1c0 
[  828.814390] [c0046ec6baf0] [c091e748] do_init_module+0x8c/0x244 
[  828.814405] [c0046ec6bb80] [c0197e08] load_module+0x12f8/0x1600 
[  828.814414] [c0046ec6bd30] [c0198388] 
SyS_finit_module+0xa8/0x110 
[  828.814424] [c0046ec6be30] [c000af84] system_call+0x38/0xe0 
[  828.814429] Instruction dump: 
[  828.814436] 6042 3860 4e800020 6000 6042 7c832378 4e800020 
6000  
[  828.814448] 6000 e925 f924 7c0004ac <7d4028a8> 7c2a4800 40c20010 
7c6029ad  
[  828.814466] ---[ end trace 87ec4ff1fa8e1a3d ]--- 

I suspect it's a regression introduced in 4.12-rc4 kernel, I didn't see
such crashes when testing 4.12-rc3 kernel. I'll do bisect once I worked
out a reliable reproducer (unless you can reliably reproduce it with
your reproducer :).

Thanks,
Eryu

Re: [LTP] [BUG] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c53df

2017-06-06 Thread Eryu Guan

On Tue, Jun 06, 2017 at 06:00:34PM +0800, Li Wang wrote:
> Hi,
> 
> ltp/access04 always panic the latest mainstream kernel-4.12-rc4 on
> ppc64le. From the calltrace
> I guess the reason is probably that the tests mount ext2 file system
> using ext4 driver.
> 
> A simple way to reproduce:
> 
> # dd of=wangli if=/dev/zero count=1024 bs=1024
> # mkfs -t ext2 wangli
> # mount -t ext4 wangli /mnt/

I can't reproduce this crash either by your reproducer nor by ltp
access04 test on ppc64le host.

> 
> 
> Are there any new changes in ext4 (on kernel-4.12-rc4) recently?

I don't think it's an ext4 bug, I've seen similar crashes twice in
4.12-rc4 kernel testings, once testing XFS running fstests, and once
running ltp on ext3. But it seems not related to filesystem code.

[  828.119270] run fstests generic/034 at 2017-06-06 19:16:10 
[  828.720341] XFS (sda5): Unmounting Filesystem 
[  828.814003] device-mapper: uevent: version 1.0.3 
[  828.814096] Unable to handle kernel paging request for unaligned access at 
address 0xc001c52c5e7f 
[  828.814103] Faulting instruction address: 0xc04d214c 
[  828.814109] Oops: Kernel access of bad area, sig: 7 [#1] 
[  828.814113] SMP NR_CPUS=2048  
[  828.814114] NUMA  
[  828.814117] pSeries 
[  828.814122] Modules linked in: dm_mod(+) sg pseries_rng ghash_generic 
gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables 
xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp 
[  828.814150] CPU: 10 PID: 137772 Comm: modprobe Not tainted 4.12.0-rc4 #1 
[  828.814155] task: c003fe13c800 task.stack: c0046ec68000 
[  828.814163] NIP: c04d214c LR: c011c884 CTR: c0130900 
[  828.814168] REGS: c0046ec6b3d0 TRAP: 0600   Not tainted  (4.12.0-rc4) 
[  828.814173] MSR: 80010280b033  
[  828.814184]   CR: 28228244  XER: 0005 
[  828.814191] CFAR: c011c880 DAR: c001c52c5e7f DSISR:  
SOFTE: 0  
[  828.814191] GPR00: c011c848 c0046ec6b650 c1049100 
c003f3b77020  
[  828.814191] GPR04: c003f3b77020 c001c52c5e7f  
0001  
[  828.814191] GPR08: 0008f92d89943c42 002448b7 0008 
  
[  828.814191] GPR12: c0130900 cfac6900 d7dd3908 
d7dd3908  
[  828.814191] GPR16: c0046ec6bdec c0046ec6bda0 ff20 
  
[  828.814191] GPR20: 52f8  4000 
c0cc5780  
[  828.814191] GPR24: 0001c45ffc5f  0001c45ffc5f 
c107dd00  
[  828.814191] GPR28: c003f3b77834 0004 0800 
c003f3b77000  
[  828.814257] NIP [c04d214c] llist_add_batch+0xc/0x40 
[  828.814263] LR [c011c884] try_to_wake_up+0x4a4/0x5b0 
[  828.814268] Call Trace: 
[  828.814273] [c0046ec6b650] [c011c848] try_to_wake_up+0x468/0x5b0 
(unreliable) 
[  828.814282] [c0046ec6b6d0] [c0102828] create_worker+0x148/0x250 
[  828.814290] [c0046ec6b770] [c01059dc] 
alloc_unbound_pwq+0x3bc/0x4c0 
[  828.814296] [c0046ec6b7d0] [c010601c] 
apply_wqattrs_prepare+0x2ac/0x320 
[  828.814304] [c0046ec6b840] [c01060cc] 
apply_workqueue_attrs_locked+0x3c/0xa0 
[  828.814313] [c0046ec6b870] [c010662c] 
apply_workqueue_attrs+0x4c/0x80 
[  828.814322] [c0046ec6b8b0] [c01081cc] 
__alloc_workqueue_key+0x16c/0x4e0 
[  828.814343] [c0046ec6b970] [d7e04748] local_init+0xdc/0x1a4 
[dm_mod] 
[  828.814362] [c0046ec6b9f0] [d7e04854] dm_init+0x44/0xc4 [dm_mod] 
[  828.814375] [c0046ec6ba30] [c000ccf0] do_one_initcall+0x60/0x1c0 
[  828.814390] [c0046ec6baf0] [c091e748] do_init_module+0x8c/0x244 
[  828.814405] [c0046ec6bb80] [c0197e08] load_module+0x12f8/0x1600 
[  828.814414] [c0046ec6bd30] [c0198388] 
SyS_finit_module+0xa8/0x110 
[  828.814424] [c0046ec6be30] [c000af84] system_call+0x38/0xe0 
[  828.814429] Instruction dump: 
[  828.814436] 6042 3860 4e800020 6000 6042 7c832378 4e800020 
6000  
[  828.814448] 6000 e925 f924 7c0004ac <7d4028a8> 7c2a4800 40c20010 
7c6029ad  
[  828.814466] ---[ end trace 87ec4ff1fa8e1a3d ]--- 

I suspect it's a regression introduced in 4.12-rc4 kernel, I didn't see
such crashes when testing 4.12-rc3 kernel. I'll do bisect once I worked
out a reliable reproducer (unless you can reliably reproduce it with
your reproducer :).

Thanks,
Eryu

Re: [PATCH] ARM: imx6ull: Make suspend/resume work like on 6ul

2017-06-06 Thread Shawn Guo

On Tue, Jun 06, 2017 at 01:51:53PM +0300, Leonard Crestez wrote:
> On Mon, 2017-06-05 at 13:37 +0800, Shawn Guo wrote:
> > On Tue, May 30, 2017 at 07:11:19PM +0300, Leonard Crestez wrote:
> > > 
> > > Suspend and resume on imx6ull is currenty not working because of some
> > > missed checks where behavior should match imx6ul.
> > > 
> > > Signed-off-by: Leonard Crestez 
> > > ---
> > >  arch/arm/mach-imx/mxc.h | 6 ++
> > >  arch/arm/mach-imx/pm-imx6.c | 6 --
> > >  2 files changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm/mach-imx/mxc.h b/arch/arm/mach-imx/mxc.h
> > > index 34f2ff6..e00d626 100644
> > > --- a/arch/arm/mach-imx/mxc.h
> > > +++ b/arch/arm/mach-imx/mxc.h
> > > @@ -39,6 +39,7 @@
> > >  #define MXC_CPU_IMX6SX   0x62
> > >  #define MXC_CPU_IMX6Q0x63
> > >  #define MXC_CPU_IMX6UL   0x64
> > > +#define MXC_CPU_IMX6ULL  0x65
> > Since you are adding a new CPU type, you should probably patch
> > imx_soc_device_init() for it as well.
> 
> Ok, I will resend as a 2-patch series.
> 
> BTW, it actually seems to me that setting BM_CLPCR_BYP_MMDC_CH0_LPM_HS
> on imx6sl/sx/ul/ull is not actually needed. That bit (19) is documented
> as "reserved" in the Reference Manual and likely ignored by hardware.
> 
> As far as I understand the MMDC on imx6qdl has two channels and unless
> 2-channel mode is enabled (not currently supported) the handshake with
> CH1 needs to be disabled. Other reduced chips only have one MMDC
> channel and that is CH1 (CH0 was removed) and nothing needs to be done
> from them. The only important thing is to avoid setting
> BM_CLPCR_BYP_MMDC_CH1_LPM_HS.
> 
> However perhaps what I am saying is wrong for some early chip versions?
> Because this behavior of setting BM_CLPCR_BYP_MMDC_CH0_LPM_HS has been
> in the kernel for a long time.

@Anson, you might be the right person to comment here?

Shawn

Re: [PATCH] ARM: imx6ull: Make suspend/resume work like on 6ul

2017-06-06 Thread Shawn Guo

On Tue, Jun 06, 2017 at 01:51:53PM +0300, Leonard Crestez wrote:
> On Mon, 2017-06-05 at 13:37 +0800, Shawn Guo wrote:
> > On Tue, May 30, 2017 at 07:11:19PM +0300, Leonard Crestez wrote:
> > > 
> > > Suspend and resume on imx6ull is currenty not working because of some
> > > missed checks where behavior should match imx6ul.
> > > 
> > > Signed-off-by: Leonard Crestez 
> > > ---
> > >  arch/arm/mach-imx/mxc.h | 6 ++
> > >  arch/arm/mach-imx/pm-imx6.c | 6 --
> > >  2 files changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm/mach-imx/mxc.h b/arch/arm/mach-imx/mxc.h
> > > index 34f2ff6..e00d626 100644
> > > --- a/arch/arm/mach-imx/mxc.h
> > > +++ b/arch/arm/mach-imx/mxc.h
> > > @@ -39,6 +39,7 @@
> > >  #define MXC_CPU_IMX6SX   0x62
> > >  #define MXC_CPU_IMX6Q0x63
> > >  #define MXC_CPU_IMX6UL   0x64
> > > +#define MXC_CPU_IMX6ULL  0x65
> > Since you are adding a new CPU type, you should probably patch
> > imx_soc_device_init() for it as well.
> 
> Ok, I will resend as a 2-patch series.
> 
> BTW, it actually seems to me that setting BM_CLPCR_BYP_MMDC_CH0_LPM_HS
> on imx6sl/sx/ul/ull is not actually needed. That bit (19) is documented
> as "reserved" in the Reference Manual and likely ignored by hardware.
> 
> As far as I understand the MMDC on imx6qdl has two channels and unless
> 2-channel mode is enabled (not currently supported) the handshake with
> CH1 needs to be disabled. Other reduced chips only have one MMDC
> channel and that is CH1 (CH0 was removed) and nothing needs to be done
> from them. The only important thing is to avoid setting
> BM_CLPCR_BYP_MMDC_CH1_LPM_HS.
> 
> However perhaps what I am saying is wrong for some early chip versions?
> Because this behavior of setting BM_CLPCR_BYP_MMDC_CH0_LPM_HS has been
> in the kernel for a long time.

@Anson, you might be the right person to comment here?

Shawn

[PATCH] Revert "mm: vmpressure: fix sending wrong events on underflow"

2017-06-06 Thread zhongjiang

This reverts commit e1587a4945408faa58d0485002c110eb2454740c.

THP lru page is reclaimed , THP is split to normal page and loop again.
reclaimed pages should not be bigger than nr_scan.  because of each
loop will increase nr_scan counter.

Signed-off-by: zhongjiang 
---
 mm/vmpressure.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 6063581..149fdf6 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -112,16 +112,9 @@ static enum vmpressure_levels 
vmpressure_calc_level(unsigned long scanned,
unsigned long reclaimed)
 {
unsigned long scale = scanned + reclaimed;
-   unsigned long pressure = 0;
+   unsigned long pressure;
 
/*
-* reclaimed can be greater than scanned in cases
-* like THP, where the scanned is 1 and reclaimed
-* could be 512
-*/
-   if (reclaimed >= scanned)
-   goto out;
-   /*
 * We calculate the ratio (in percents) of how many pages were
 * scanned vs. reclaimed in a given time frame (window). Note that
 * time is in VM reclaimer's "ticks", i.e. number of pages
@@ -131,7 +124,6 @@ static enum vmpressure_levels 
vmpressure_calc_level(unsigned long scanned,
pressure = scale - (reclaimed * scale / scanned);
pressure = pressure * 100 / scale;
 
-out:
pr_debug("%s: %3lu  (s: %lu  r: %lu)\n", __func__, pressure,
 scanned, reclaimed);
 
-- 
1.7.12.4

[PATCH] Revert "mm: vmpressure: fix sending wrong events on underflow"

2017-06-06 Thread zhongjiang

This reverts commit e1587a4945408faa58d0485002c110eb2454740c.

THP lru page is reclaimed , THP is split to normal page and loop again.
reclaimed pages should not be bigger than nr_scan.  because of each
loop will increase nr_scan counter.

Signed-off-by: zhongjiang 
---
 mm/vmpressure.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 6063581..149fdf6 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -112,16 +112,9 @@ static enum vmpressure_levels 
vmpressure_calc_level(unsigned long scanned,
unsigned long reclaimed)
 {
unsigned long scale = scanned + reclaimed;
-   unsigned long pressure = 0;
+   unsigned long pressure;
 
/*
-* reclaimed can be greater than scanned in cases
-* like THP, where the scanned is 1 and reclaimed
-* could be 512
-*/
-   if (reclaimed >= scanned)
-   goto out;
-   /*
 * We calculate the ratio (in percents) of how many pages were
 * scanned vs. reclaimed in a given time frame (window). Note that
 * time is in VM reclaimer's "ticks", i.e. number of pages
@@ -131,7 +124,6 @@ static enum vmpressure_levels 
vmpressure_calc_level(unsigned long scanned,
pressure = scale - (reclaimed * scale / scanned);
pressure = pressure * 100 / scale;
 
-out:
pr_debug("%s: %3lu  (s: %lu  r: %lu)\n", __func__, pressure,
 scanned, reclaimed);
 
-- 
1.7.12.4

linux-next: manual merge of the tip tree with the unicore32 tree

2017-06-06 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  arch/unicore32/include/asm/Kbuild

between commit:

  bc27113620ca ("unicore32-oldabi: add oldabi syscall interface")

from the unicore32 tree and commit:

  6bc51cbaa9d7 ("signal: Remove non-uapi ")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/unicore32/include/asm/Kbuild
index ea47c4bf3aa7,7a53a55341de..
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@@ -44,7 -44,7 +44,6 @@@ generic-y += serial.
  generic-y += setup.h
  generic-y += shmbuf.h
  generic-y += shmparam.h
- generic-y += siginfo.h
 -generic-y += signal.h
  generic-y += sizes.h
  generic-y += socket.h
  generic-y += sockios.h

linux-next: manual merge of the tip tree with the unicore32 tree

2017-06-06 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  arch/unicore32/include/asm/Kbuild

between commit:

  bc27113620ca ("unicore32-oldabi: add oldabi syscall interface")

from the unicore32 tree and commit:

  6bc51cbaa9d7 ("signal: Remove non-uapi ")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/unicore32/include/asm/Kbuild
index ea47c4bf3aa7,7a53a55341de..
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@@ -44,7 -44,7 +44,6 @@@ generic-y += serial.
  generic-y += setup.h
  generic-y += shmbuf.h
  generic-y += shmparam.h
- generic-y += siginfo.h
 -generic-y += signal.h
  generic-y += sizes.h
  generic-y += socket.h
  generic-y += sockios.h

[PATCH] f2fs: fix to avoid panic when encountering corrupt node

2017-06-06 Thread Chao Yu

With fault_injection option, generic/361 of fstests will complain us
with below message:

Call Trace:
 get_node_page+0x12/0x20 [f2fs]
 f2fs_iget+0x92/0x7d0 [f2fs]
 f2fs_fill_super+0x10fb/0x15e0 [f2fs]
 mount_bdev+0x184/0x1c0
 f2fs_mount+0x15/0x20 [f2fs]
 mount_fs+0x39/0x150
 vfs_kern_mount+0x67/0x110
 do_mount+0x1bb/0xc70
 SyS_mount+0x83/0xd0
 do_syscall_64+0x6e/0x160
 entry_SYSCALL64_slow_path+0x25/0x25

Since mkfs loop device in f2fs partition can be failed silently due to
checkpoint error injection, so root inode page can be corrupted, in order
to avoid needless panic, in get_node_page, it's better to leave message
and return error to caller, and let fsck repaire it later.

Signed-off-by: Chao Yu 
---
 fs/f2fs/node.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 05700e54f91e..f522378224aa 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1157,6 +1157,7 @@ static struct page *__get_node_page(struct f2fs_sb_info 
*sbi, pgoff_t nid,
f2fs_put_page(page, 1);
return ERR_PTR(err);
} else if (err == LOCKED_PAGE) {
+   err = 0;
goto page_hit;
}
 
@@ -1170,15 +1171,22 @@ static struct page *__get_node_page(struct f2fs_sb_info 
*sbi, pgoff_t nid,
goto repeat;
}
 
-   if (unlikely(!PageUptodate(page)))
+   if (unlikely(!PageUptodate(page))) {
+   err = -EIO;
goto out_err;
+   }
 page_hit:
if(unlikely(nid != nid_of_node(page))) {
-   f2fs_bug_on(sbi, 1);
+   f2fs_msg(sbi->sb, KERN_WARNING, "inconsistent node block, "
+   "nid:%lu, 
node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]",
+   nid, nid_of_node(page), ino_of_node(page),
+   ofs_of_node(page), cpver_of_node(page),
+   next_blkaddr_of_node(page));
ClearPageUptodate(page);
+   err = -EINVAL;
 out_err:
f2fs_put_page(page, 1);
-   return ERR_PTR(-EIO);
+   return ERR_PTR(err);
}
return page;
 }
-- 
2.13.1.388.g69e6b9b4f4a9

[PATCH] f2fs: fix to avoid panic when encountering corrupt node

2017-06-06 Thread Chao Yu

With fault_injection option, generic/361 of fstests will complain us
with below message:

Call Trace:
 get_node_page+0x12/0x20 [f2fs]
 f2fs_iget+0x92/0x7d0 [f2fs]
 f2fs_fill_super+0x10fb/0x15e0 [f2fs]
 mount_bdev+0x184/0x1c0
 f2fs_mount+0x15/0x20 [f2fs]
 mount_fs+0x39/0x150
 vfs_kern_mount+0x67/0x110
 do_mount+0x1bb/0xc70
 SyS_mount+0x83/0xd0
 do_syscall_64+0x6e/0x160
 entry_SYSCALL64_slow_path+0x25/0x25

Since mkfs loop device in f2fs partition can be failed silently due to
checkpoint error injection, so root inode page can be corrupted, in order
to avoid needless panic, in get_node_page, it's better to leave message
and return error to caller, and let fsck repaire it later.

Signed-off-by: Chao Yu 
---
 fs/f2fs/node.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 05700e54f91e..f522378224aa 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1157,6 +1157,7 @@ static struct page *__get_node_page(struct f2fs_sb_info 
*sbi, pgoff_t nid,
f2fs_put_page(page, 1);
return ERR_PTR(err);
} else if (err == LOCKED_PAGE) {
+   err = 0;
goto page_hit;
}
 
@@ -1170,15 +1171,22 @@ static struct page *__get_node_page(struct f2fs_sb_info 
*sbi, pgoff_t nid,
goto repeat;
}
 
-   if (unlikely(!PageUptodate(page)))
+   if (unlikely(!PageUptodate(page))) {
+   err = -EIO;
goto out_err;
+   }
 page_hit:
if(unlikely(nid != nid_of_node(page))) {
-   f2fs_bug_on(sbi, 1);
+   f2fs_msg(sbi->sb, KERN_WARNING, "inconsistent node block, "
+   "nid:%lu, 
node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]",
+   nid, nid_of_node(page), ino_of_node(page),
+   ofs_of_node(page), cpver_of_node(page),
+   next_blkaddr_of_node(page));
ClearPageUptodate(page);
+   err = -EINVAL;
 out_err:
f2fs_put_page(page, 1);
-   return ERR_PTR(-EIO);
+   return ERR_PTR(err);
}
return page;
 }
-- 
2.13.1.388.g69e6b9b4f4a9

Re: [PATCH 2/6] efi: Avoid fortify checks in EFI stub

2017-06-06 Thread Kees Cook

On Tue, Jun 6, 2017 at 10:17 AM, Mark Rutland  wrote:
> On Tue, Jun 06, 2017 at 05:13:07PM +, Ard Biesheuvel wrote:
>> (+ Mark, Matt)
>>
>> On 6 June 2017 at 04:52, Kees Cook  wrote:
>> > This avoids CONFIG_FORTIFY_SOURCE from being enabled during the EFI stub
>> > build, as adding a panic() implementation may not work well. This can be
>> > adjusted in the future.
>> >
>> > Suggested-by: Daniel Micay 
>> > Signed-off-by: Kees Cook 
>> > Cc; Matt Fleming 
>> > Cc: Ard Biesheuvel 
>
> I believe for arm64 the immediate breakage is implicitly fixed by the
>  definition, but I agree it makes sense to be explicit
> anyhow.
>
> FWIW:
>
> Acked-by: Mark Rutland 
>
> Kees, as an aside, do you want me to patchify the vdso fixup? Or are
> you going to handle that?

I sent that separately but discovered that my invocation of git
send-email failed to include a CC to you, even though I had it listed
as Suggested-by, etc. I think it's going to get queued for the arm64
tree.

>
> Thanks,
> Mark.
>
>> > ---
>> >  drivers/firmware/efi/libstub/Makefile | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/drivers/firmware/efi/libstub/Makefile 
>> > b/drivers/firmware/efi/libstub/Makefile
>> > index f7425960f6a5..37e24f525162 100644
>> > --- a/drivers/firmware/efi/libstub/Makefile
>> > +++ b/drivers/firmware/efi/libstub/Makefile
>> > @@ -17,6 +17,7 @@ cflags-$(CONFIG_ARM)  := $(subst 
>> > -pg,,$(KBUILD_CFLAGS)) \
>> >  cflags-$(CONFIG_EFI_ARMSTUB)   += -I$(srctree)/scripts/dtc/libfdt
>> >
>> >  KBUILD_CFLAGS  := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>> > +  -D__NO_FORTIFY \
>> >$(call cc-option,-ffreestanding) \
>> >$(call cc-option,-fno-stack-protector)
>> >
>>
>> Reviewed-by: Ard Biesheuvel 
>>
>> This is unlikely to conflict with anything going through the EFI tree,
>> so feel free to queue it elsewhere.

If it can go through the EFI tree, that'd be great. Less for akpm to wrangle. :)

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH 2/6] efi: Avoid fortify checks in EFI stub

2017-06-06 Thread Kees Cook

On Tue, Jun 6, 2017 at 10:17 AM, Mark Rutland  wrote:
> On Tue, Jun 06, 2017 at 05:13:07PM +, Ard Biesheuvel wrote:
>> (+ Mark, Matt)
>>
>> On 6 June 2017 at 04:52, Kees Cook  wrote:
>> > This avoids CONFIG_FORTIFY_SOURCE from being enabled during the EFI stub
>> > build, as adding a panic() implementation may not work well. This can be
>> > adjusted in the future.
>> >
>> > Suggested-by: Daniel Micay 
>> > Signed-off-by: Kees Cook 
>> > Cc; Matt Fleming 
>> > Cc: Ard Biesheuvel 
>
> I believe for arm64 the immediate breakage is implicitly fixed by the
>  definition, but I agree it makes sense to be explicit
> anyhow.
>
> FWIW:
>
> Acked-by: Mark Rutland 
>
> Kees, as an aside, do you want me to patchify the vdso fixup? Or are
> you going to handle that?

I sent that separately but discovered that my invocation of git
send-email failed to include a CC to you, even though I had it listed
as Suggested-by, etc. I think it's going to get queued for the arm64
tree.

>
> Thanks,
> Mark.
>
>> > ---
>> >  drivers/firmware/efi/libstub/Makefile | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/drivers/firmware/efi/libstub/Makefile 
>> > b/drivers/firmware/efi/libstub/Makefile
>> > index f7425960f6a5..37e24f525162 100644
>> > --- a/drivers/firmware/efi/libstub/Makefile
>> > +++ b/drivers/firmware/efi/libstub/Makefile
>> > @@ -17,6 +17,7 @@ cflags-$(CONFIG_ARM)  := $(subst 
>> > -pg,,$(KBUILD_CFLAGS)) \
>> >  cflags-$(CONFIG_EFI_ARMSTUB)   += -I$(srctree)/scripts/dtc/libfdt
>> >
>> >  KBUILD_CFLAGS  := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>> > +  -D__NO_FORTIFY \
>> >$(call cc-option,-ffreestanding) \
>> >$(call cc-option,-fno-stack-protector)
>> >
>>
>> Reviewed-by: Ard Biesheuvel 
>>
>> This is unlikely to conflict with anything going through the EFI tree,
>> so feel free to queue it elsewhere.

If it can go through the EFI tree, that'd be great. Less for akpm to wrangle. :)

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v4 04/23] mtd: nand: denali: avoid hard-coding ECC step, strength, bytes

2017-06-06 Thread Masahiro Yamada

Hi Boris,


2017-06-07 7:01 GMT+09:00 Boris Brezillon :
> On Tue,  6 Jun 2017 08:21:43 +0900
> Masahiro Yamada  wrote:
>
>> This driver was originally written for the Intel MRST platform with
>> several platform-specific parameters hard-coded.
>>
>> Currently, the ECC settings are hard-coded as follows:
>>
>>   #define ECC_SECTOR_SIZE 512
>>   #define ECC_8BITS   14
>>   #define ECC_15BITS  26
>>
>> Therefore, the driver can only support two cases.
>>  - ecc.size = 512, ecc.strength = 8--> ecc.bytes = 14
>>  - ecc.size = 512, ecc.strength = 15   --> ecc.bytes = 26
>>
>> However, these are actually customizable parameters, for example,
>> UniPhier platform supports the following:
>>
>>  - ecc.size = 1024, ecc.strength = 8   --> ecc.bytes = 14
>>  - ecc.size = 1024, ecc.strength = 16  --> ecc.bytes = 28
>>  - ecc.size = 1024, ecc.strength = 24  --> ecc.bytes = 42
>>
>> So, we need to handle the ECC parameters in a more generic manner.
>> Fortunately, the Denali User's Guide explains how to calculate the
>> ecc.bytes.  The formula is:
>>
>>   ecc.bytes = 2 * CEIL(13 * ecc.strength / 16)  (for ecc.size = 512)
>>   ecc.bytes = 2 * CEIL(14 * ecc.strength / 16)  (for ecc.size = 1024)
>>
>> For DT platforms, it would be reasonable to allow DT to specify ECC
>> strength by either "nand-ecc-strength" or "nand-ecc-maximize".  If
>> none of them is specified, the driver will try to meet the chip's ECC
>> requirement.
>>
>> For PCI platforms, the max ECC strength is used to keep the original
>> behavior.
>>
>> Newer versions of this IP need ecc.size and ecc.steps explicitly
>> set up via the following registers:
>>   CFG_DATA_BLOCK_SIZE   (0x6b0)
>>   CFG_LAST_DATA_BLOCK_SIZE  (0x6c0)
>>   CFG_NUM_DATA_BLOCKS   (0x6d0)
>>
>> For older IP versions, write accesses to these registers are just
>> ignored.
>>
>> Signed-off-by: Masahiro Yamada 
>> Acked-by: Rob Herring 
>> ---
>>
>> Changes in v4:
>>   - Rewrite by using generic helpers, nand_check_caps(),
>> nand_match_ecc_req(), nand_maximize_ecc().
>>
>> Changes in v3:
>>   - Move DENALI_CAP_ define out of struct denali_nand_info
>>   - Use chip->ecc_step_ds as a hint to choose chip->ecc.size
>> where possible
>>
>> Changes in v2:
>>   - Change the capability prefix DENALI_CAPS_ -> DENALI_CAP_
>>   - Make ECC 512 cap and ECC 1024 cap independent
>>   - Set up three CFG_... registers
>>
>>  .../devicetree/bindings/mtd/denali-nand.txt|   7 ++
>>  drivers/mtd/nand/denali.c  | 103 
>> ++---
>>  drivers/mtd/nand/denali.h  |  11 ++-
>>  drivers/mtd/nand/denali_dt.c   |   8 ++
>>  drivers/mtd/nand/denali_pci.c  |   9 ++
>>  5 files changed, 101 insertions(+), 37 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/mtd/denali-nand.txt 
>> b/Documentation/devicetree/bindings/mtd/denali-nand.txt
>> index e593bbe..b7742a7 100644
>> --- a/Documentation/devicetree/bindings/mtd/denali-nand.txt
>> +++ b/Documentation/devicetree/bindings/mtd/denali-nand.txt
>> @@ -7,6 +7,13 @@ Required properties:
>>- reg-names: Should contain the reg names "nand_data" and "denali_reg"
>>- interrupts : The interrupt number.
>>
>> +Optional properties:
>> +  - nand-ecc-step-size: see nand.txt for details.  If present, the value 
>> must be
>> +  512for "altr,socfpga-denali-nand"
>> +  - nand-ecc-strength: see nand.txt for details.  Valid values are:
>> +  8, 15  for "altr,socfpga-denali-nand"
>> +  - nand-ecc-maximize: see nand.txt for details
>> +
>>  The device tree may optionally contain sub-nodes describing partitions of 
>> the
>>  address space. See partition.txt for more detail.
>>
>> diff --git a/drivers/mtd/nand/denali.c b/drivers/mtd/nand/denali.c
>> index 16634df..3204c51 100644
>> --- a/drivers/mtd/nand/denali.c
>> +++ b/drivers/mtd/nand/denali.c
>> @@ -886,8 +886,6 @@ static int denali_hw_ecc_fixup(struct mtd_info *mtd,
>>   return max_bitflips;
>>  }
>>
>> -#define ECC_SECTOR_SIZE 512
>> -
>>  #define ECC_SECTOR(x)(((x) & ECC_ERROR_ADDRESS__SECTOR_NR) >> 12)
>>  #define ECC_BYTE(x)  (((x) & ECC_ERROR_ADDRESS__OFFSET))
>>  #define ECC_CORRECTION_VALUE(x) ((x) & ERR_CORRECTION_INFO__BYTEMASK)
>> @@ -899,6 +897,7 @@ static int denali_sw_ecc_fixup(struct mtd_info *mtd,
>>  struct denali_nand_info *denali,
>>  unsigned long *uncor_ecc_flags, uint8_t *buf)
>>  {
>> + unsigned int ecc_size = denali->nand.ecc.size;
>>   unsigned int bitflips = 0;
>>   unsigned int max_bitflips = 0;
>>   uint32_t err_addr, err_cor_info;
>> @@ -928,9 +927,9 @@ static int denali_sw_ecc_fixup(struct mtd_info *mtd,
>>* an erased sector.
>>*/
>>   *uncor_ecc_flags |=

Re: [PATCH v4 04/23] mtd: nand: denali: avoid hard-coding ECC step, strength, bytes

2017-06-06 Thread Masahiro Yamada

Hi Boris,


2017-06-07 7:01 GMT+09:00 Boris Brezillon :
> On Tue,  6 Jun 2017 08:21:43 +0900
> Masahiro Yamada  wrote:
>
>> This driver was originally written for the Intel MRST platform with
>> several platform-specific parameters hard-coded.
>>
>> Currently, the ECC settings are hard-coded as follows:
>>
>>   #define ECC_SECTOR_SIZE 512
>>   #define ECC_8BITS   14
>>   #define ECC_15BITS  26
>>
>> Therefore, the driver can only support two cases.
>>  - ecc.size = 512, ecc.strength = 8--> ecc.bytes = 14
>>  - ecc.size = 512, ecc.strength = 15   --> ecc.bytes = 26
>>
>> However, these are actually customizable parameters, for example,
>> UniPhier platform supports the following:
>>
>>  - ecc.size = 1024, ecc.strength = 8   --> ecc.bytes = 14
>>  - ecc.size = 1024, ecc.strength = 16  --> ecc.bytes = 28
>>  - ecc.size = 1024, ecc.strength = 24  --> ecc.bytes = 42
>>
>> So, we need to handle the ECC parameters in a more generic manner.
>> Fortunately, the Denali User's Guide explains how to calculate the
>> ecc.bytes.  The formula is:
>>
>>   ecc.bytes = 2 * CEIL(13 * ecc.strength / 16)  (for ecc.size = 512)
>>   ecc.bytes = 2 * CEIL(14 * ecc.strength / 16)  (for ecc.size = 1024)
>>
>> For DT platforms, it would be reasonable to allow DT to specify ECC
>> strength by either "nand-ecc-strength" or "nand-ecc-maximize".  If
>> none of them is specified, the driver will try to meet the chip's ECC
>> requirement.
>>
>> For PCI platforms, the max ECC strength is used to keep the original
>> behavior.
>>
>> Newer versions of this IP need ecc.size and ecc.steps explicitly
>> set up via the following registers:
>>   CFG_DATA_BLOCK_SIZE   (0x6b0)
>>   CFG_LAST_DATA_BLOCK_SIZE  (0x6c0)
>>   CFG_NUM_DATA_BLOCKS   (0x6d0)
>>
>> For older IP versions, write accesses to these registers are just
>> ignored.
>>
>> Signed-off-by: Masahiro Yamada 
>> Acked-by: Rob Herring 
>> ---
>>
>> Changes in v4:
>>   - Rewrite by using generic helpers, nand_check_caps(),
>> nand_match_ecc_req(), nand_maximize_ecc().
>>
>> Changes in v3:
>>   - Move DENALI_CAP_ define out of struct denali_nand_info
>>   - Use chip->ecc_step_ds as a hint to choose chip->ecc.size
>> where possible
>>
>> Changes in v2:
>>   - Change the capability prefix DENALI_CAPS_ -> DENALI_CAP_
>>   - Make ECC 512 cap and ECC 1024 cap independent
>>   - Set up three CFG_... registers
>>
>>  .../devicetree/bindings/mtd/denali-nand.txt|   7 ++
>>  drivers/mtd/nand/denali.c  | 103 
>> ++---
>>  drivers/mtd/nand/denali.h  |  11 ++-
>>  drivers/mtd/nand/denali_dt.c   |   8 ++
>>  drivers/mtd/nand/denali_pci.c  |   9 ++
>>  5 files changed, 101 insertions(+), 37 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/mtd/denali-nand.txt 
>> b/Documentation/devicetree/bindings/mtd/denali-nand.txt
>> index e593bbe..b7742a7 100644
>> --- a/Documentation/devicetree/bindings/mtd/denali-nand.txt
>> +++ b/Documentation/devicetree/bindings/mtd/denali-nand.txt
>> @@ -7,6 +7,13 @@ Required properties:
>>- reg-names: Should contain the reg names "nand_data" and "denali_reg"
>>- interrupts : The interrupt number.
>>
>> +Optional properties:
>> +  - nand-ecc-step-size: see nand.txt for details.  If present, the value 
>> must be
>> +  512for "altr,socfpga-denali-nand"
>> +  - nand-ecc-strength: see nand.txt for details.  Valid values are:
>> +  8, 15  for "altr,socfpga-denali-nand"
>> +  - nand-ecc-maximize: see nand.txt for details
>> +
>>  The device tree may optionally contain sub-nodes describing partitions of 
>> the
>>  address space. See partition.txt for more detail.
>>
>> diff --git a/drivers/mtd/nand/denali.c b/drivers/mtd/nand/denali.c
>> index 16634df..3204c51 100644
>> --- a/drivers/mtd/nand/denali.c
>> +++ b/drivers/mtd/nand/denali.c
>> @@ -886,8 +886,6 @@ static int denali_hw_ecc_fixup(struct mtd_info *mtd,
>>   return max_bitflips;
>>  }
>>
>> -#define ECC_SECTOR_SIZE 512
>> -
>>  #define ECC_SECTOR(x)(((x) & ECC_ERROR_ADDRESS__SECTOR_NR) >> 12)
>>  #define ECC_BYTE(x)  (((x) & ECC_ERROR_ADDRESS__OFFSET))
>>  #define ECC_CORRECTION_VALUE(x) ((x) & ERR_CORRECTION_INFO__BYTEMASK)
>> @@ -899,6 +897,7 @@ static int denali_sw_ecc_fixup(struct mtd_info *mtd,
>>  struct denali_nand_info *denali,
>>  unsigned long *uncor_ecc_flags, uint8_t *buf)
>>  {
>> + unsigned int ecc_size = denali->nand.ecc.size;
>>   unsigned int bitflips = 0;
>>   unsigned int max_bitflips = 0;
>>   uint32_t err_addr, err_cor_info;
>> @@ -928,9 +927,9 @@ static int denali_sw_ecc_fixup(struct mtd_info *mtd,
>>* an erased sector.
>>*/
>>   *uncor_ecc_flags |= BIT(err_sector);
>> - } else if (err_byte < ECC_SECTOR_SIZE) {
>> + } else if (err_byte < ecc_size) {
>>

Re: [PATCH] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-06-06 Thread zhong jiang

On 2017/6/7 10:53, Minchan Kim wrote:
> Hi,
>
> On Tue, Jun 06, 2017 at 09:00:55PM +0800, zhong jiang wrote:
>> On 2017/1/31 7:40, Minchan Kim wrote:
>>> Hi Vinayak,
>>> Sorry for late response. It was Lunar New Year holidays.
>>>
>>> On Fri, Jan 27, 2017 at 01:43:23PM +0530, vinayak menon wrote:
> Thanks for the explain. However, such case can happen with THP page
> as well as slab. In case of THP page, nr_scanned is 1 but nr_reclaimed
> could be 512 so I think vmpressure should have a logic to prevent undeflow
> regardless of slab shrinking.
>
 I see. Going to send a vmpressure fix. But, wouldn't the THP case
 result in incorrect
 vmpressure reporting even if we fix the vmpressure underflow problem ?
>>> If a THP page is reclaimed, it reports lower pressure due to bigger
>>> reclaim ratio(ie, reclaimed/scanned) compared to normal pages but
>>> it's not a problem, is it? Because VM reclaimed more memory than
>>> expected so memory pressure isn't severe now.
>>   Hi, Minchan
>>
>>   THP lru page is reclaimed, reclaim ratio bigger make sense. but I read the 
>> code, I found
>>   THP is split to normal pages and loop again.  reclaimed pages should not 
>> be bigger
>>than nr_scan.  because of each loop will increase nr_scan counter.
>>  
>>It is likely  I miss something.  you can point out the point please.
> You are absolutely right.
>
> I got confused by nr_scanned from isolate_lru_pages and sc->nr_scanned
> from shrink_page_list.
>
> Thanks.
>
>
> .
>
 Hi, Minchan

 I will send the revert patch shortly. how do you think?

 Thanks
 zhongjiang

Re: [PATCH] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-06-06 Thread zhong jiang

On 2017/6/7 10:53, Minchan Kim wrote:
> Hi,
>
> On Tue, Jun 06, 2017 at 09:00:55PM +0800, zhong jiang wrote:
>> On 2017/1/31 7:40, Minchan Kim wrote:
>>> Hi Vinayak,
>>> Sorry for late response. It was Lunar New Year holidays.
>>>
>>> On Fri, Jan 27, 2017 at 01:43:23PM +0530, vinayak menon wrote:
> Thanks for the explain. However, such case can happen with THP page
> as well as slab. In case of THP page, nr_scanned is 1 but nr_reclaimed
> could be 512 so I think vmpressure should have a logic to prevent undeflow
> regardless of slab shrinking.
>
 I see. Going to send a vmpressure fix. But, wouldn't the THP case
 result in incorrect
 vmpressure reporting even if we fix the vmpressure underflow problem ?
>>> If a THP page is reclaimed, it reports lower pressure due to bigger
>>> reclaim ratio(ie, reclaimed/scanned) compared to normal pages but
>>> it's not a problem, is it? Because VM reclaimed more memory than
>>> expected so memory pressure isn't severe now.
>>   Hi, Minchan
>>
>>   THP lru page is reclaimed, reclaim ratio bigger make sense. but I read the 
>> code, I found
>>   THP is split to normal pages and loop again.  reclaimed pages should not 
>> be bigger
>>than nr_scan.  because of each loop will increase nr_scan counter.
>>  
>>It is likely  I miss something.  you can point out the point please.
> You are absolutely right.
>
> I got confused by nr_scanned from isolate_lru_pages and sc->nr_scanned
> from shrink_page_list.
>
> Thanks.
>
>
> .
>
 Hi, Minchan

 I will send the revert patch shortly. how do you think?

 Thanks
 zhongjiang

Re: [PATCH] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-06-06 Thread Minchan Kim

Hi,

On Tue, Jun 06, 2017 at 09:00:55PM +0800, zhong jiang wrote:
> On 2017/1/31 7:40, Minchan Kim wrote:
> > Hi Vinayak,
> > Sorry for late response. It was Lunar New Year holidays.
> >
> > On Fri, Jan 27, 2017 at 01:43:23PM +0530, vinayak menon wrote:
> >>> Thanks for the explain. However, such case can happen with THP page
> >>> as well as slab. In case of THP page, nr_scanned is 1 but nr_reclaimed
> >>> could be 512 so I think vmpressure should have a logic to prevent undeflow
> >>> regardless of slab shrinking.
> >>>
> >> I see. Going to send a vmpressure fix. But, wouldn't the THP case
> >> result in incorrect
> >> vmpressure reporting even if we fix the vmpressure underflow problem ?
> > If a THP page is reclaimed, it reports lower pressure due to bigger
> > reclaim ratio(ie, reclaimed/scanned) compared to normal pages but
> > it's not a problem, is it? Because VM reclaimed more memory than
> > expected so memory pressure isn't severe now.
>   Hi, Minchan
> 
>   THP lru page is reclaimed, reclaim ratio bigger make sense. but I read the 
> code, I found
>   THP is split to normal pages and loop again.  reclaimed pages should not be 
> bigger
>than nr_scan.  because of each loop will increase nr_scan counter.
>  
>It is likely  I miss something.  you can point out the point please.

You are absolutely right.

I got confused by nr_scanned from isolate_lru_pages and sc->nr_scanned
from shrink_page_list.

Thanks.

Re: [PATCH] mm: vmscan: do not pass reclaimed slab to vmpressure

2017-06-06 Thread Minchan Kim

Hi,

On Tue, Jun 06, 2017 at 09:00:55PM +0800, zhong jiang wrote:
> On 2017/1/31 7:40, Minchan Kim wrote:
> > Hi Vinayak,
> > Sorry for late response. It was Lunar New Year holidays.
> >
> > On Fri, Jan 27, 2017 at 01:43:23PM +0530, vinayak menon wrote:
> >>> Thanks for the explain. However, such case can happen with THP page
> >>> as well as slab. In case of THP page, nr_scanned is 1 but nr_reclaimed
> >>> could be 512 so I think vmpressure should have a logic to prevent undeflow
> >>> regardless of slab shrinking.
> >>>
> >> I see. Going to send a vmpressure fix. But, wouldn't the THP case
> >> result in incorrect
> >> vmpressure reporting even if we fix the vmpressure underflow problem ?
> > If a THP page is reclaimed, it reports lower pressure due to bigger
> > reclaim ratio(ie, reclaimed/scanned) compared to normal pages but
> > it's not a problem, is it? Because VM reclaimed more memory than
> > expected so memory pressure isn't severe now.
>   Hi, Minchan
> 
>   THP lru page is reclaimed, reclaim ratio bigger make sense. but I read the 
> code, I found
>   THP is split to normal pages and loop again.  reclaimed pages should not be 
> bigger
>than nr_scan.  because of each loop will increase nr_scan counter.
>  
>It is likely  I miss something.  you can point out the point please.

You are absolutely right.

I got confused by nr_scanned from isolate_lru_pages and sc->nr_scanned
from shrink_page_list.

Thanks.

[PATCH] sched/idle: Add deferrable vmstat_updater back

2017-06-06 Thread Aubrey Li

Deferrable vmstat_updater was missing in commit c1de45ca831a
("sched/idle: Add support for tasks that inject idle"), add it back

Signed-off-by: Aubrey Li 
---
 kernel/sched/idle.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ef63adc..6c23e30 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -219,6 +219,7 @@ static void do_idle(void)
 */
 
__current_set_polling();
+   quiet_vmstat();
tick_nohz_idle_enter();
 
while (!need_resched()) {
-- 
2.7.4

[PATCH] sched/idle: Add deferrable vmstat_updater back

2017-06-06 Thread Aubrey Li

Deferrable vmstat_updater was missing in commit c1de45ca831a
("sched/idle: Add support for tasks that inject idle"), add it back

Signed-off-by: Aubrey Li 
---
 kernel/sched/idle.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ef63adc..6c23e30 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -219,6 +219,7 @@ static void do_idle(void)
 */
 
__current_set_polling();
+   quiet_vmstat();
tick_nohz_idle_enter();
 
while (!need_resched()) {
-- 
2.7.4

Re: [PATCH v3] hwmon: Add support for MAX31785 intelligent fan controller

2017-06-06 Thread Joel Stanley

On Wed, Jun 7, 2017 at 1:50 AM, Matthew Barth
 wrote:
>
> On 06/06/17 8:33 AM, Guenter Roeck wrote:
>>
>> On 06/06/2017 12:02 AM, Andrew Jeffery wrote:
>>> Over and above the features of the original patch is support for a
>>> secondary
>>> rotor measurement value that is provided by MAX31785 chips with a revised
>>> firmware. The feature(s) of the firmware are determined at probe time and
>>> extra
>>> attributes exposed accordingly. Specifically, the MFR_REVISION 0x3040 of
>>> the
>>> firmware supports 'slow' and 'fast' rotor reads. The feature is
>>> implemented by
>>> command 0x90 (READ_FAN_SPEED_1) providing a 4-byte response (with the
>>> 'fast'
>>> measurement in the second word) rather than the 2-bytes response in the
>>> original firmware (MFR_REVISION 0x3030).
>>>
>>
>> Taking the pmbus driver question out, why would this warrant another
>> non-standard
>> attribute outside the ABI ? I could see the desire to replace the 'slow'
>> access
>> with the 'fast' one, but provide two attributes ? No, I don't see the
>> point, sorry,
>> even more so without detailed explanation why the second attribute in
>> addition
>> to the first one would add any value.
>
> In the case of counter-rotating(CR) fans which contain two rotors to provide
> more airflow there are then two tach feedbacks. These CR fans take a single
> target speed and provide individual feedbacks for each rotor contained
> within the fan enclosure. Providing these individual feedbacks assists in
> fan fault driven speed changes, improved thermal characterization among
> other things.
>
> Maxim provided this as a 'slow' / 'fast' set of bytes to be user
> compatible(so the 'slow' rotor speed, regardless of which rotor, is in the
> first 2 bytes with the 'slow' version of firmware as well). In some cases,
> mfg systems could have a mix of these revisions.

Andrew, instead of creating the _fast sysfs nodes, I think you could
expose the extra rotors are separate fan devices in sysfs. This would
not introduce new ABI.

Guenter, would this be acceptable to you?

Cheers,

Joel


>
>>
>>> This feature is not documented in the public datasheet[1].
>>>
>>> [1] https://datasheets.maximintegrated.com/en/ds/MAX31785.pdf
>>>
>>> The need to read a 4-byte value drives the addition of a helper that is a
>>> cut-down version of i2c_smbus_xfer_emulated(), as 4-byte transactions
>>> aren't a
>>> defined transaction type in the PMBus spec. This seemed more tasteful
>>> than
>>> hacking the PMBus core to support the quirks of a single device.
>>>
>>
>> That is why we have PMBus helper drivers.
>>
>> Guenter
>>
>>> Also changed from Timothy's original posting is I've massaged the locking
>>> a bit
>>> and removed what seemed to be a copy/paste bug around
>>> max31785_fan_set_pulses()
>>> setting the fan_command member.
>>>
>>> Tested on an IBM Witherspoon machine.
>>>
>>> Cheers,
>>>
>>> Andrew
>>>
>>>   Documentation/hwmon/max31785 |  44 +++
>>>   drivers/hwmon/Kconfig|  10 +
>>>   drivers/hwmon/Makefile   |   1 +
>>>   drivers/hwmon/max31785.c | 824
>>> +++
>>>   4 files changed, 879 insertions(+)
>>>   create mode 100644 Documentation/hwmon/max31785
>>>   create mode 100644 drivers/hwmon/max31785.c
>>>
>>> diff --git a/Documentation/hwmon/max31785 b/Documentation/hwmon/max31785
>>> new file mode 100644
>>> index ..dd891c06401e
>>> --- /dev/null
>>> +++ b/Documentation/hwmon/max31785
>>> @@ -0,0 +1,44 @@
>>> +Kernel driver max31785
>>> +==
>>> +
>>> +Supported chips:
>>> +  * Maxim MAX31785
>>> +Prefix: 'max31785'
>>> +Addresses scanned: 0x52 0x53 0x54 0x55
>>> +Datasheet: http://pdfserv.maximintegrated.com/en/ds/MAX31785.pdf
>>> +
>>> +Author: Timothy Pearson 
>>> +
>>> +
>>> +Description
>>> +---
>>> +
>>> +This driver implements support for the Maxim MAX31785 chip.
>>> +
>>> +The MAX31785 controls the speeds of up to six fans using six independent
>>> +PWM outputs. The desired fan speeds (or PWM duty cycles) are written
>>> +through the I2C interface. The outputs drive "4-wire" fans directly,
>>> +or can be used to modulate the fan's power terminals using an external
>>> +pass transistor.
>>> +
>>> +Tachometer inputs monitor fan tachometer logic outputs for precise
>>> (+/-1%)
>>> +monitoring and control of fan RPM as well as detection of fan failure.
>>> +
>>> +
>>> +Sysfs entries
>>> +-
>>> +
>>> +fan[1-6]_input   RO  fan tachometer speed in RPM
>>> +fan[1-6]_fault   RO  fan experienced fault
>>> +fan[1-6]_pulses  RW  tachometer pulses per fan revolution
>>> +fan[1-6]_target  RW  desired fan speed in RPM
>>> +pwm[1-6]_enable  RW  pwm mode, 0=disabled, 1=pwm, 2=rpm,
>>> 3=automatic
>>> +pwm[1-6] RW  fan target duty cycle (0-255)
>>> +
>>> +Dynamic sysfs entries
>>> +
>>> +
>>> +Whether these entries are

Re: [PATCH v3] hwmon: Add support for MAX31785 intelligent fan controller

2017-06-06 Thread Joel Stanley

On Wed, Jun 7, 2017 at 1:50 AM, Matthew Barth
 wrote:
>
> On 06/06/17 8:33 AM, Guenter Roeck wrote:
>>
>> On 06/06/2017 12:02 AM, Andrew Jeffery wrote:
>>> Over and above the features of the original patch is support for a
>>> secondary
>>> rotor measurement value that is provided by MAX31785 chips with a revised
>>> firmware. The feature(s) of the firmware are determined at probe time and
>>> extra
>>> attributes exposed accordingly. Specifically, the MFR_REVISION 0x3040 of
>>> the
>>> firmware supports 'slow' and 'fast' rotor reads. The feature is
>>> implemented by
>>> command 0x90 (READ_FAN_SPEED_1) providing a 4-byte response (with the
>>> 'fast'
>>> measurement in the second word) rather than the 2-bytes response in the
>>> original firmware (MFR_REVISION 0x3030).
>>>
>>
>> Taking the pmbus driver question out, why would this warrant another
>> non-standard
>> attribute outside the ABI ? I could see the desire to replace the 'slow'
>> access
>> with the 'fast' one, but provide two attributes ? No, I don't see the
>> point, sorry,
>> even more so without detailed explanation why the second attribute in
>> addition
>> to the first one would add any value.
>
> In the case of counter-rotating(CR) fans which contain two rotors to provide
> more airflow there are then two tach feedbacks. These CR fans take a single
> target speed and provide individual feedbacks for each rotor contained
> within the fan enclosure. Providing these individual feedbacks assists in
> fan fault driven speed changes, improved thermal characterization among
> other things.
>
> Maxim provided this as a 'slow' / 'fast' set of bytes to be user
> compatible(so the 'slow' rotor speed, regardless of which rotor, is in the
> first 2 bytes with the 'slow' version of firmware as well). In some cases,
> mfg systems could have a mix of these revisions.

Andrew, instead of creating the _fast sysfs nodes, I think you could
expose the extra rotors are separate fan devices in sysfs. This would
not introduce new ABI.

Guenter, would this be acceptable to you?

Cheers,

Joel


>
>>
>>> This feature is not documented in the public datasheet[1].
>>>
>>> [1] https://datasheets.maximintegrated.com/en/ds/MAX31785.pdf
>>>
>>> The need to read a 4-byte value drives the addition of a helper that is a
>>> cut-down version of i2c_smbus_xfer_emulated(), as 4-byte transactions
>>> aren't a
>>> defined transaction type in the PMBus spec. This seemed more tasteful
>>> than
>>> hacking the PMBus core to support the quirks of a single device.
>>>
>>
>> That is why we have PMBus helper drivers.
>>
>> Guenter
>>
>>> Also changed from Timothy's original posting is I've massaged the locking
>>> a bit
>>> and removed what seemed to be a copy/paste bug around
>>> max31785_fan_set_pulses()
>>> setting the fan_command member.
>>>
>>> Tested on an IBM Witherspoon machine.
>>>
>>> Cheers,
>>>
>>> Andrew
>>>
>>>   Documentation/hwmon/max31785 |  44 +++
>>>   drivers/hwmon/Kconfig|  10 +
>>>   drivers/hwmon/Makefile   |   1 +
>>>   drivers/hwmon/max31785.c | 824
>>> +++
>>>   4 files changed, 879 insertions(+)
>>>   create mode 100644 Documentation/hwmon/max31785
>>>   create mode 100644 drivers/hwmon/max31785.c
>>>
>>> diff --git a/Documentation/hwmon/max31785 b/Documentation/hwmon/max31785
>>> new file mode 100644
>>> index ..dd891c06401e
>>> --- /dev/null
>>> +++ b/Documentation/hwmon/max31785
>>> @@ -0,0 +1,44 @@
>>> +Kernel driver max31785
>>> +==
>>> +
>>> +Supported chips:
>>> +  * Maxim MAX31785
>>> +Prefix: 'max31785'
>>> +Addresses scanned: 0x52 0x53 0x54 0x55
>>> +Datasheet: http://pdfserv.maximintegrated.com/en/ds/MAX31785.pdf
>>> +
>>> +Author: Timothy Pearson 
>>> +
>>> +
>>> +Description
>>> +---
>>> +
>>> +This driver implements support for the Maxim MAX31785 chip.
>>> +
>>> +The MAX31785 controls the speeds of up to six fans using six independent
>>> +PWM outputs. The desired fan speeds (or PWM duty cycles) are written
>>> +through the I2C interface. The outputs drive "4-wire" fans directly,
>>> +or can be used to modulate the fan's power terminals using an external
>>> +pass transistor.
>>> +
>>> +Tachometer inputs monitor fan tachometer logic outputs for precise
>>> (+/-1%)
>>> +monitoring and control of fan RPM as well as detection of fan failure.
>>> +
>>> +
>>> +Sysfs entries
>>> +-
>>> +
>>> +fan[1-6]_input   RO  fan tachometer speed in RPM
>>> +fan[1-6]_fault   RO  fan experienced fault
>>> +fan[1-6]_pulses  RW  tachometer pulses per fan revolution
>>> +fan[1-6]_target  RW  desired fan speed in RPM
>>> +pwm[1-6]_enable  RW  pwm mode, 0=disabled, 1=pwm, 2=rpm,
>>> 3=automatic
>>> +pwm[1-6] RW  fan target duty cycle (0-255)
>>> +
>>> +Dynamic sysfs entries
>>> +
>>> +
>>> +Whether these entries are present depends on the firmware features
>>> detected on
>>>

[PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-06-06 Thread Hoeun Ryu

 Reading TTBCR in early boot stage might return the value of the previous
kernel's configuration, especially in case of kexec. For example, if
normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
reserved area for crash kernel, reading TTBCR and using the value to OR
other bit fields might be risky because it doesn't have a reset value for
TTBCR.

Suggested-by: Robin Murphy 
Signed-off-by: Hoeun Ryu 
---

* v1: amended based on
  - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
 PHYS_OFFSET > PAGE_OFFSET"
  - https://lkml.org/lkml/2017/6/5/239

 arch/arm/mm/proc-v7-3level.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 5e5720e..7d16bbc 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
.macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
cmp \ttbr1, \tmp, lsr #12   @ PHYS_OFFSET > PAGE_OFFSET?
-   mrc p15, 0, \tmp, c2, c0, 2 @ TTB control egister
-   orr \tmp, \tmp, #TTB_EAE
+   mov \tmp, #TTB_EAE  @ for TTB control egister
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP)
ALT_UP(orr  \tmp, \tmp, #TTB_FLAGS_UP)
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16)
-- 
2.7.4

[PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-06-06 Thread Hoeun Ryu

 Reading TTBCR in early boot stage might return the value of the previous
kernel's configuration, especially in case of kexec. For example, if
normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
reserved area for crash kernel, reading TTBCR and using the value to OR
other bit fields might be risky because it doesn't have a reset value for
TTBCR.

Suggested-by: Robin Murphy 
Signed-off-by: Hoeun Ryu 
---

* v1: amended based on
  - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
 PHYS_OFFSET > PAGE_OFFSET"
  - https://lkml.org/lkml/2017/6/5/239

 arch/arm/mm/proc-v7-3level.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 5e5720e..7d16bbc 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
.macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
cmp \ttbr1, \tmp, lsr #12   @ PHYS_OFFSET > PAGE_OFFSET?
-   mrc p15, 0, \tmp, c2, c0, 2 @ TTB control egister
-   orr \tmp, \tmp, #TTB_EAE
+   mov \tmp, #TTB_EAE  @ for TTB control egister
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP)
ALT_UP(orr  \tmp, \tmp, #TTB_FLAGS_UP)
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16)
-- 
2.7.4

Re: [lkp-robot] [tty] 52b03e644f: calltrace:flush_to_ldisc

2017-06-06 Thread Mike Galbraith

On Wed, 2017-06-07 at 10:07 +0800, kernel test robot wrote:
> FYI, we noticed the following commit:
> 
> commit: 52b03e644f702dbcb252f6aec92fc0f0d6e29f78 ("tty: fix port buffer 
> locking V2")
> url: 
> https://github.com/0day-ci/linux/commits/Mike-Galbraith/tty-fix-port-buffer-locking-V2/20170605-134432
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing

?  Patchlet has been properly vaporized, no longer exists.

-Mike

Re: [lkp-robot] [tty] 52b03e644f: calltrace:flush_to_ldisc

2017-06-06 Thread Mike Galbraith

On Wed, 2017-06-07 at 10:07 +0800, kernel test robot wrote:
> FYI, we noticed the following commit:
> 
> commit: 52b03e644f702dbcb252f6aec92fc0f0d6e29f78 ("tty: fix port buffer 
> locking V2")
> url: 
> https://github.com/0day-ci/linux/commits/Mike-Galbraith/tty-fix-port-buffer-locking-V2/20170605-134432
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing

?  Patchlet has been properly vaporized, no longer exists.

-Mike

Re: double call identical release when there is a race hitting

2017-06-06 Thread zhong jiang

On 2017/6/6 23:56, Oleg Nesterov wrote:
> I can't answer authoritatively, but
>
> On 06/06, zhong jiang wrote:
>> Hi
>>
>> when I review the code, I find the following scenario will lead to a race ,
>> but I am not sure whether the real issue will hit or not.
>>
>> cpu1  
>> cpu2
>> exit_mmap   
>> mmu_notifier_unregister
>>__mmu_notifier_release srcu_read_lock
>> srcu_read_lock
>> mm->ops->release(mn, mm) mm->ops->release(mn,mm)
>>srcu_read_unlock 
>> srcu_read_unlock
>>
>>
>> obviously,  the specified mm will call identical release function when
>> the related condition satisfy.  is it right?
> I think you are right, this is possible, perhaps the comments should mention
> this explicitly.
>
> See the changelog in d34883d4e35c0a994e91dd847a82b4c9e0c31d83 "mm: 
> mmu_notifier:
> re-fix freed page still mapped in secondary MMU":
>
>   "multiple ->release() callouts", we needn't care it too much ...
>
> Oleg.
>
>
> .
>
Thank you for clarification.
 yes,  I see that the author admit that this is a issue.   The patch describe 
that it is really rare.
 Anyway, this issue should be fixed in a separate patch.

but so far  the issue still exist unfortunately.

Regards
zhongjiang

Re: double call identical release when there is a race hitting

2017-06-06 Thread zhong jiang

On 2017/6/6 23:56, Oleg Nesterov wrote:
> I can't answer authoritatively, but
>
> On 06/06, zhong jiang wrote:
>> Hi
>>
>> when I review the code, I find the following scenario will lead to a race ,
>> but I am not sure whether the real issue will hit or not.
>>
>> cpu1  
>> cpu2
>> exit_mmap   
>> mmu_notifier_unregister
>>__mmu_notifier_release srcu_read_lock
>> srcu_read_lock
>> mm->ops->release(mn, mm) mm->ops->release(mn,mm)
>>srcu_read_unlock 
>> srcu_read_unlock
>>
>>
>> obviously,  the specified mm will call identical release function when
>> the related condition satisfy.  is it right?
> I think you are right, this is possible, perhaps the comments should mention
> this explicitly.
>
> See the changelog in d34883d4e35c0a994e91dd847a82b4c9e0c31d83 "mm: 
> mmu_notifier:
> re-fix freed page still mapped in secondary MMU":
>
>   "multiple ->release() callouts", we needn't care it too much ...
>
> Oleg.
>
>
> .
>
Thank you for clarification.
 yes,  I see that the author admit that this is a issue.   The patch describe 
that it is really rare.
 Anyway, this issue should be fixed in a separate patch.

but so far  the issue still exist unfortunately.

Regards
zhongjiang

Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic

2017-06-06 Thread Wei Yang

On Tue, Jun 06, 2017 at 02:03:15PM +0200, Michal Hocko wrote:
>On Tue 06-06-17 11:04:01, Wei Yang wrote:
>> On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote:
>> >On Sat 03-06-17 10:24:40, Wei Yang wrote:
>> >> Hi, Michal
>> >> 
>> >> Just go through your patch.
>> >> 
>> >> I have one question and one suggestion as below.
>> >> 
>> >> One suggestion:
>> >> 
>> >> This patch does two things to me:
>> >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL
>> >> 2. Adjust the logic in page_alloc to provide the middle semantic
>> >> 
>> >> My suggestion is to split these two task into two patches, so that readers
>> >> could catch your fundamental logic change easily.
>> >
>> >Well, the rename and the change is intentionally tight together. My
>> >previous patches have removed all __GFP_REPEAT users for low order
>> >requests which didn't have any implemented semantic. So as of now we
>> >should only have those users which semantic will not change. I do not
>> >add any new low order user in this patch so it in fact doesn't change
>> >any existing semnatic.
>> >
>> >> 
>> >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote:
>> >> >From: Michal Hocko 
>> >[...]
>> >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned 
>> >> >int order,
>> >> > 
>> >> > /*
>> >> >  * Do not retry costly high order allocations unless they are
>> >> >- * __GFP_REPEAT
>> >> >+ * __GFP_RETRY_MAYFAIL
>> >> >  */
>> >> >-if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & 
>> >> >__GFP_REPEAT))
>> >> >+if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & 
>> >> >__GFP_RETRY_MAYFAIL))
>> >> > goto nopage;
>> >> 
>> >> One question:
>> >> 
>> >> From your change log, it mentions will provide the same semantic for 
>> >> !costly
>> >> allocations. While the logic here is the same as before.
>> >> 
>> >> For a !costly allocation with __GFP_REPEAT flag, the difference after this
>> >> patch is no OOM will be invoked, while it will still continue in the loop.
>> >
>> >Not really. There are two things. The above will shortcut retrying if
>> >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will
>> >back of in __alloc_pages_may_oom.
>> > 
>> >> Maybe I don't catch your point in this message:
>> >> 
>> >>   __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
>> >>   the page allocator. This has been true but only for allocations requests
>> >>   larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for
>> >>   smaller sizes. This is a bit unfortunate because there is no way to
>> >>   express the same semantic for those requests and they are considered too
>> >>   important to fail so they might end up looping in the page allocator for
>> >>   ever, similarly to GFP_NOFAIL requests.
>> >> 
>> >> I thought you will provide the same semantic to !costly allocation, or I
>> >> misunderstand?
>> >
>> >yes and that is the case. __alloc_pages_may_oom will back off before OOM
>> >killer is invoked and the allocator slow path will fail because
>> >did_some_progress == 0;
>> 
>> Thanks for your explanation.
>> 
>> So same "semantic" doesn't mean same "behavior".
>> 1. costly allocations will pick up the shut cut
>
>yes and there are no such allocations yet (based on my previous
>cleanups)
>
>> 2. !costly allocations will try something more but finally fail without
>> invoking OOM.
>
>no, the behavior will not change for those.
> 

Hmm... Let me be more specific. With two factors, costly or not, flag set or
not, we have four combinations. Here it is classified into two categories.

1. __GFP_RETRY_MAYFAIL not set

Brief description on behavior:
costly: pick up the shortcut, so no OOM
!costly: no shortcut and will OOM I think

Impact from this patch set:
No.

My personal understanding:
The allocation without __GFP_RETRY_MAYFAIL is not effected by this patch
set.  Since !costly allocation will trigger OOM, this is the reason why
"small allocations never fail _practically_", as mentioned in
https://lwn.net/Articles/723317/.


3. __GFP_RETRY_MAYFAIL set

Brief description on behavior:
costly/!costly: no shortcut here and no OOM invoked

Impact from this patch set:
For those allocations with __GFP_RETRY_MAYFAIL, OOM is not invoked for
both.

My personal understanding:
This is the semantic you are willing to introduce in this patch set. By
cutting off the OOM invoke when __GFP_RETRY_MAYFAIL is set, you makes this
a middle situation between NOFAIL and NORETRY.

page_alloc will try some luck to get some free pages without disturb other
part of the system. By doing so, the never fail allocation for !costly
pages will be "fixed". If I understand correctly, you are willing to make
this the default behavior in the future?

>> Hope this time I catch your point.
>> 
>> BTW, did_some_progress mostly means the OOM works to

Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic

2017-06-06 Thread Wei Yang

On Tue, Jun 06, 2017 at 02:03:15PM +0200, Michal Hocko wrote:
>On Tue 06-06-17 11:04:01, Wei Yang wrote:
>> On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote:
>> >On Sat 03-06-17 10:24:40, Wei Yang wrote:
>> >> Hi, Michal
>> >> 
>> >> Just go through your patch.
>> >> 
>> >> I have one question and one suggestion as below.
>> >> 
>> >> One suggestion:
>> >> 
>> >> This patch does two things to me:
>> >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL
>> >> 2. Adjust the logic in page_alloc to provide the middle semantic
>> >> 
>> >> My suggestion is to split these two task into two patches, so that readers
>> >> could catch your fundamental logic change easily.
>> >
>> >Well, the rename and the change is intentionally tight together. My
>> >previous patches have removed all __GFP_REPEAT users for low order
>> >requests which didn't have any implemented semantic. So as of now we
>> >should only have those users which semantic will not change. I do not
>> >add any new low order user in this patch so it in fact doesn't change
>> >any existing semnatic.
>> >
>> >> 
>> >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote:
>> >> >From: Michal Hocko 
>> >[...]
>> >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned 
>> >> >int order,
>> >> > 
>> >> > /*
>> >> >  * Do not retry costly high order allocations unless they are
>> >> >- * __GFP_REPEAT
>> >> >+ * __GFP_RETRY_MAYFAIL
>> >> >  */
>> >> >-if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & 
>> >> >__GFP_REPEAT))
>> >> >+if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & 
>> >> >__GFP_RETRY_MAYFAIL))
>> >> > goto nopage;
>> >> 
>> >> One question:
>> >> 
>> >> From your change log, it mentions will provide the same semantic for 
>> >> !costly
>> >> allocations. While the logic here is the same as before.
>> >> 
>> >> For a !costly allocation with __GFP_REPEAT flag, the difference after this
>> >> patch is no OOM will be invoked, while it will still continue in the loop.
>> >
>> >Not really. There are two things. The above will shortcut retrying if
>> >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will
>> >back of in __alloc_pages_may_oom.
>> > 
>> >> Maybe I don't catch your point in this message:
>> >> 
>> >>   __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
>> >>   the page allocator. This has been true but only for allocations requests
>> >>   larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for
>> >>   smaller sizes. This is a bit unfortunate because there is no way to
>> >>   express the same semantic for those requests and they are considered too
>> >>   important to fail so they might end up looping in the page allocator for
>> >>   ever, similarly to GFP_NOFAIL requests.
>> >> 
>> >> I thought you will provide the same semantic to !costly allocation, or I
>> >> misunderstand?
>> >
>> >yes and that is the case. __alloc_pages_may_oom will back off before OOM
>> >killer is invoked and the allocator slow path will fail because
>> >did_some_progress == 0;
>> 
>> Thanks for your explanation.
>> 
>> So same "semantic" doesn't mean same "behavior".
>> 1. costly allocations will pick up the shut cut
>
>yes and there are no such allocations yet (based on my previous
>cleanups)
>
>> 2. !costly allocations will try something more but finally fail without
>> invoking OOM.
>
>no, the behavior will not change for those.
> 

Hmm... Let me be more specific. With two factors, costly or not, flag set or
not, we have four combinations. Here it is classified into two categories.

1. __GFP_RETRY_MAYFAIL not set

Brief description on behavior:
costly: pick up the shortcut, so no OOM
!costly: no shortcut and will OOM I think

Impact from this patch set:
No.

My personal understanding:
The allocation without __GFP_RETRY_MAYFAIL is not effected by this patch
set.  Since !costly allocation will trigger OOM, this is the reason why
"small allocations never fail _practically_", as mentioned in
https://lwn.net/Articles/723317/.


3. __GFP_RETRY_MAYFAIL set

Brief description on behavior:
costly/!costly: no shortcut here and no OOM invoked

Impact from this patch set:
For those allocations with __GFP_RETRY_MAYFAIL, OOM is not invoked for
both.

My personal understanding:
This is the semantic you are willing to introduce in this patch set. By
cutting off the OOM invoke when __GFP_RETRY_MAYFAIL is set, you makes this
a middle situation between NOFAIL and NORETRY.

page_alloc will try some luck to get some free pages without disturb other
part of the system. By doing so, the never fail allocation for !costly
pages will be "fixed". If I understand correctly, you are willing to make
this the default behavior in the future?

>> Hope this time I catch your point.
>> 
>> BTW, did_some_progress mostly means the OOM works to me. Are there

[PATCH] autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL

2017-06-06 Thread NeilBrown


If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL
ioctl, autofs4_d_automount() will return
   ERR_PTR(status)
with that status to follow_automount(), which will then
dereference an invalid pointer.

So treat a positive status the same as zero, and map
to ENOENT.

See comment in systemd src/core/automount.c::automount_send_ready().

Signed-off-by: NeilBrown 
---
 fs/autofs4/dev-ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 734cbf8d9676..dd9f1bebb5a3 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -344,7 +344,7 @@ static int autofs_dev_ioctl_fail(struct file *fp,
int status;
 
token = (autofs_wqt_t) param->fail.token;
-   status = param->fail.status ? param->fail.status : -ENOENT;
+   status = param->fail.status < 0 ? param->fail.status : -ENOENT;
return autofs4_wait_release(sbi, token, status);
 }
 
-- 
2.12.2



signature.asc
Description: PGP signature

[PATCH] autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL

2017-06-06 Thread NeilBrown


If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL
ioctl, autofs4_d_automount() will return
   ERR_PTR(status)
with that status to follow_automount(), which will then
dereference an invalid pointer.

So treat a positive status the same as zero, and map
to ENOENT.

See comment in systemd src/core/automount.c::automount_send_ready().

Signed-off-by: NeilBrown 
---
 fs/autofs4/dev-ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 734cbf8d9676..dd9f1bebb5a3 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -344,7 +344,7 @@ static int autofs_dev_ioctl_fail(struct file *fp,
int status;
 
token = (autofs_wqt_t) param->fail.token;
-   status = param->fail.status ? param->fail.status : -ENOENT;
+   status = param->fail.status < 0 ? param->fail.status : -ENOENT;
return autofs4_wait_release(sbi, token, status);
 }
 
-- 
2.12.2



signature.asc
Description: PGP signature

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2164 matches

Mail list logo