Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-28 Thread Sedat Dilek
On Mon, Sep 28, 2015 at 8:50 AM, Sedat Dilek  wrote:
> [ CC only relevant people plus Paul as he took care in another thread ]
>
> First of all, sorry for flooding anybody or any mailing-list.
>
> Of course, using LLVM/Clang for the Linux-kernel is still WIP, but
> this does not mean using a different compiler does not find any
> bugs...
>
> Fascinated somehow of this problem I entered the "tracing" area.
> I played a bit with tracing and irq-flags (hardirqs).
>
> This is what I discovered now.
>
> [ TEST-1 ]
>
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -731,6 +731,7 @@ void usbhid_close(struct hid_device *hid)
>  */
> spin_lock_irq(>lock);
> if (!--hid->open) {
> +   trace_hardirqs_off();
> spin_unlock_irq(>lock);
> hid_cancel_delayed_stuff(usbhid);
> if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
> @@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface 
> *intf)
>
>  static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
>  {
> +   might_sleep();
> +
> del_timer_sync(>io_retry);
> cancel_work_sync(>reset_work);
>  }
>
> This makes the BUG line go away - only changes in the HID area.
>
> [ TEST-2 ]
>
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -732,6 +732,7 @@ void usbhid_close(struct hid_device *hid)
> spin_lock_irq(>lock);
> if (!--hid->open) {
> spin_unlock_irq(>lock);
> +   trace_hardirqs_off();
> hid_cancel_delayed_stuff(usbhid);
> if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
> usb_kill_urb(usbhid->urbin);
> @@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface 
> *intf)
>
>  static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
>  {
> +   might_sleep();
> +
> del_timer_sync(>io_retry);
> cancel_work_sync(>reset_work);
>  }
>
> This results in...
>
> [   23.236024] BUG: sleeping function called from invalid context at
> drivers/hid/usbhid/hid-core.c:1396
> [   23.236148] in_atomic(): 0, irqs_disabled(): 1, pid: 1419, name: acpid
> [   23.236233] 3 locks held by acpid/1419:
> [   23.236235]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   23.236250]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   23.236264]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xd0 [usbhid]
> [   23.236277] irq event stamp: 3406
> [   23.236280] hardirqs last  enabled at (3405): []
> _raw_spin_unlock_irq+0x32/0x60
> [   23.236286] hardirqs last disabled at (3406): []
> usbhid_close+0x5d/0xd0 [usbhid]
> [   23.236292] softirqs last  enabled at (2806): []
> local_bh_enable+0x9/0x20
> [   23.236298] softirqs last disabled at (2804): []
> local_bh_disable+0x9/0x20
> [   23.236306] CPU: 3 PID: 1419 Comm: acpid Not tainted
> 4.3.0-rc3-4-llvmlinux-amd64 #3
> [   23.236309] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   23.236313]  8800c760b000 0096 
> 8800bce53c88
> [   23.236320]  8149cabd 8800bce53cb8 810cd5aa
> a009780c
> [   23.236328]  8800c5fd5740  0574
> 8800bce53cf8
> [   23.236334] Call Trace:
> [   23.236342]  [] dump_stack+0x7d/0xa0
> [   23.236347]  [] ___might_sleep+0x28a/0x2a0
> [   23.236351]  [] __might_sleep+0x4f/0xc0
> [   23.236357]  [] usbhid_close+0x75/0xd0 [usbhid]
> [   23.236366]  [] hidinput_close+0x31/0x40 [hid]
> [   23.236373]  [] ? hidinput_open+0x40/0x40 [hid]
> [   23.236378]  [] input_close_device+0x48/0x70
> [   23.236382]  [] evdev_release+0xd6/0xf0
> [   23.236387]  [] __fput+0x107/0x240
> [   23.236392]  [] fput+0x16/0x20
> [   23.236397]  [] task_work_run+0x6c/0xe0
> [   23.236403]  [] prepare_exit_to_usermode+0x13a/0x140
> [   23.236408]  [] syscall_return_slowpath+0x281/0x2f0
> [   23.236414]  [] ? filp_close+0x65/0x90
> [   23.236420]  [] ? trace_hardirqs_on_caller+0x19/0x290
> [   23.236425]  [] ? trace_hardirqs_on_thunk+0x17/0x19
> [   23.236431]  [] int_ret_from_sys_call+0x25/0x9f
>
> I will not do a statement :-).
>
> Looking at TEST-1, the spin_unlock_irq() within the if-statement seems
> to do cause problems with tracing / irq-flags.
>
> I have added my kernel-config, llvmlinux patchset, diffs with
> corresponding dmesg-logs (look into the attached tarball).
>
> Hope this really helps you.
>

I looked over some commits which had problems with tracing and
irq-flags (hardirq)...

[ TEST-3 ]

I am not an locking expert, when I replace...

spin_{un}lock_irq() with spin_{un}lock_bh()

...within usbhid_close() in drivers/hid/usbhid/hid-core.c...

This makes the BUG line go away for me.

--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -729,16 +729,16 @@ void usbhid_close(struct 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-28 Thread Sedat Dilek
[ CC only relevant people plus Paul as he took care in another thread ]

First of all, sorry for flooding anybody or any mailing-list.

Of course, using LLVM/Clang for the Linux-kernel is still WIP, but
this does not mean using a different compiler does not find any
bugs...

Fascinated somehow of this problem I entered the "tracing" area.
I played a bit with tracing and irq-flags (hardirqs).

This is what I discovered now.

[ TEST-1 ]

--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -731,6 +731,7 @@ void usbhid_close(struct hid_device *hid)
 */
spin_lock_irq(>lock);
if (!--hid->open) {
+   trace_hardirqs_off();
spin_unlock_irq(>lock);
hid_cancel_delayed_stuff(usbhid);
if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
@@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface *intf)

 static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
 {
+   might_sleep();
+
del_timer_sync(>io_retry);
cancel_work_sync(>reset_work);
 }

This makes the BUG line go away - only changes in the HID area.

[ TEST-2 ]

--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -732,6 +732,7 @@ void usbhid_close(struct hid_device *hid)
spin_lock_irq(>lock);
if (!--hid->open) {
spin_unlock_irq(>lock);
+   trace_hardirqs_off();
hid_cancel_delayed_stuff(usbhid);
if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
usb_kill_urb(usbhid->urbin);
@@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface *intf)

 static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
 {
+   might_sleep();
+
del_timer_sync(>io_retry);
cancel_work_sync(>reset_work);
 }

This results in...

[   23.236024] BUG: sleeping function called from invalid context at
drivers/hid/usbhid/hid-core.c:1396
[   23.236148] in_atomic(): 0, irqs_disabled(): 1, pid: 1419, name: acpid
[   23.236233] 3 locks held by acpid/1419:
[   23.236235]  #0:  (>mutex){+.+...}, at: []
evdev_release+0xbc/0xf0
[   23.236250]  #1:  (>mutex#2){+.+...}, at: []
input_close_device+0x27/0x70
[   23.236264]  #2:  (hid_open_mut){+.+...}, at: []
usbhid_close+0x28/0xd0 [usbhid]
[   23.236277] irq event stamp: 3406
[   23.236280] hardirqs last  enabled at (3405): []
_raw_spin_unlock_irq+0x32/0x60
[   23.236286] hardirqs last disabled at (3406): []
usbhid_close+0x5d/0xd0 [usbhid]
[   23.236292] softirqs last  enabled at (2806): []
local_bh_enable+0x9/0x20
[   23.236298] softirqs last disabled at (2804): []
local_bh_disable+0x9/0x20
[   23.236306] CPU: 3 PID: 1419 Comm: acpid Not tainted
4.3.0-rc3-4-llvmlinux-amd64 #3
[   23.236309] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[   23.236313]  8800c760b000 0096 
8800bce53c88
[   23.236320]  8149cabd 8800bce53cb8 810cd5aa
a009780c
[   23.236328]  8800c5fd5740  0574
8800bce53cf8
[   23.236334] Call Trace:
[   23.236342]  [] dump_stack+0x7d/0xa0
[   23.236347]  [] ___might_sleep+0x28a/0x2a0
[   23.236351]  [] __might_sleep+0x4f/0xc0
[   23.236357]  [] usbhid_close+0x75/0xd0 [usbhid]
[   23.236366]  [] hidinput_close+0x31/0x40 [hid]
[   23.236373]  [] ? hidinput_open+0x40/0x40 [hid]
[   23.236378]  [] input_close_device+0x48/0x70
[   23.236382]  [] evdev_release+0xd6/0xf0
[   23.236387]  [] __fput+0x107/0x240
[   23.236392]  [] fput+0x16/0x20
[   23.236397]  [] task_work_run+0x6c/0xe0
[   23.236403]  [] prepare_exit_to_usermode+0x13a/0x140
[   23.236408]  [] syscall_return_slowpath+0x281/0x2f0
[   23.236414]  [] ? filp_close+0x65/0x90
[   23.236420]  [] ? trace_hardirqs_on_caller+0x19/0x290
[   23.236425]  [] ? trace_hardirqs_on_thunk+0x17/0x19
[   23.236431]  [] int_ret_from_sys_call+0x25/0x9f

I will not do a statement :-).

Looking at TEST-1, the spin_unlock_irq() within the if-statement seems
to do cause problems with tracing / irq-flags.

I have added my kernel-config, llvmlinux patchset, diffs with
corresponding dmesg-logs (look into the attached tarball).

Hope this really helps you.

- Sedat -


for-jikos.tar.gz
Description: GNU Zip compressed data


for-jikos.tar.gz.md5sum
Description: Binary data


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-28 Thread Sedat Dilek
On Mon, Sep 28, 2015 at 8:50 AM, Sedat Dilek  wrote:
> [ CC only relevant people plus Paul as he took care in another thread ]
>
> First of all, sorry for flooding anybody or any mailing-list.
>
> Of course, using LLVM/Clang for the Linux-kernel is still WIP, but
> this does not mean using a different compiler does not find any
> bugs...
>
> Fascinated somehow of this problem I entered the "tracing" area.
> I played a bit with tracing and irq-flags (hardirqs).
>
> This is what I discovered now.
>
> [ TEST-1 ]
>
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -731,6 +731,7 @@ void usbhid_close(struct hid_device *hid)
>  */
> spin_lock_irq(>lock);
> if (!--hid->open) {
> +   trace_hardirqs_off();
> spin_unlock_irq(>lock);
> hid_cancel_delayed_stuff(usbhid);
> if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
> @@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface 
> *intf)
>
>  static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
>  {
> +   might_sleep();
> +
> del_timer_sync(>io_retry);
> cancel_work_sync(>reset_work);
>  }
>
> This makes the BUG line go away - only changes in the HID area.
>
> [ TEST-2 ]
>
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -732,6 +732,7 @@ void usbhid_close(struct hid_device *hid)
> spin_lock_irq(>lock);
> if (!--hid->open) {
> spin_unlock_irq(>lock);
> +   trace_hardirqs_off();
> hid_cancel_delayed_stuff(usbhid);
> if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
> usb_kill_urb(usbhid->urbin);
> @@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface 
> *intf)
>
>  static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
>  {
> +   might_sleep();
> +
> del_timer_sync(>io_retry);
> cancel_work_sync(>reset_work);
>  }
>
> This results in...
>
> [   23.236024] BUG: sleeping function called from invalid context at
> drivers/hid/usbhid/hid-core.c:1396
> [   23.236148] in_atomic(): 0, irqs_disabled(): 1, pid: 1419, name: acpid
> [   23.236233] 3 locks held by acpid/1419:
> [   23.236235]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   23.236250]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   23.236264]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xd0 [usbhid]
> [   23.236277] irq event stamp: 3406
> [   23.236280] hardirqs last  enabled at (3405): []
> _raw_spin_unlock_irq+0x32/0x60
> [   23.236286] hardirqs last disabled at (3406): []
> usbhid_close+0x5d/0xd0 [usbhid]
> [   23.236292] softirqs last  enabled at (2806): []
> local_bh_enable+0x9/0x20
> [   23.236298] softirqs last disabled at (2804): []
> local_bh_disable+0x9/0x20
> [   23.236306] CPU: 3 PID: 1419 Comm: acpid Not tainted
> 4.3.0-rc3-4-llvmlinux-amd64 #3
> [   23.236309] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   23.236313]  8800c760b000 0096 
> 8800bce53c88
> [   23.236320]  8149cabd 8800bce53cb8 810cd5aa
> a009780c
> [   23.236328]  8800c5fd5740  0574
> 8800bce53cf8
> [   23.236334] Call Trace:
> [   23.236342]  [] dump_stack+0x7d/0xa0
> [   23.236347]  [] ___might_sleep+0x28a/0x2a0
> [   23.236351]  [] __might_sleep+0x4f/0xc0
> [   23.236357]  [] usbhid_close+0x75/0xd0 [usbhid]
> [   23.236366]  [] hidinput_close+0x31/0x40 [hid]
> [   23.236373]  [] ? hidinput_open+0x40/0x40 [hid]
> [   23.236378]  [] input_close_device+0x48/0x70
> [   23.236382]  [] evdev_release+0xd6/0xf0
> [   23.236387]  [] __fput+0x107/0x240
> [   23.236392]  [] fput+0x16/0x20
> [   23.236397]  [] task_work_run+0x6c/0xe0
> [   23.236403]  [] prepare_exit_to_usermode+0x13a/0x140
> [   23.236408]  [] syscall_return_slowpath+0x281/0x2f0
> [   23.236414]  [] ? filp_close+0x65/0x90
> [   23.236420]  [] ? trace_hardirqs_on_caller+0x19/0x290
> [   23.236425]  [] ? trace_hardirqs_on_thunk+0x17/0x19
> [   23.236431]  [] int_ret_from_sys_call+0x25/0x9f
>
> I will not do a statement :-).
>
> Looking at TEST-1, the spin_unlock_irq() within the if-statement seems
> to do cause problems with tracing / irq-flags.
>
> I have added my kernel-config, llvmlinux patchset, diffs with
> corresponding dmesg-logs (look into the attached tarball).
>
> Hope this really helps you.
>

I looked over some commits which had problems with tracing and
irq-flags (hardirq)...

[ TEST-3 ]

I am not an locking expert, when I replace...

spin_{un}lock_irq() with spin_{un}lock_bh()

...within usbhid_close() in drivers/hid/usbhid/hid-core.c...

This makes the BUG line go away for me.

--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -729,16 +729,16 @@ void 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-28 Thread Sedat Dilek
[ CC only relevant people plus Paul as he took care in another thread ]

First of all, sorry for flooding anybody or any mailing-list.

Of course, using LLVM/Clang for the Linux-kernel is still WIP, but
this does not mean using a different compiler does not find any
bugs...

Fascinated somehow of this problem I entered the "tracing" area.
I played a bit with tracing and irq-flags (hardirqs).

This is what I discovered now.

[ TEST-1 ]

--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -731,6 +731,7 @@ void usbhid_close(struct hid_device *hid)
 */
spin_lock_irq(>lock);
if (!--hid->open) {
+   trace_hardirqs_off();
spin_unlock_irq(>lock);
hid_cancel_delayed_stuff(usbhid);
if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
@@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface *intf)

 static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
 {
+   might_sleep();
+
del_timer_sync(>io_retry);
cancel_work_sync(>reset_work);
 }

This makes the BUG line go away - only changes in the HID area.

[ TEST-2 ]

--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -732,6 +732,7 @@ void usbhid_close(struct hid_device *hid)
spin_lock_irq(>lock);
if (!--hid->open) {
spin_unlock_irq(>lock);
+   trace_hardirqs_off();
hid_cancel_delayed_stuff(usbhid);
if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
usb_kill_urb(usbhid->urbin);
@@ -1392,6 +1393,8 @@ static void usbhid_disconnect(struct usb_interface *intf)

 static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
 {
+   might_sleep();
+
del_timer_sync(>io_retry);
cancel_work_sync(>reset_work);
 }

This results in...

[   23.236024] BUG: sleeping function called from invalid context at
drivers/hid/usbhid/hid-core.c:1396
[   23.236148] in_atomic(): 0, irqs_disabled(): 1, pid: 1419, name: acpid
[   23.236233] 3 locks held by acpid/1419:
[   23.236235]  #0:  (>mutex){+.+...}, at: []
evdev_release+0xbc/0xf0
[   23.236250]  #1:  (>mutex#2){+.+...}, at: []
input_close_device+0x27/0x70
[   23.236264]  #2:  (hid_open_mut){+.+...}, at: []
usbhid_close+0x28/0xd0 [usbhid]
[   23.236277] irq event stamp: 3406
[   23.236280] hardirqs last  enabled at (3405): []
_raw_spin_unlock_irq+0x32/0x60
[   23.236286] hardirqs last disabled at (3406): []
usbhid_close+0x5d/0xd0 [usbhid]
[   23.236292] softirqs last  enabled at (2806): []
local_bh_enable+0x9/0x20
[   23.236298] softirqs last disabled at (2804): []
local_bh_disable+0x9/0x20
[   23.236306] CPU: 3 PID: 1419 Comm: acpid Not tainted
4.3.0-rc3-4-llvmlinux-amd64 #3
[   23.236309] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[   23.236313]  8800c760b000 0096 
8800bce53c88
[   23.236320]  8149cabd 8800bce53cb8 810cd5aa
a009780c
[   23.236328]  8800c5fd5740  0574
8800bce53cf8
[   23.236334] Call Trace:
[   23.236342]  [] dump_stack+0x7d/0xa0
[   23.236347]  [] ___might_sleep+0x28a/0x2a0
[   23.236351]  [] __might_sleep+0x4f/0xc0
[   23.236357]  [] usbhid_close+0x75/0xd0 [usbhid]
[   23.236366]  [] hidinput_close+0x31/0x40 [hid]
[   23.236373]  [] ? hidinput_open+0x40/0x40 [hid]
[   23.236378]  [] input_close_device+0x48/0x70
[   23.236382]  [] evdev_release+0xd6/0xf0
[   23.236387]  [] __fput+0x107/0x240
[   23.236392]  [] fput+0x16/0x20
[   23.236397]  [] task_work_run+0x6c/0xe0
[   23.236403]  [] prepare_exit_to_usermode+0x13a/0x140
[   23.236408]  [] syscall_return_slowpath+0x281/0x2f0
[   23.236414]  [] ? filp_close+0x65/0x90
[   23.236420]  [] ? trace_hardirqs_on_caller+0x19/0x290
[   23.236425]  [] ? trace_hardirqs_on_thunk+0x17/0x19
[   23.236431]  [] int_ret_from_sys_call+0x25/0x9f

I will not do a statement :-).

Looking at TEST-1, the spin_unlock_irq() within the if-statement seems
to do cause problems with tracing / irq-flags.

I have added my kernel-config, llvmlinux patchset, diffs with
corresponding dmesg-logs (look into the attached tarball).

Hope this really helps you.

- Sedat -


for-jikos.tar.gz
Description: GNU Zip compressed data


for-jikos.tar.gz.md5sum
Description: Binary data


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-27 Thread Sedat Dilek
On Sun, Sep 27, 2015 at 10:10 AM, Sedat Dilek  wrote:
> On Fri, Sep 25, 2015 at 3:13 PM, Jiri Kosina  wrote:
>> On Fri, 25 Sep 2015, Sedat Dilek wrote:
>>
>>> > The sequence looks correct. So I don't really see what call sequence could
>>> > lead to calling flush_work() from __cancel_work_timer() with IRQs
>>> > disabled (which is what your stacktrace is suggesting).
>>> >
>>> > The fact that this doesn't happen with GCC-compiled kernels is really
>>> > suspicious.
>>> >
>>>
>>> Hm, and now?
>>
>> I guess one of the things to try is to generate ftrace graph trace to see
>> what exact function sequence leads to this warning.
>>
>
> Which kernel-config settings are helpful for ftrace and tracing at all?
>
> I have here...
>
> CONFIG_IRQSOFF_TRACER=n
>
> ...is that helpful to trigger why I see "irqs_disabled"...
>
> [   23.874836] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2678
> [   23.874902] in_atomic(): 0, irqs_disabled(): 1, pid: 1411, name: acpid
>
> Also independent of this issue I would like to set...
>
> CONFIG_LOCKDEP=y
> CONFIG_DEBUG_LOCKDEP=y <--- XXX: New setting
>
> I recall something like a trace/tracing filesystem?
> I cannot find or remember the Kconfig - do you?
>

It is CONFIG_TRACING=y (see [1] and "tracefs: Add new tracefs file
system" from [2]).

- Sedat -

[1] https://lwn.net/Articles/631798/
[2] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4282d60689d4f21b40692029080440cc58e8a17d
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-27 Thread Sedat Dilek
On Sun, Sep 27, 2015 at 10:10 AM, Sedat Dilek  wrote:
> On Fri, Sep 25, 2015 at 3:13 PM, Jiri Kosina  wrote:
>> On Fri, 25 Sep 2015, Sedat Dilek wrote:
>>
>>> > The sequence looks correct. So I don't really see what call sequence could
>>> > lead to calling flush_work() from __cancel_work_timer() with IRQs
>>> > disabled (which is what your stacktrace is suggesting).
>>> >
>>> > The fact that this doesn't happen with GCC-compiled kernels is really
>>> > suspicious.
>>> >
>>>
>>> Hm, and now?
>>
>> I guess one of the things to try is to generate ftrace graph trace to see
>> what exact function sequence leads to this warning.
>>
>
> Which kernel-config settings are helpful for ftrace and tracing at all?
>
> I have here...
>
> CONFIG_IRQSOFF_TRACER=n
>
> ...is that helpful to trigger why I see "irqs_disabled"...
>
> [   23.874836] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2678
> [   23.874902] in_atomic(): 0, irqs_disabled(): 1, pid: 1411, name: acpid
>
> Also independent of this issue I would like to set...
>
> CONFIG_LOCKDEP=y
> CONFIG_DEBUG_LOCKDEP=y <--- XXX: New setting
>
> I recall something like a trace/tracing filesystem?
> I cannot find or remember the Kconfig - do you?
>

It is CONFIG_TRACING=y (see [1] and "tracefs: Add new tracefs file
system" from [2]).

- Sedat -

[1] https://lwn.net/Articles/631798/
[2] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4282d60689d4f21b40692029080440cc58e8a17d
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 3:13 PM, Jiri Kosina  wrote:
> On Fri, 25 Sep 2015, Sedat Dilek wrote:
>
>> > The sequence looks correct. So I don't really see what call sequence could
>> > lead to calling flush_work() from __cancel_work_timer() with IRQs
>> > disabled (which is what your stacktrace is suggesting).
>> >
>> > The fact that this doesn't happen with GCC-compiled kernels is really
>> > suspicious.
>> >
>>
>> Hm, and now?
>
> I guess one of the things to try is to generate ftrace graph trace to see
> what exact function sequence leads to this warning.
>

Can I do this via kernel-boot-parameter?
No experiences with ftrace here.

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Fri, 25 Sep 2015, Sedat Dilek wrote:

> > The sequence looks correct. So I don't really see what call sequence could
> > lead to calling flush_work() from __cancel_work_timer() with IRQs
> > disabled (which is what your stacktrace is suggesting).
> >
> > The fact that this doesn't happen with GCC-compiled kernels is really
> > suspicious.
> >
> 
> Hm, and now?

I guess one of the things to try is to generate ftrace graph trace to see 
what exact function sequence leads to this warning.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 2:40 PM, Jiri Kosina  wrote:
> On Fri, 25 Sep 2015, Sedat Dilek wrote:
>
>> >> $ egrep -nr 'save|restore|acquire|release'
>> >> objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
>> >> 5718:   4601: R_X86_64_PC32 
>> >> _raw_spin_unlock_irqrestore-0x4
>> >> 5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
>> >> 6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
>> >> 6265:   4d05: R_X86_64_PC32 
>> >> _raw_spin_unlock_irqrestore-0x4
>> >>
>> >> ...this is what I have.
>> >>
>> >> Does that help?
>> >
>> > It doesn't. I am asking for disassembly of del_timer_sync() (that's in
>> > kernel/time/timer.o).
>> >
>>
>> See attached file.
>
> The sequence looks correct. So I don't really see what call sequence could
> lead to calling flush_work() from __cancel_work_timer() with IRQs
> disabled (which is what your stacktrace is suggesting).
>
> The fact that this doesn't happen with GCC-compiled kernels is really
> suspicious.
>

Hm, and now?

Shall I try a different OptLevel than default -O2?

CONFIG_LOCKDEP=n ?

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Fri, 25 Sep 2015, Sedat Dilek wrote:

> >> $ egrep -nr 'save|restore|acquire|release'
> >> objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
> >> 5718:   4601: R_X86_64_PC32 
> >> _raw_spin_unlock_irqrestore-0x4
> >> 5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> >> 6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> >> 6265:   4d05: R_X86_64_PC32 
> >> _raw_spin_unlock_irqrestore-0x4
> >>
> >> ...this is what I have.
> >>
> >> Does that help?
> >
> > It doesn't. I am asking for disassembly of del_timer_sync() (that's in
> > kernel/time/timer.o).
> >
> 
> See attached file.

The sequence looks correct. So I don't really see what call sequence could 
lead to calling flush_work() from __cancel_work_timer() with IRQs 
disabled (which is what your stacktrace is suggesting).

The fact that this doesn't happen with GCC-compiled kernels is really 
suspicious.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 2:00 PM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> >> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> >> > >> [   24.705804]  [] ? 
>> >> >> > >> try_to_grab_pending+0x48/0x360
>> >> >> > >> [   24.705810]  [] ? 
>> >> >> > >> _raw_spin_lock_irqsave+0x73/0x80
>> >> >> > >> [   24.705814]  [] 
>> >> >> > >> __cancel_work_timer+0x179/0x260
>> >> >>
>> >> >> This one is even more strange. It says that flush_work() is being 
>> >> >> called
>> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> >> >> restored just one statement before that, and usbhid_close() explicitly
>> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
>> >> >>
>> >> >> So I am not able to make any sense of either of the traces really.
>> >> >>
>> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
>> >> >> well?
>> >> >
>> >> > Actually could you please provide disassembly of your
>> >> > __cancel_work_timer()?
>> >> >
>> >>
>> >> Disassembly of which file - corresponding workqueue or hid file?
>> >
>> > make kernel/workqueue.o
>> > objdump -Dr kernel/workqueue.o
>> >
>> > and copy/paste output for __cancel_work_timer function.
>> >
>>
>> It was a bit long, so I attached the full output.
>
> 216a:   ff 14 25 00 00 00 00callq  *0x0
> 216d: R_X86_64_32S  pv_irq_ops+0x8
> 2171:   4c 89 efmov%r13,%rdi
> 2174:   e8 00 00 00 00  callq  2179 
> <__cancel_work_timer+0x189>
> 2175: R_X86_64_PC32 flush_work-0x4
>
> So the ordering generated by compiler is correct here.
>
> This however:
>
> [   24.824639] hardirqs last  enabled at (7913): [] 
> _raw_spin_unlock_irq+0x32/0x60
> [   24.824646] hardirqs last disabled at (7914): [] 
> del_timer_sync+0x37/0x110
>
> combined with the stacktrace above, doesn't still make too much sense; IRQ
> disable/enable is properly paired in del_timer_sync() and timer base lock
> locking.
>
> Could you look into the generated assembly whether the sequence
>
> local_irq_save(flags);
> lock_map_acquire(>lockdep_map);
> lock_map_release(>lockdep_map);
> local_irq_restore(flags);
>
> in del_timer_sync() hasn't been optimized in some damaging way?
>

OK, now I see what you mean that snippet...

[ kernel/time/timer.c ]
...
int del_timer_sync(struct timer_list *timer)
{
#ifdef CONFIG_LOCKDEP
unsigned long flags;

/*
* If lockdep gives a backtrace here, please reference
* the synchronization rules above.
*/
local_irq_save(flags);
lock_map_acquire(>lockdep_map);
lock_map_release(>lockdep_map);
local_irq_restore(flags);
#endif
...

So, I could try to see with CONFIG_LOCKDEP=n ?

- Sedat -

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?id=v4.3-rc2#n1080
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Fri, 25 Sep 2015, Sedat Dilek wrote:

> > This however:
> >
> > [   24.824639] hardirqs last  enabled at (7913): [] 
> > _raw_spin_unlock_irq+0x32/0x60
> > [   24.824646] hardirqs last disabled at (7914): [] 
> > del_timer_sync+0x37/0x110
> >
> > combined with the stacktrace above, doesn't still make too much sense; IRQ
> > disable/enable is properly paired in del_timer_sync() and timer base lock
> > locking.
> >
> > Could you look into the generated assembly whether the sequence
> >
> > local_irq_save(flags);
> > lock_map_acquire(>lockdep_map);
> > lock_map_release(>lockdep_map);
> > local_irq_restore(flags);
> >
> > in del_timer_sync() hasn't been optimized in some damaging way?
> >
> 
> I am looking into my CLANG objdump...
> 
> $ egrep -nr 'save|restore|acquire|release'
> objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
> 5718:   4601: R_X86_64_PC32 
> _raw_spin_unlock_irqrestore-0x4
> 5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> 6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> 6265:   4d05: R_X86_64_PC32 
> _raw_spin_unlock_irqrestore-0x4
> 
> ...this is what I have.
> 
> Does that help?

It doesn't. I am asking for disassembly of del_timer_sync() (that's in 
kernel/time/timer.o).

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 2:00 PM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> >> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> >> > >> [   24.705804]  [] ? 
>> >> >> > >> try_to_grab_pending+0x48/0x360
>> >> >> > >> [   24.705810]  [] ? 
>> >> >> > >> _raw_spin_lock_irqsave+0x73/0x80
>> >> >> > >> [   24.705814]  [] 
>> >> >> > >> __cancel_work_timer+0x179/0x260
>> >> >>
>> >> >> This one is even more strange. It says that flush_work() is being 
>> >> >> called
>> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> >> >> restored just one statement before that, and usbhid_close() explicitly
>> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
>> >> >>
>> >> >> So I am not able to make any sense of either of the traces really.
>> >> >>
>> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
>> >> >> well?
>> >> >
>> >> > Actually could you please provide disassembly of your
>> >> > __cancel_work_timer()?
>> >> >
>> >>
>> >> Disassembly of which file - corresponding workqueue or hid file?
>> >
>> > make kernel/workqueue.o
>> > objdump -Dr kernel/workqueue.o
>> >
>> > and copy/paste output for __cancel_work_timer function.
>> >
>>
>> It was a bit long, so I attached the full output.
>
> 216a:   ff 14 25 00 00 00 00callq  *0x0
> 216d: R_X86_64_32S  pv_irq_ops+0x8
> 2171:   4c 89 efmov%r13,%rdi
> 2174:   e8 00 00 00 00  callq  2179 
> <__cancel_work_timer+0x189>
> 2175: R_X86_64_PC32 flush_work-0x4
>
> So the ordering generated by compiler is correct here.
>
> This however:
>
> [   24.824639] hardirqs last  enabled at (7913): [] 
> _raw_spin_unlock_irq+0x32/0x60
> [   24.824646] hardirqs last disabled at (7914): [] 
> del_timer_sync+0x37/0x110
>
> combined with the stacktrace above, doesn't still make too much sense; IRQ
> disable/enable is properly paired in del_timer_sync() and timer base lock
> locking.
>
> Could you look into the generated assembly whether the sequence
>
> local_irq_save(flags);
> lock_map_acquire(>lockdep_map);
> lock_map_release(>lockdep_map);
> local_irq_restore(flags);
>
> in del_timer_sync() hasn't been optimized in some damaging way?
>

I am looking into my CLANG objdump...

$ egrep -nr 'save|restore|acquire|release'
objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
5718:   4601: R_X86_64_PC32 _raw_spin_unlock_irqrestore-0x4
5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
6265:   4d05: R_X86_64_PC32 _raw_spin_unlock_irqrestore-0x4

...this is what I have.

Does that help?

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Thu, 24 Sep 2015, Sedat Dilek wrote:

> >> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
> >> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> >> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> >> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
> >> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
> >> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
> >> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> >> >> > >> [   24.705804]  [] ? 
> >> >> > >> try_to_grab_pending+0x48/0x360
> >> >> > >> [   24.705810]  [] ? 
> >> >> > >> _raw_spin_lock_irqsave+0x73/0x80
> >> >> > >> [   24.705814]  [] 
> >> >> > >> __cancel_work_timer+0x179/0x260
> >> >>
> >> >> This one is even more strange. It says that flush_work() is being called
> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
> >> >> restored just one statement before that, and usbhid_close() explicitly
> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
> >> >>
> >> >> So I am not able to make any sense of either of the traces really.
> >> >>
> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
> >> >> well?
> >> >
> >> > Actually could you please provide disassembly of your
> >> > __cancel_work_timer()?
> >> >
> >>
> >> Disassembly of which file - corresponding workqueue or hid file?
> >
> > make kernel/workqueue.o
> > objdump -Dr kernel/workqueue.o
> >
> > and copy/paste output for __cancel_work_timer function.
> >
> 
> It was a bit long, so I attached the full output.

216a:   ff 14 25 00 00 00 00callq  *0x0
216d: R_X86_64_32S  pv_irq_ops+0x8
2171:   4c 89 efmov%r13,%rdi
2174:   e8 00 00 00 00  callq  2179 <__cancel_work_timer+0x189>
2175: R_X86_64_PC32 flush_work-0x4

So the ordering generated by compiler is correct here.

This however:

[   24.824639] hardirqs last  enabled at (7913): [] 
_raw_spin_unlock_irq+0x32/0x60
[   24.824646] hardirqs last disabled at (7914): [] 
del_timer_sync+0x37/0x110

combined with the stacktrace above, doesn't still make too much sense; IRQ 
disable/enable is properly paired in del_timer_sync() and timer base lock 
locking.

Could you look into the generated assembly whether the sequence

local_irq_save(flags);
lock_map_acquire(>lockdep_map);
lock_map_release(>lockdep_map);
local_irq_restore(flags);

in del_timer_sync() hasn't been optimized in some damaging way?

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Thu, Sep 24, 2015 at 10:21 AM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
>> >> > >> [   24.705810]  [] ? 
>> >> > >> _raw_spin_lock_irqsave+0x73/0x80
>> >> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
>> >>
>> >> This one is even more strange. It says that flush_work() is being called
>> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> >> restored just one statement before that, and usbhid_close() explicitly
>> >> calls cancel_work_sync() after unconditionally enabling interrupts.
>> >>
>> >> So I am not able to make any sense of either of the traces really.
>> >>
>> >> Are you seeing this with the same .config with GCC-compiled kernel as
>> >> well?
>> >
>> > Actually could you please provide disassembly of your
>> > __cancel_work_timer()?
>> >
>>
>> Disassembly of which file - corresponding workqueue or hid file?
>
> make kernel/workqueue.o
> objdump -Dr kernel/workqueue.o
>
> and copy/paste output for __cancel_work_timer function.
>
>> > One explanation would be LLVM not considering local_irq_restore() a
>> > compiler memory barrier, but I am pretty sure it'll expose much more
>> > breakage if that'd be the case.
>>
>> Can you point me where I can find more informations about "compiler
>> memory barrier" or explain in a few words if possible?
>
> If compiler would not take "memory" clobber (while disabling IRQs) as a
> reordering barrier, it wouldn't see any data dependency between
> local_irq_restore(flags) and flush_work(data) and could reorder them,
> resulting in flush_work() being called with IRQs disabled.
>

BTW, is membarrier_test helpful in this case?

I asked parallelly on LTP ML is there exists something similiar.

- Sedat -

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/membarrier
[2] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/tools/testing/selftests/membarrier/membarrier_test.c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Thu, Sep 24, 2015 at 10:21 AM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
>> >> > >> [   24.705810]  [] ? 
>> >> > >> _raw_spin_lock_irqsave+0x73/0x80
>> >> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
>> >>
>> >> This one is even more strange. It says that flush_work() is being called
>> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> >> restored just one statement before that, and usbhid_close() explicitly
>> >> calls cancel_work_sync() after unconditionally enabling interrupts.
>> >>
>> >> So I am not able to make any sense of either of the traces really.
>> >>
>> >> Are you seeing this with the same .config with GCC-compiled kernel as
>> >> well?
>> >
>> > Actually could you please provide disassembly of your
>> > __cancel_work_timer()?
>> >
>>
>> Disassembly of which file - corresponding workqueue or hid file?
>
> make kernel/workqueue.o
> objdump -Dr kernel/workqueue.o
>
> and copy/paste output for __cancel_work_timer function.
>
>> > One explanation would be LLVM not considering local_irq_restore() a
>> > compiler memory barrier, but I am pretty sure it'll expose much more
>> > breakage if that'd be the case.
>>
>> Can you point me where I can find more informations about "compiler
>> memory barrier" or explain in a few words if possible?
>
> If compiler would not take "memory" clobber (while disabling IRQs) as a
> reordering barrier, it wouldn't see any data dependency between
> local_irq_restore(flags) and flush_work(data) and could reorder them,
> resulting in flush_work() being called with IRQs disabled.
>

BTW, is membarrier_test helpful in this case?

I asked parallelly on LTP ML is there exists something similiar.

- Sedat -

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/membarrier
[2] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/tools/testing/selftests/membarrier/membarrier_test.c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 2:00 PM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> >> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> >> > >> [   24.705804]  [] ? 
>> >> >> > >> try_to_grab_pending+0x48/0x360
>> >> >> > >> [   24.705810]  [] ? 
>> >> >> > >> _raw_spin_lock_irqsave+0x73/0x80
>> >> >> > >> [   24.705814]  [] 
>> >> >> > >> __cancel_work_timer+0x179/0x260
>> >> >>
>> >> >> This one is even more strange. It says that flush_work() is being 
>> >> >> called
>> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> >> >> restored just one statement before that, and usbhid_close() explicitly
>> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
>> >> >>
>> >> >> So I am not able to make any sense of either of the traces really.
>> >> >>
>> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
>> >> >> well?
>> >> >
>> >> > Actually could you please provide disassembly of your
>> >> > __cancel_work_timer()?
>> >> >
>> >>
>> >> Disassembly of which file - corresponding workqueue or hid file?
>> >
>> > make kernel/workqueue.o
>> > objdump -Dr kernel/workqueue.o
>> >
>> > and copy/paste output for __cancel_work_timer function.
>> >
>>
>> It was a bit long, so I attached the full output.
>
> 216a:   ff 14 25 00 00 00 00callq  *0x0
> 216d: R_X86_64_32S  pv_irq_ops+0x8
> 2171:   4c 89 efmov%r13,%rdi
> 2174:   e8 00 00 00 00  callq  2179 
> <__cancel_work_timer+0x189>
> 2175: R_X86_64_PC32 flush_work-0x4
>
> So the ordering generated by compiler is correct here.
>
> This however:
>
> [   24.824639] hardirqs last  enabled at (7913): [] 
> _raw_spin_unlock_irq+0x32/0x60
> [   24.824646] hardirqs last disabled at (7914): [] 
> del_timer_sync+0x37/0x110
>
> combined with the stacktrace above, doesn't still make too much sense; IRQ
> disable/enable is properly paired in del_timer_sync() and timer base lock
> locking.
>
> Could you look into the generated assembly whether the sequence
>
> local_irq_save(flags);
> lock_map_acquire(>lockdep_map);
> lock_map_release(>lockdep_map);
> local_irq_restore(flags);
>
> in del_timer_sync() hasn't been optimized in some damaging way?
>

OK, now I see what you mean that snippet...

[ kernel/time/timer.c ]
...
int del_timer_sync(struct timer_list *timer)
{
#ifdef CONFIG_LOCKDEP
unsigned long flags;

/*
* If lockdep gives a backtrace here, please reference
* the synchronization rules above.
*/
local_irq_save(flags);
lock_map_acquire(>lockdep_map);
lock_map_release(>lockdep_map);
local_irq_restore(flags);
#endif
...

So, I could try to see with CONFIG_LOCKDEP=n ?

- Sedat -

[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?id=v4.3-rc2#n1080
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Thu, 24 Sep 2015, Sedat Dilek wrote:

> >> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
> >> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> >> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> >> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
> >> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
> >> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
> >> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> >> >> > >> [   24.705804]  [] ? 
> >> >> > >> try_to_grab_pending+0x48/0x360
> >> >> > >> [   24.705810]  [] ? 
> >> >> > >> _raw_spin_lock_irqsave+0x73/0x80
> >> >> > >> [   24.705814]  [] 
> >> >> > >> __cancel_work_timer+0x179/0x260
> >> >>
> >> >> This one is even more strange. It says that flush_work() is being called
> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
> >> >> restored just one statement before that, and usbhid_close() explicitly
> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
> >> >>
> >> >> So I am not able to make any sense of either of the traces really.
> >> >>
> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
> >> >> well?
> >> >
> >> > Actually could you please provide disassembly of your
> >> > __cancel_work_timer()?
> >> >
> >>
> >> Disassembly of which file - corresponding workqueue or hid file?
> >
> > make kernel/workqueue.o
> > objdump -Dr kernel/workqueue.o
> >
> > and copy/paste output for __cancel_work_timer function.
> >
> 
> It was a bit long, so I attached the full output.

216a:   ff 14 25 00 00 00 00callq  *0x0
216d: R_X86_64_32S  pv_irq_ops+0x8
2171:   4c 89 efmov%r13,%rdi
2174:   e8 00 00 00 00  callq  2179 <__cancel_work_timer+0x189>
2175: R_X86_64_PC32 flush_work-0x4

So the ordering generated by compiler is correct here.

This however:

[   24.824639] hardirqs last  enabled at (7913): [] 
_raw_spin_unlock_irq+0x32/0x60
[   24.824646] hardirqs last disabled at (7914): [] 
del_timer_sync+0x37/0x110

combined with the stacktrace above, doesn't still make too much sense; IRQ 
disable/enable is properly paired in del_timer_sync() and timer base lock 
locking.

Could you look into the generated assembly whether the sequence

local_irq_save(flags);
lock_map_acquire(>lockdep_map);
lock_map_release(>lockdep_map);
local_irq_restore(flags);

in del_timer_sync() hasn't been optimized in some damaging way?

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Fri, 25 Sep 2015, Sedat Dilek wrote:

> > This however:
> >
> > [   24.824639] hardirqs last  enabled at (7913): [] 
> > _raw_spin_unlock_irq+0x32/0x60
> > [   24.824646] hardirqs last disabled at (7914): [] 
> > del_timer_sync+0x37/0x110
> >
> > combined with the stacktrace above, doesn't still make too much sense; IRQ
> > disable/enable is properly paired in del_timer_sync() and timer base lock
> > locking.
> >
> > Could you look into the generated assembly whether the sequence
> >
> > local_irq_save(flags);
> > lock_map_acquire(>lockdep_map);
> > lock_map_release(>lockdep_map);
> > local_irq_restore(flags);
> >
> > in del_timer_sync() hasn't been optimized in some damaging way?
> >
> 
> I am looking into my CLANG objdump...
> 
> $ egrep -nr 'save|restore|acquire|release'
> objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
> 5718:   4601: R_X86_64_PC32 
> _raw_spin_unlock_irqrestore-0x4
> 5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> 6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> 6265:   4d05: R_X86_64_PC32 
> _raw_spin_unlock_irqrestore-0x4
> 
> ...this is what I have.
> 
> Does that help?

It doesn't. I am asking for disassembly of del_timer_sync() (that's in 
kernel/time/timer.o).

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Fri, 25 Sep 2015, Sedat Dilek wrote:

> >> $ egrep -nr 'save|restore|acquire|release'
> >> objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
> >> 5718:   4601: R_X86_64_PC32 
> >> _raw_spin_unlock_irqrestore-0x4
> >> 5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> >> 6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
> >> 6265:   4d05: R_X86_64_PC32 
> >> _raw_spin_unlock_irqrestore-0x4
> >>
> >> ...this is what I have.
> >>
> >> Does that help?
> >
> > It doesn't. I am asking for disassembly of del_timer_sync() (that's in
> > kernel/time/timer.o).
> >
> 
> See attached file.

The sequence looks correct. So I don't really see what call sequence could 
lead to calling flush_work() from __cancel_work_timer() with IRQs 
disabled (which is what your stacktrace is suggesting).

The fact that this doesn't happen with GCC-compiled kernels is really 
suspicious.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 2:00 PM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> >> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> >> > >> [   24.705804]  [] ? 
>> >> >> > >> try_to_grab_pending+0x48/0x360
>> >> >> > >> [   24.705810]  [] ? 
>> >> >> > >> _raw_spin_lock_irqsave+0x73/0x80
>> >> >> > >> [   24.705814]  [] 
>> >> >> > >> __cancel_work_timer+0x179/0x260
>> >> >>
>> >> >> This one is even more strange. It says that flush_work() is being 
>> >> >> called
>> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> >> >> restored just one statement before that, and usbhid_close() explicitly
>> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
>> >> >>
>> >> >> So I am not able to make any sense of either of the traces really.
>> >> >>
>> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
>> >> >> well?
>> >> >
>> >> > Actually could you please provide disassembly of your
>> >> > __cancel_work_timer()?
>> >> >
>> >>
>> >> Disassembly of which file - corresponding workqueue or hid file?
>> >
>> > make kernel/workqueue.o
>> > objdump -Dr kernel/workqueue.o
>> >
>> > and copy/paste output for __cancel_work_timer function.
>> >
>>
>> It was a bit long, so I attached the full output.
>
> 216a:   ff 14 25 00 00 00 00callq  *0x0
> 216d: R_X86_64_32S  pv_irq_ops+0x8
> 2171:   4c 89 efmov%r13,%rdi
> 2174:   e8 00 00 00 00  callq  2179 
> <__cancel_work_timer+0x189>
> 2175: R_X86_64_PC32 flush_work-0x4
>
> So the ordering generated by compiler is correct here.
>
> This however:
>
> [   24.824639] hardirqs last  enabled at (7913): [] 
> _raw_spin_unlock_irq+0x32/0x60
> [   24.824646] hardirqs last disabled at (7914): [] 
> del_timer_sync+0x37/0x110
>
> combined with the stacktrace above, doesn't still make too much sense; IRQ
> disable/enable is properly paired in del_timer_sync() and timer base lock
> locking.
>
> Could you look into the generated assembly whether the sequence
>
> local_irq_save(flags);
> lock_map_acquire(>lockdep_map);
> lock_map_release(>lockdep_map);
> local_irq_restore(flags);
>
> in del_timer_sync() hasn't been optimized in some damaging way?
>

I am looking into my CLANG objdump...

$ egrep -nr 'save|restore|acquire|release'
objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
5718:   4601: R_X86_64_PC32 _raw_spin_unlock_irqrestore-0x4
5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
6265:   4d05: R_X86_64_PC32 _raw_spin_unlock_irqrestore-0x4

...this is what I have.

Does that help?

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 2:40 PM, Jiri Kosina  wrote:
> On Fri, 25 Sep 2015, Sedat Dilek wrote:
>
>> >> $ egrep -nr 'save|restore|acquire|release'
>> >> objdump-Dr_kernel-workqueue_o_CLANG-3-7.txt | egrep 'irq|map'
>> >> 5718:   4601: R_X86_64_PC32 
>> >> _raw_spin_unlock_irqrestore-0x4
>> >> 5766:   4699: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
>> >> 6173:   4bd9: R_X86_64_PC32 _raw_spin_lock_irqsave-0x4
>> >> 6265:   4d05: R_X86_64_PC32 
>> >> _raw_spin_unlock_irqrestore-0x4
>> >>
>> >> ...this is what I have.
>> >>
>> >> Does that help?
>> >
>> > It doesn't. I am asking for disassembly of del_timer_sync() (that's in
>> > kernel/time/timer.o).
>> >
>>
>> See attached file.
>
> The sequence looks correct. So I don't really see what call sequence could
> lead to calling flush_work() from __cancel_work_timer() with IRQs
> disabled (which is what your stacktrace is suggesting).
>
> The fact that this doesn't happen with GCC-compiled kernels is really
> suspicious.
>

Hm, and now?

Shall I try a different OptLevel than default -O2?

CONFIG_LOCKDEP=n ?

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Jiri Kosina
On Fri, 25 Sep 2015, Sedat Dilek wrote:

> > The sequence looks correct. So I don't really see what call sequence could
> > lead to calling flush_work() from __cancel_work_timer() with IRQs
> > disabled (which is what your stacktrace is suggesting).
> >
> > The fact that this doesn't happen with GCC-compiled kernels is really
> > suspicious.
> >
> 
> Hm, and now?

I guess one of the things to try is to generate ftrace graph trace to see 
what exact function sequence leads to this warning.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-25 Thread Sedat Dilek
On Fri, Sep 25, 2015 at 3:13 PM, Jiri Kosina  wrote:
> On Fri, 25 Sep 2015, Sedat Dilek wrote:
>
>> > The sequence looks correct. So I don't really see what call sequence could
>> > lead to calling flush_work() from __cancel_work_timer() with IRQs
>> > disabled (which is what your stacktrace is suggesting).
>> >
>> > The fact that this doesn't happen with GCC-compiled kernels is really
>> > suspicious.
>> >
>>
>> Hm, and now?
>
> I guess one of the things to try is to generate ftrace graph trace to see
> what exact function sequence leads to this warning.
>

Can I do this via kernel-boot-parameter?
No experiences with ftrace here.

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Jiri Kosina
On Thu, 24 Sep 2015, Sedat Dilek wrote:

> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> >> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> >> > >> [   24.705810]  [] ? 
> >> > >> _raw_spin_lock_irqsave+0x73/0x80
> >> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
> >>
> >> This one is even more strange. It says that flush_work() is being called
> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
> >> restored just one statement before that, and usbhid_close() explicitly
> >> calls cancel_work_sync() after unconditionally enabling interrupts.
> >>
> >> So I am not able to make any sense of either of the traces really.
> >>
> >> Are you seeing this with the same .config with GCC-compiled kernel as
> >> well?
> >
> > Actually could you please provide disassembly of your
> > __cancel_work_timer()?
> >
> 
> Disassembly of which file - corresponding workqueue or hid file?

make kernel/workqueue.o
objdump -Dr kernel/workqueue.o

and copy/paste output for __cancel_work_timer function.

> > One explanation would be LLVM not considering local_irq_restore() a
> > compiler memory barrier, but I am pretty sure it'll expose much more
> > breakage if that'd be the case.
> 
> Can you point me where I can find more informations about "compiler
> memory barrier" or explain in a few words if possible?

If compiler would not take "memory" clobber (while disabling IRQs) as a 
reordering barrier, it wouldn't see any data dependency between 
local_irq_restore(flags) and flush_work(data) and could reorder them, 
resulting in flush_work() being called with IRQs disabled.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Sedat Dilek
On Thu, Sep 24, 2015 at 10:03 AM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Jiri Kosina wrote:
>
>> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
>> > >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
>> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
>>
>> This one is even more strange. It says that flush_work() is being called
>> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> restored just one statement before that, and usbhid_close() explicitly
>> calls cancel_work_sync() after unconditionally enabling interrupts.
>>
>> So I am not able to make any sense of either of the traces really.
>>
>> Are you seeing this with the same .config with GCC-compiled kernel as
>> well?
>
> Actually could you please provide disassembly of your
> __cancel_work_timer()?
>

Disassembly of which file - corresponding workqueue or hid file?

> One explanation would be LLVM not considering local_irq_restore() a
> compiler memory barrier, but I am pretty sure it'll expose much more
> breakage if that'd be the case.
>

Can you point me where I can find more informations about "compiler
memory barrier" or explain in a few words if possible?

Thanks.

I can provide the files this evening...

/me in front of a Windows machine.

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Sedat Dilek
On Thu, Sep 24, 2015 at 9:57 AM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> I am seeing this call-trace when compiling a Linux v4.2.y or Linux
>> v4.3-rcN kernel with my llvm-toolchain and llvmlinux-amd64 patchset.
>> CLANG sometimes catches things which GCC does not.
>>
>> Not sure if this is a workqueue or hid issue...
>>
>> [   24.824396] BUG: sleeping function called from invalid context at
>> kernel/workqueue.c:2767
>> [   24.824511] in_atomic(): 0, irqs_disabled(): 1, pid: 1485, name: acpid
>> [   24.824596] 3 locks held by acpid/1485:
>> [   24.824599]  #0:  (>mutex){+.+...}, at: []
>> evdev_release+0xbc/0xf0
>> [   24.824612]  #1:  (>mutex#2){+.+...}, at: []
>> input_close_device+0x27/0x70
>> [   24.824624]  #2:  (hid_open_mut){+.+...}, at: []
>> usbhid_close+0x28/0xb0 [usbhid]
>> [   24.824636] irq event stamp: 7914
>> [   24.824639] hardirqs last  enabled at (7913): []
>> _raw_spin_unlock_irq+0x32/0x60
>> [   24.824646] hardirqs last disabled at (7914): []
>> del_timer_sync+0x37/0x110
>> [   24.824652] softirqs last  enabled at (7220): []
>> local_bh_enable+0x9/0x20
>> [   24.824659] softirqs last disabled at (7218): []
>> local_bh_disable+0x9/0x20
>> [   24.824666] CPU: 1 PID: 1485 Comm: acpid Not tainted
>> 4.3.0-rc2-3-llvmlinux-amd64 #2
>> [   24.824670] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [   24.824673]  8800d4806948 0096 
>> 8800b8933bf8
>> [   24.824680]  8149a4ad 8800b8933c28 810cd53a
>> 81c56fcc
>> [   24.824686]  8801185f4240  0acf
>> 8800b8933c68
>> [   24.824693] Call Trace:
>> [   24.824699]  [] dump_stack+0x7d/0xa0
>> [   24.824705]  [] ___might_sleep+0x28a/0x2a0
>> [   24.824709]  [] __might_sleep+0x4f/0xc0
>> [   24.824715]  [] __cancel_work_timer+0x2e/0x270
>> [   24.824720]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
>> [   24.824725]  [] ? try_to_del_timer_sync+0xad/0xc0
>> [   24.824730]  [] cancel_work_sync+0x18/0x20
>
> Could you please (on a kernel compiled with debuginfo) resolve (using
> addr2line) those offsets to source code lines? Especially
> __cancel_work_timer+0x2e will be interesting in this case.
>

Hi,

I can do that, but not sure if I have enough disc-space on my
Ubuntu/precise AMD64 system.

> __cancel_work_timer() calls try_to_grab_pending() which disabled IRQs, but
> as far as I can see, the only way it goes to sleep then is when
> try_to_grab_pending() returned -ENOENT, but in such case it reenabled IRQs
> explicitly before.
>
> Also the fact that "hardirqs last disabled at (7914): []
> del_timer_sync+0x37/0x110" is a bit puzzling as well, but AFAICS
> del_timer_sync is complete outside the codepath that's on stack.
>
> [ ... snip ... ]
>> >> [   24.705463] BUG: sleeping function called from invalid context at
>> >> kernel/workqueue.c:2680
>> >> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
>> >> [   24.705662] 3 locks held by acpid/1447:
>> >> [   24.705664]  #0:  (>mutex){+.+...}, at: []
>> >> evdev_release+0xbc/0xf0
>> >> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
>> >> input_close_device+0x27/0x70
>> >> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
>> >> usbhid_close+0x28/0xb0 [usbhid]
>> >> [   24.705704] irq event stamp: 19968
>> >> [   24.705706] hardirqs last  enabled at (19967): []
>> >> _raw_spin_unlock_irq+0x32/0x60
>> >> [   24.705713] hardirqs last disabled at (19968): []
>> >> del_timer_sync+0x37/0x110
>> >> [   24.705720] softirqs last  enabled at (18890): []
>> >> local_bh_enable+0x9/0x20
>> >> [   24.705726] softirqs last disabled at (1): []
>> >> local_bh_disable+0x9/0x20
>> >> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
>> >> 4.2.0-2-llvmlinux-small #2
>> >> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> >> [   24.705741]  8800d57ea948 0092 
>> >> 8800bb3e3aa8
>> >> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
>> >> 81c51a34
>> >> [   24.705754]  8800d3095000  0a78
>> >> 8800bb3e3b18
>> >> [   24.705761] Call Trace:
>> >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
>> >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
>> >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
>
> This one is even more strange. It says that flush_work() is being called
> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
> restored 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Jiri Kosina
On Thu, 24 Sep 2015, Jiri Kosina wrote:

> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
> > >> [   24.705789]  [] flush_work+0x5c/0x80
> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> > >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
> 
> This one is even more strange. It says that flush_work() is being called 
> from __cancel_work_timer() with IRQs disabled, but flags are explicitly 
> restored just one statement before that, and usbhid_close() explicitly 
> calls cancel_work_sync() after unconditionally enabling interrupts.
> 
> So I am not able to make any sense of either of the traces really.
> 
> Are you seeing this with the same .config with GCC-compiled kernel as 
> well?

Actually could you please provide disassembly of your 
__cancel_work_timer()?

One explanation would be LLVM not considering local_irq_restore() a 
compiler memory barrier, but I am pretty sure it'll expose much more 
breakage if that'd be the case.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Jiri Kosina
On Thu, 24 Sep 2015, Sedat Dilek wrote:

> I am seeing this call-trace when compiling a Linux v4.2.y or Linux
> v4.3-rcN kernel with my llvm-toolchain and llvmlinux-amd64 patchset.
> CLANG sometimes catches things which GCC does not.
> 
> Not sure if this is a workqueue or hid issue...
> 
> [   24.824396] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2767
> [   24.824511] in_atomic(): 0, irqs_disabled(): 1, pid: 1485, name: acpid
> [   24.824596] 3 locks held by acpid/1485:
> [   24.824599]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   24.824612]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   24.824624]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xb0 [usbhid]
> [   24.824636] irq event stamp: 7914
> [   24.824639] hardirqs last  enabled at (7913): []
> _raw_spin_unlock_irq+0x32/0x60
> [   24.824646] hardirqs last disabled at (7914): []
> del_timer_sync+0x37/0x110
> [   24.824652] softirqs last  enabled at (7220): []
> local_bh_enable+0x9/0x20
> [   24.824659] softirqs last disabled at (7218): []
> local_bh_disable+0x9/0x20
> [   24.824666] CPU: 1 PID: 1485 Comm: acpid Not tainted
> 4.3.0-rc2-3-llvmlinux-amd64 #2
> [   24.824670] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   24.824673]  8800d4806948 0096 
> 8800b8933bf8
> [   24.824680]  8149a4ad 8800b8933c28 810cd53a
> 81c56fcc
> [   24.824686]  8801185f4240  0acf
> 8800b8933c68
> [   24.824693] Call Trace:
> [   24.824699]  [] dump_stack+0x7d/0xa0
> [   24.824705]  [] ___might_sleep+0x28a/0x2a0
> [   24.824709]  [] __might_sleep+0x4f/0xc0
> [   24.824715]  [] __cancel_work_timer+0x2e/0x270
> [   24.824720]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
> [   24.824725]  [] ? try_to_del_timer_sync+0xad/0xc0
> [   24.824730]  [] cancel_work_sync+0x18/0x20

Could you please (on a kernel compiled with debuginfo) resolve (using 
addr2line) those offsets to source code lines? Especially 
__cancel_work_timer+0x2e will be interesting in this case.

__cancel_work_timer() calls try_to_grab_pending() which disabled IRQs, but 
as far as I can see, the only way it goes to sleep then is when 
try_to_grab_pending() returned -ENOENT, but in such case it reenabled IRQs 
explicitly before.

Also the fact that "hardirqs last disabled at (7914): [] 
del_timer_sync+0x37/0x110" is a bit puzzling as well, but AFAICS 
del_timer_sync is complete outside the codepath that's on stack.

[ ... snip ... ]
> >> [   24.705463] BUG: sleeping function called from invalid context at
> >> kernel/workqueue.c:2680
> >> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
> >> [   24.705662] 3 locks held by acpid/1447:
> >> [   24.705664]  #0:  (>mutex){+.+...}, at: []
> >> evdev_release+0xbc/0xf0
> >> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
> >> input_close_device+0x27/0x70
> >> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
> >> usbhid_close+0x28/0xb0 [usbhid]
> >> [   24.705704] irq event stamp: 19968
> >> [   24.705706] hardirqs last  enabled at (19967): []
> >> _raw_spin_unlock_irq+0x32/0x60
> >> [   24.705713] hardirqs last disabled at (19968): []
> >> del_timer_sync+0x37/0x110
> >> [   24.705720] softirqs last  enabled at (18890): []
> >> local_bh_enable+0x9/0x20
> >> [   24.705726] softirqs last disabled at (1): []
> >> local_bh_disable+0x9/0x20
> >> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
> >> 4.2.0-2-llvmlinux-small #2
> >> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> >> [   24.705741]  8800d57ea948 0092 
> >> 8800bb3e3aa8
> >> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
> >> 81c51a34
> >> [   24.705754]  8800d3095000  0a78
> >> 8800bb3e3b18
> >> [   24.705761] Call Trace:
> >> [   24.705767]  [] dump_stack+0x7d/0xa0
> >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> >> [   24.705784]  [] start_flush_work+0x2f/0x290
> >> [   24.705789]  [] flush_work+0x5c/0x80
> >> [   24.705792]  [] ? flush_work+0x1a/0x80
> >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
> >> [   24.705814]  [] __cancel_work_timer+0x179/0x260

This one is even more strange. It says that flush_work() is being called 
from __cancel_work_timer() with IRQs disabled, but flags are explicitly 
restored just one statement before that, and usbhid_close() explicitly 
calls cancel_work_sync() after unconditionally enabling interrupts.

So I am not able to make any sense of either of the traces really.

Are you seeing this with the same .config with GCC-compiled 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Jiri Kosina
On Thu, 24 Sep 2015, Jiri Kosina wrote:

> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
> > >> [   24.705789]  [] flush_work+0x5c/0x80
> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> > >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
> 
> This one is even more strange. It says that flush_work() is being called 
> from __cancel_work_timer() with IRQs disabled, but flags are explicitly 
> restored just one statement before that, and usbhid_close() explicitly 
> calls cancel_work_sync() after unconditionally enabling interrupts.
> 
> So I am not able to make any sense of either of the traces really.
> 
> Are you seeing this with the same .config with GCC-compiled kernel as 
> well?

Actually could you please provide disassembly of your 
__cancel_work_timer()?

One explanation would be LLVM not considering local_irq_restore() a 
compiler memory barrier, but I am pretty sure it'll expose much more 
breakage if that'd be the case.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Jiri Kosina
On Thu, 24 Sep 2015, Sedat Dilek wrote:

> I am seeing this call-trace when compiling a Linux v4.2.y or Linux
> v4.3-rcN kernel with my llvm-toolchain and llvmlinux-amd64 patchset.
> CLANG sometimes catches things which GCC does not.
> 
> Not sure if this is a workqueue or hid issue...
> 
> [   24.824396] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2767
> [   24.824511] in_atomic(): 0, irqs_disabled(): 1, pid: 1485, name: acpid
> [   24.824596] 3 locks held by acpid/1485:
> [   24.824599]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   24.824612]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   24.824624]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xb0 [usbhid]
> [   24.824636] irq event stamp: 7914
> [   24.824639] hardirqs last  enabled at (7913): []
> _raw_spin_unlock_irq+0x32/0x60
> [   24.824646] hardirqs last disabled at (7914): []
> del_timer_sync+0x37/0x110
> [   24.824652] softirqs last  enabled at (7220): []
> local_bh_enable+0x9/0x20
> [   24.824659] softirqs last disabled at (7218): []
> local_bh_disable+0x9/0x20
> [   24.824666] CPU: 1 PID: 1485 Comm: acpid Not tainted
> 4.3.0-rc2-3-llvmlinux-amd64 #2
> [   24.824670] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   24.824673]  8800d4806948 0096 
> 8800b8933bf8
> [   24.824680]  8149a4ad 8800b8933c28 810cd53a
> 81c56fcc
> [   24.824686]  8801185f4240  0acf
> 8800b8933c68
> [   24.824693] Call Trace:
> [   24.824699]  [] dump_stack+0x7d/0xa0
> [   24.824705]  [] ___might_sleep+0x28a/0x2a0
> [   24.824709]  [] __might_sleep+0x4f/0xc0
> [   24.824715]  [] __cancel_work_timer+0x2e/0x270
> [   24.824720]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
> [   24.824725]  [] ? try_to_del_timer_sync+0xad/0xc0
> [   24.824730]  [] cancel_work_sync+0x18/0x20

Could you please (on a kernel compiled with debuginfo) resolve (using 
addr2line) those offsets to source code lines? Especially 
__cancel_work_timer+0x2e will be interesting in this case.

__cancel_work_timer() calls try_to_grab_pending() which disabled IRQs, but 
as far as I can see, the only way it goes to sleep then is when 
try_to_grab_pending() returned -ENOENT, but in such case it reenabled IRQs 
explicitly before.

Also the fact that "hardirqs last disabled at (7914): [] 
del_timer_sync+0x37/0x110" is a bit puzzling as well, but AFAICS 
del_timer_sync is complete outside the codepath that's on stack.

[ ... snip ... ]
> >> [   24.705463] BUG: sleeping function called from invalid context at
> >> kernel/workqueue.c:2680
> >> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
> >> [   24.705662] 3 locks held by acpid/1447:
> >> [   24.705664]  #0:  (>mutex){+.+...}, at: []
> >> evdev_release+0xbc/0xf0
> >> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
> >> input_close_device+0x27/0x70
> >> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
> >> usbhid_close+0x28/0xb0 [usbhid]
> >> [   24.705704] irq event stamp: 19968
> >> [   24.705706] hardirqs last  enabled at (19967): []
> >> _raw_spin_unlock_irq+0x32/0x60
> >> [   24.705713] hardirqs last disabled at (19968): []
> >> del_timer_sync+0x37/0x110
> >> [   24.705720] softirqs last  enabled at (18890): []
> >> local_bh_enable+0x9/0x20
> >> [   24.705726] softirqs last disabled at (1): []
> >> local_bh_disable+0x9/0x20
> >> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
> >> 4.2.0-2-llvmlinux-small #2
> >> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> >> [   24.705741]  8800d57ea948 0092 
> >> 8800bb3e3aa8
> >> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
> >> 81c51a34
> >> [   24.705754]  8800d3095000  0a78
> >> 8800bb3e3b18
> >> [   24.705761] Call Trace:
> >> [   24.705767]  [] dump_stack+0x7d/0xa0
> >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> >> [   24.705784]  [] start_flush_work+0x2f/0x290
> >> [   24.705789]  [] flush_work+0x5c/0x80
> >> [   24.705792]  [] ? flush_work+0x1a/0x80
> >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
> >> [   24.705814]  [] __cancel_work_timer+0x179/0x260

This one is even more strange. It says that flush_work() is being called 
from __cancel_work_timer() with IRQs disabled, but flags are explicitly 
restored just one statement before that, and usbhid_close() explicitly 
calls cancel_work_sync() after unconditionally enabling interrupts.

So I am not able to make any sense of either of the traces really.

Are you seeing this with the same .config with GCC-compiled 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Jiri Kosina
On Thu, 24 Sep 2015, Sedat Dilek wrote:

> >> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
> >> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> >> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
> >> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
> >> > >> [   24.705789]  [] flush_work+0x5c/0x80
> >> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
> >> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> >> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> >> > >> [   24.705810]  [] ? 
> >> > >> _raw_spin_lock_irqsave+0x73/0x80
> >> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
> >>
> >> This one is even more strange. It says that flush_work() is being called
> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
> >> restored just one statement before that, and usbhid_close() explicitly
> >> calls cancel_work_sync() after unconditionally enabling interrupts.
> >>
> >> So I am not able to make any sense of either of the traces really.
> >>
> >> Are you seeing this with the same .config with GCC-compiled kernel as
> >> well?
> >
> > Actually could you please provide disassembly of your
> > __cancel_work_timer()?
> >
> 
> Disassembly of which file - corresponding workqueue or hid file?

make kernel/workqueue.o
objdump -Dr kernel/workqueue.o

and copy/paste output for __cancel_work_timer function.

> > One explanation would be LLVM not considering local_irq_restore() a
> > compiler memory barrier, but I am pretty sure it'll expose much more
> > breakage if that'd be the case.
> 
> Can you point me where I can find more informations about "compiler
> memory barrier" or explain in a few words if possible?

If compiler would not take "memory" clobber (while disabling IRQs) as a 
reordering barrier, it wouldn't see any data dependency between 
local_irq_restore(flags) and flush_work(data) and could reorder them, 
resulting in flush_work() being called with IRQs disabled.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Sedat Dilek
On Thu, Sep 24, 2015 at 10:03 AM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Jiri Kosina wrote:
>
>> > >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> > >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> > >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> > >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> > >> [   24.705789]  [] flush_work+0x5c/0x80
>> > >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> > >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> > >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
>> > >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
>> > >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
>>
>> This one is even more strange. It says that flush_work() is being called
>> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
>> restored just one statement before that, and usbhid_close() explicitly
>> calls cancel_work_sync() after unconditionally enabling interrupts.
>>
>> So I am not able to make any sense of either of the traces really.
>>
>> Are you seeing this with the same .config with GCC-compiled kernel as
>> well?
>
> Actually could you please provide disassembly of your
> __cancel_work_timer()?
>

Disassembly of which file - corresponding workqueue or hid file?

> One explanation would be LLVM not considering local_irq_restore() a
> compiler memory barrier, but I am pretty sure it'll expose much more
> breakage if that'd be the case.
>

Can you point me where I can find more informations about "compiler
memory barrier" or explain in a few words if possible?

Thanks.

I can provide the files this evening...

/me in front of a Windows machine.

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-24 Thread Sedat Dilek
On Thu, Sep 24, 2015 at 9:57 AM, Jiri Kosina  wrote:
> On Thu, 24 Sep 2015, Sedat Dilek wrote:
>
>> I am seeing this call-trace when compiling a Linux v4.2.y or Linux
>> v4.3-rcN kernel with my llvm-toolchain and llvmlinux-amd64 patchset.
>> CLANG sometimes catches things which GCC does not.
>>
>> Not sure if this is a workqueue or hid issue...
>>
>> [   24.824396] BUG: sleeping function called from invalid context at
>> kernel/workqueue.c:2767
>> [   24.824511] in_atomic(): 0, irqs_disabled(): 1, pid: 1485, name: acpid
>> [   24.824596] 3 locks held by acpid/1485:
>> [   24.824599]  #0:  (>mutex){+.+...}, at: []
>> evdev_release+0xbc/0xf0
>> [   24.824612]  #1:  (>mutex#2){+.+...}, at: []
>> input_close_device+0x27/0x70
>> [   24.824624]  #2:  (hid_open_mut){+.+...}, at: []
>> usbhid_close+0x28/0xb0 [usbhid]
>> [   24.824636] irq event stamp: 7914
>> [   24.824639] hardirqs last  enabled at (7913): []
>> _raw_spin_unlock_irq+0x32/0x60
>> [   24.824646] hardirqs last disabled at (7914): []
>> del_timer_sync+0x37/0x110
>> [   24.824652] softirqs last  enabled at (7220): []
>> local_bh_enable+0x9/0x20
>> [   24.824659] softirqs last disabled at (7218): []
>> local_bh_disable+0x9/0x20
>> [   24.824666] CPU: 1 PID: 1485 Comm: acpid Not tainted
>> 4.3.0-rc2-3-llvmlinux-amd64 #2
>> [   24.824670] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [   24.824673]  8800d4806948 0096 
>> 8800b8933bf8
>> [   24.824680]  8149a4ad 8800b8933c28 810cd53a
>> 81c56fcc
>> [   24.824686]  8801185f4240  0acf
>> 8800b8933c68
>> [   24.824693] Call Trace:
>> [   24.824699]  [] dump_stack+0x7d/0xa0
>> [   24.824705]  [] ___might_sleep+0x28a/0x2a0
>> [   24.824709]  [] __might_sleep+0x4f/0xc0
>> [   24.824715]  [] __cancel_work_timer+0x2e/0x270
>> [   24.824720]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
>> [   24.824725]  [] ? try_to_del_timer_sync+0xad/0xc0
>> [   24.824730]  [] cancel_work_sync+0x18/0x20
>
> Could you please (on a kernel compiled with debuginfo) resolve (using
> addr2line) those offsets to source code lines? Especially
> __cancel_work_timer+0x2e will be interesting in this case.
>

Hi,

I can do that, but not sure if I have enough disc-space on my
Ubuntu/precise AMD64 system.

> __cancel_work_timer() calls try_to_grab_pending() which disabled IRQs, but
> as far as I can see, the only way it goes to sleep then is when
> try_to_grab_pending() returned -ENOENT, but in such case it reenabled IRQs
> explicitly before.
>
> Also the fact that "hardirqs last disabled at (7914): []
> del_timer_sync+0x37/0x110" is a bit puzzling as well, but AFAICS
> del_timer_sync is complete outside the codepath that's on stack.
>
> [ ... snip ... ]
>> >> [   24.705463] BUG: sleeping function called from invalid context at
>> >> kernel/workqueue.c:2680
>> >> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
>> >> [   24.705662] 3 locks held by acpid/1447:
>> >> [   24.705664]  #0:  (>mutex){+.+...}, at: []
>> >> evdev_release+0xbc/0xf0
>> >> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
>> >> input_close_device+0x27/0x70
>> >> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
>> >> usbhid_close+0x28/0xb0 [usbhid]
>> >> [   24.705704] irq event stamp: 19968
>> >> [   24.705706] hardirqs last  enabled at (19967): []
>> >> _raw_spin_unlock_irq+0x32/0x60
>> >> [   24.705713] hardirqs last disabled at (19968): []
>> >> del_timer_sync+0x37/0x110
>> >> [   24.705720] softirqs last  enabled at (18890): []
>> >> local_bh_enable+0x9/0x20
>> >> [   24.705726] softirqs last disabled at (1): []
>> >> local_bh_disable+0x9/0x20
>> >> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
>> >> 4.2.0-2-llvmlinux-small #2
>> >> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> >> [   24.705741]  8800d57ea948 0092 
>> >> 8800bb3e3aa8
>> >> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
>> >> 81c51a34
>> >> [   24.705754]  8800d3095000  0a78
>> >> 8800bb3e3b18
>> >> [   24.705761] Call Trace:
>> >> [   24.705767]  [] dump_stack+0x7d/0xa0
>> >> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> >> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> >> [   24.705784]  [] start_flush_work+0x2f/0x290
>> >> [   24.705789]  [] flush_work+0x5c/0x80
>> >> [   24.705792]  [] ? flush_work+0x1a/0x80
>> >> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
>> >> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
>> >> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
>> >> [   24.705814]  [] __cancel_work_timer+0x179/0x260
>
> This one is even more strange. It says that flush_work() is being called
> from __cancel_work_timer() with IRQs disabled, but flags are 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-23 Thread Sedat Dilek
On Thu, Sep 10, 2015 at 3:04 AM, Lai Jiangshan  wrote:
> Hi, TJ
>
> I think we need to add might_sleep() on the top of __cancel_work_timer().
> The might_sleep() on the start_flush_work() doesn't cover all the
> paths of __cancel_work_timer().
> And it can help to narrow the area of this bug.
>

I moved the might_sleep() from start_flush_work() to
__cancel_work_timer() (see attached patch).

[   23.444152] BUG: sleeping function called from invalid context at
kernel/workqueue.c:2767
[   23.444217] in_atomic(): 0, irqs_disabled(): 1, pid: 1423, name: acpid
[   23.444265] 3 locks held by acpid/1423:
[   23.444266]  #0:  (>mutex){+.+...}, at: []
evdev_release+0xbc/0xf0
[   23.444276]  #1:  (>mutex#2){+.+...}, at: []
input_close_device+0x27/0x70
[   23.444283]  #2:  (hid_open_mut){+.+...}, at: []
usbhid_close+0x28/0xb0 [usbhid]
[   23.444291] irq event stamp: 7850
[   23.444293] hardirqs last  enabled at (7849): []
_raw_spin_unlock_irq+0x32/0x60
[   23.444298] hardirqs last disabled at (7850): []
del_timer_sync+0x37/0x110
[   23.444303] softirqs last  enabled at (7164): []
local_bh_enable+0x9/0x20
[   23.444307] softirqs last disabled at (7162): []
local_bh_disable+0x9/0x20
[   23.444312] CPU: 2 PID: 1423 Comm: acpid Not tainted
4.3.0-rc2-3-llvmlinux-amd64 #2
[   23.444314] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[   23.444316]  8800d2e82948 0096 
8800bcc4fbf8
[   23.444320]  8149a4ad 8800bcc4fc28 810cd53a
81c56fcc
[   23.444324]  8800c69f0040  0acf
8800bcc4fc68
[   23.444328] Call Trace:
[   23.444332]  [] dump_stack+0x7d/0xa0
[   23.444336]  [] ___might_sleep+0x28a/0x2a0
[   23.444338]  [] __might_sleep+0x4f/0xc0
[   23.444342]  [] __cancel_work_timer+0x2e/0x270
[   23.444345]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
[   23.444348]  [] ? try_to_del_timer_sync+0xad/0xc0
[   23.444351]  [] cancel_work_sync+0x18/0x20
[   23.444354]  [] usbhid_close+0x75/0xb0 [usbhid]
[   23.444360]  [] hidinput_close+0x31/0x40 [hid]
[   23.444364]  [] ? hidinput_open+0x40/0x40 [hid]
[   23.444367]  [] input_close_device+0x48/0x70
[   23.444370]  [] evdev_release+0xd6/0xf0
[   23.444373]  [] __fput+0x107/0x240
[   23.444375]  [] fput+0x16/0x20
[   23.444378]  [] task_work_run+0x6c/0xe0
[   23.444383]  [] prepare_exit_to_usermode+0x13a/0x140
[   23.444386]  [] syscall_return_slowpath+0x281/0x2f0
[   23.444389]  [] ? filp_close+0x65/0x90
[   23.444392]  [] ? trace_hardirqs_on_thunk+0x17/0x19
[   23.444396]  [] int_ret_from_sys_call+0x25/0x9f

Can you look at this?

- Sedat -

> Hi Sedat Dilek
>
> [   24.705704] irq event stamp: 19968
> [   24.705706] hardirqs last  enabled at (19967): []
> _raw_spin_unlock_irq+0x32/0x60
> [   24.705713] hardirqs last disabled at (19968): []
> del_timer_sync+0x37/0x110
>
> Is it means the irq-disabled-event is leak by del_timer_sync()? It is
> impossible.
>
> usbhid_close()
>   mutex_lock(); // it has might_sleep() check. So the problem seems to be
> // hidden at one of the following statements
>   del_timer_sync();
>   cancel_work_sync();
>
>
> On Wed, Sep 9, 2015 at 6:24 PM, Sedat Dilek  wrote:
>> [ TO WORKQUEUE maintainers ]
>>
>> Hi,
>>
>> I encountered a BUG with a llvmlinux compiled kernel here on
>> Ubuntu/precise AMD64 touching the workqueue area.
>>
>> [   24.705463] BUG: sleeping function called from invalid context at
>> kernel/workqueue.c:2680
>> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
>> [   24.705662] 3 locks held by acpid/1447:
>> [   24.705664]  #0:  (>mutex){+.+...}, at: []
>> evdev_release+0xbc/0xf0
>> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
>> input_close_device+0x27/0x70
>> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
>> usbhid_close+0x28/0xb0 [usbhid]
>> [   24.705704] irq event stamp: 19968
>> [   24.705706] hardirqs last  enabled at (19967): []
>> _raw_spin_unlock_irq+0x32/0x60
>> [   24.705713] hardirqs last disabled at (19968): []
>> del_timer_sync+0x37/0x110
>> [   24.705720] softirqs last  enabled at (18890): []
>> local_bh_enable+0x9/0x20
>> [   24.705726] softirqs last disabled at (1): []
>> local_bh_disable+0x9/0x20
>> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
>> 4.2.0-2-llvmlinux-small #2
>> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [   24.705741]  8800d57ea948 0092 
>> 8800bb3e3aa8
>> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
>> 81c51a34
>> [   24.705754]  8800d3095000  0a78
>> 8800bb3e3b18
>> [   24.705761] Call Trace:
>> [   24.705767]  [] dump_stack+0x7d/0xa0
>> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> [   24.705784]  [] start_flush_work+0x2f/0x290
>> [   

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-23 Thread Sedat Dilek
On Thu, Sep 10, 2015 at 3:04 AM, Lai Jiangshan  wrote:
> Hi, TJ
>
> I think we need to add might_sleep() on the top of __cancel_work_timer().
> The might_sleep() on the start_flush_work() doesn't cover all the
> paths of __cancel_work_timer().
> And it can help to narrow the area of this bug.
>

I moved the might_sleep() from start_flush_work() to
__cancel_work_timer() (see attached patch).

[   23.444152] BUG: sleeping function called from invalid context at
kernel/workqueue.c:2767
[   23.444217] in_atomic(): 0, irqs_disabled(): 1, pid: 1423, name: acpid
[   23.444265] 3 locks held by acpid/1423:
[   23.444266]  #0:  (>mutex){+.+...}, at: []
evdev_release+0xbc/0xf0
[   23.444276]  #1:  (>mutex#2){+.+...}, at: []
input_close_device+0x27/0x70
[   23.444283]  #2:  (hid_open_mut){+.+...}, at: []
usbhid_close+0x28/0xb0 [usbhid]
[   23.444291] irq event stamp: 7850
[   23.444293] hardirqs last  enabled at (7849): []
_raw_spin_unlock_irq+0x32/0x60
[   23.444298] hardirqs last disabled at (7850): []
del_timer_sync+0x37/0x110
[   23.444303] softirqs last  enabled at (7164): []
local_bh_enable+0x9/0x20
[   23.444307] softirqs last disabled at (7162): []
local_bh_disable+0x9/0x20
[   23.444312] CPU: 2 PID: 1423 Comm: acpid Not tainted
4.3.0-rc2-3-llvmlinux-amd64 #2
[   23.444314] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[   23.444316]  8800d2e82948 0096 
8800bcc4fbf8
[   23.444320]  8149a4ad 8800bcc4fc28 810cd53a
81c56fcc
[   23.444324]  8800c69f0040  0acf
8800bcc4fc68
[   23.444328] Call Trace:
[   23.444332]  [] dump_stack+0x7d/0xa0
[   23.444336]  [] ___might_sleep+0x28a/0x2a0
[   23.444338]  [] __might_sleep+0x4f/0xc0
[   23.444342]  [] __cancel_work_timer+0x2e/0x270
[   23.444345]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
[   23.444348]  [] ? try_to_del_timer_sync+0xad/0xc0
[   23.444351]  [] cancel_work_sync+0x18/0x20
[   23.444354]  [] usbhid_close+0x75/0xb0 [usbhid]
[   23.444360]  [] hidinput_close+0x31/0x40 [hid]
[   23.444364]  [] ? hidinput_open+0x40/0x40 [hid]
[   23.444367]  [] input_close_device+0x48/0x70
[   23.444370]  [] evdev_release+0xd6/0xf0
[   23.444373]  [] __fput+0x107/0x240
[   23.444375]  [] fput+0x16/0x20
[   23.444378]  [] task_work_run+0x6c/0xe0
[   23.444383]  [] prepare_exit_to_usermode+0x13a/0x140
[   23.444386]  [] syscall_return_slowpath+0x281/0x2f0
[   23.444389]  [] ? filp_close+0x65/0x90
[   23.444392]  [] ? trace_hardirqs_on_thunk+0x17/0x19
[   23.444396]  [] int_ret_from_sys_call+0x25/0x9f

Can you look at this?

- Sedat -

> Hi Sedat Dilek
>
> [   24.705704] irq event stamp: 19968
> [   24.705706] hardirqs last  enabled at (19967): []
> _raw_spin_unlock_irq+0x32/0x60
> [   24.705713] hardirqs last disabled at (19968): []
> del_timer_sync+0x37/0x110
>
> Is it means the irq-disabled-event is leak by del_timer_sync()? It is
> impossible.
>
> usbhid_close()
>   mutex_lock(); // it has might_sleep() check. So the problem seems to be
> // hidden at one of the following statements
>   del_timer_sync();
>   cancel_work_sync();
>
>
> On Wed, Sep 9, 2015 at 6:24 PM, Sedat Dilek  wrote:
>> [ TO WORKQUEUE maintainers ]
>>
>> Hi,
>>
>> I encountered a BUG with a llvmlinux compiled kernel here on
>> Ubuntu/precise AMD64 touching the workqueue area.
>>
>> [   24.705463] BUG: sleeping function called from invalid context at
>> kernel/workqueue.c:2680
>> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
>> [   24.705662] 3 locks held by acpid/1447:
>> [   24.705664]  #0:  (>mutex){+.+...}, at: []
>> evdev_release+0xbc/0xf0
>> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
>> input_close_device+0x27/0x70
>> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
>> usbhid_close+0x28/0xb0 [usbhid]
>> [   24.705704] irq event stamp: 19968
>> [   24.705706] hardirqs last  enabled at (19967): []
>> _raw_spin_unlock_irq+0x32/0x60
>> [   24.705713] hardirqs last disabled at (19968): []
>> del_timer_sync+0x37/0x110
>> [   24.705720] softirqs last  enabled at (18890): []
>> local_bh_enable+0x9/0x20
>> [   24.705726] softirqs last disabled at (1): []
>> local_bh_disable+0x9/0x20
>> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
>> 4.2.0-2-llvmlinux-small #2
>> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [   24.705741]  8800d57ea948 0092 
>> 8800bb3e3aa8
>> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
>> 81c51a34
>> [   24.705754]  8800d3095000  0a78
>> 8800bb3e3b18
>> [   24.705761] Call Trace:
>> [   24.705767]  [] dump_stack+0x7d/0xa0
>> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
>> [   24.705779]  [] __might_sleep+0x4f/0xc0
>> [   

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-14 Thread Sedat Dilek
On Mon, Sep 14, 2015 at 4:00 PM, Sedat Dilek  wrote:
> On Thu, Sep 10, 2015 at 3:04 AM, Lai Jiangshan  wrote:
>> Hi, TJ
>>
>> I think we need to add might_sleep() on the top of __cancel_work_timer().
>> The might_sleep() on the start_flush_work() doesn't cover all the
>> paths of __cancel_work_timer().
>> And it can help to narrow the area of this bug.
>>
>> Hi Sedat Dilek
>>
>> [   24.705704] irq event stamp: 19968
>> [   24.705706] hardirqs last  enabled at (19967): []
>> _raw_spin_unlock_irq+0x32/0x60
>> [   24.705713] hardirqs last disabled at (19968): []
>> del_timer_sync+0x37/0x110
>>
>> Is it means the irq-disabled-event is leak by del_timer_sync()? It is
>> impossible.
>>
>> usbhid_close()
>>   mutex_lock(); // it has might_sleep() check. So the problem seems to be
>> // hidden at one of the following statements
>>   del_timer_sync();
>>   cancel_work_sync();
>>
>
> With attached suggested patch the bug-line now looks like this...
>
> [   22.604524] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2771
> [   22.604539] in_atomic(): 0, irqs_disabled(): 1, pid: 1347, name: acpid
> [   22.604546] 3 locks held by acpid/1347:
> [   22.604547]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   22.604557]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   22.604566]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xb0 [usbhid]
> [   22.604573] irq event stamp: 19874
> [   22.604575] hardirqs last  enabled at (19873): []
> _raw_spin_unlock_irq+0x32/0x60
> [   22.604579] hardirqs last disabled at (19874): []
> del_timer_sync+0x37/0x110
> [   22.604584] softirqs last  enabled at (18852): []
> local_bh_enable+0x9/0x20
> [   22.604588] softirqs last disabled at (18850): []
> local_bh_disable+0x9/0x20
> [   22.604593] CPU: 3 PID: 1347 Comm: acpid Not tainted
> 4.2.0-6-llvmlinux-amd64 #1
> [   22.604595] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   22.604597]  8800d31ce948 0092 
> 880116e2fbb8
> [   22.604601]  8149289d 880116e2fbe8 810cbf8a
> 81c51a34
> [   22.604605]  8800c65d5140  0ad3
> 880116e2fc28
> [   22.604608] Call Trace:
> [   22.604612]  [] dump_stack+0x7d/0xa0
> [   22.604617]  [] ___might_sleep+0x28a/0x2a0
> [   22.604620]  [] __might_sleep+0x4f/0xc0
> [   22.604623]  [] __cancel_work_timer+0x2e/0x270
> [   22.604626]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
> [   22.604630]  [] ? try_to_del_timer_sync+0xad/0xc0
> [   22.604632]  [] cancel_work_sync+0x18/0x20
> [   22.604636]  [] usbhid_close+0x75/0xb0 [usbhid]
> [   22.604641]  [] hidinput_close+0x31/0x40 [hid]
> [   22.604645]  [] ? hidinput_open+0x40/0x40 [hid]
> [   22.604648]  [] input_close_device+0x48/0x70
> [   22.604651]  [] evdev_release+0xd6/0xf0
> [   22.604655]  [] __fput+0x107/0x240
> [   22.604659]  [] fput+0x16/0x20
> [   22.604662]  [] task_work_run+0x87/0x130
> [   22.604667]  [] do_notify_resume+0x9cf/0xa00
> [   22.604670]  [] ? trace_hardirqs_on+0xd/0x10
> [   22.604675]  [] ? context_tracking_user_enter+0x13/0x20
> [   22.604678]  [] ? syscall_trace_leave+0x111/0x340
> [   22.604681]  [] ? fput+0x76/0xd0
> [   22.604684]  [] ? filp_close+0x65/0x90
> [   22.604688]  [] ? trace_hardirqs_on_thunk+0x17/0x19
> [   22.604691]  [] int_signal+0x12/0x17
>
> Full dmesg-log and my kernel-config are attached.
>
> Hope this helps.
>

My Xorg crashed several times - so this bug is reproducible.

- Sedat -
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 4.2.0-6-llvmlinux-amd64 
(sedat.di...@gmail.com@fambox) (clang version 3.7.0 (tags/RELEASE_370/final)) 
#1 SMP Mon Sep 14 15:49:13 CEST 2015
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.2.0-6-llvmlinux-amd64 
root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] Disabled fast string operations
[0.00] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 0x340 
bytes, using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-14 Thread Sedat Dilek
On Fri, Sep 11, 2015 at 6:12 PM, Sedat Dilek  wrote:
> On Thu, Sep 10, 2015 at 4:53 PM, Tejun Heo  wrote:
>> On Thu, Sep 10, 2015 at 10:52:27AM -0400, Tejun Heo wrote:
>>> Hey,
>>>
>>> On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
>>> > I think we need to add might_sleep() on the top of __cancel_work_timer().
>>> > The might_sleep() on the start_flush_work() doesn't cover all the
>>> > paths of __cancel_work_timer().
>>> > And it can help to narrow the area of this bug.
>>>
>>> Sure, can you submit a patch?
>>
>> That said, this one is likely not a bug in the kernel code but more
>> likely llvm either miscompiling or deviating from gcc behavior on some
>> corner cases.
>>
>
> An llvmlinux-patched Linux v4.2 kernel compiled with GCC v4.9 does not
> show this BUG line.
>

This bug seems to be more serious as I thought.

On my llvmlinux-patched Linux v4.2 my Xorg crashes reproducibly when
marking a text in Firefox with my Logitech mouse.
My LightDM login-manager restarts and I have to re-login.

With a GCC v4.9 compiled Linux v4.2 this does not happen - the
bug-line does not occur.

My mouse (USB 2.0) is attached to a USB 3.0 plug on my notebook.
If this matters...

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-14 Thread Sedat Dilek
On Fri, Sep 11, 2015 at 6:12 PM, Sedat Dilek  wrote:
> On Thu, Sep 10, 2015 at 4:53 PM, Tejun Heo  wrote:
>> On Thu, Sep 10, 2015 at 10:52:27AM -0400, Tejun Heo wrote:
>>> Hey,
>>>
>>> On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
>>> > I think we need to add might_sleep() on the top of __cancel_work_timer().
>>> > The might_sleep() on the start_flush_work() doesn't cover all the
>>> > paths of __cancel_work_timer().
>>> > And it can help to narrow the area of this bug.
>>>
>>> Sure, can you submit a patch?
>>
>> That said, this one is likely not a bug in the kernel code but more
>> likely llvm either miscompiling or deviating from gcc behavior on some
>> corner cases.
>>
>
> An llvmlinux-patched Linux v4.2 kernel compiled with GCC v4.9 does not
> show this BUG line.
>

This bug seems to be more serious as I thought.

On my llvmlinux-patched Linux v4.2 my Xorg crashes reproducibly when
marking a text in Firefox with my Logitech mouse.
My LightDM login-manager restarts and I have to re-login.

With a GCC v4.9 compiled Linux v4.2 this does not happen - the
bug-line does not occur.

My mouse (USB 2.0) is attached to a USB 3.0 plug on my notebook.
If this matters...

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-14 Thread Sedat Dilek
On Mon, Sep 14, 2015 at 4:00 PM, Sedat Dilek  wrote:
> On Thu, Sep 10, 2015 at 3:04 AM, Lai Jiangshan  wrote:
>> Hi, TJ
>>
>> I think we need to add might_sleep() on the top of __cancel_work_timer().
>> The might_sleep() on the start_flush_work() doesn't cover all the
>> paths of __cancel_work_timer().
>> And it can help to narrow the area of this bug.
>>
>> Hi Sedat Dilek
>>
>> [   24.705704] irq event stamp: 19968
>> [   24.705706] hardirqs last  enabled at (19967): []
>> _raw_spin_unlock_irq+0x32/0x60
>> [   24.705713] hardirqs last disabled at (19968): []
>> del_timer_sync+0x37/0x110
>>
>> Is it means the irq-disabled-event is leak by del_timer_sync()? It is
>> impossible.
>>
>> usbhid_close()
>>   mutex_lock(); // it has might_sleep() check. So the problem seems to be
>> // hidden at one of the following statements
>>   del_timer_sync();
>>   cancel_work_sync();
>>
>
> With attached suggested patch the bug-line now looks like this...
>
> [   22.604524] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2771
> [   22.604539] in_atomic(): 0, irqs_disabled(): 1, pid: 1347, name: acpid
> [   22.604546] 3 locks held by acpid/1347:
> [   22.604547]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   22.604557]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   22.604566]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xb0 [usbhid]
> [   22.604573] irq event stamp: 19874
> [   22.604575] hardirqs last  enabled at (19873): []
> _raw_spin_unlock_irq+0x32/0x60
> [   22.604579] hardirqs last disabled at (19874): []
> del_timer_sync+0x37/0x110
> [   22.604584] softirqs last  enabled at (18852): []
> local_bh_enable+0x9/0x20
> [   22.604588] softirqs last disabled at (18850): []
> local_bh_disable+0x9/0x20
> [   22.604593] CPU: 3 PID: 1347 Comm: acpid Not tainted
> 4.2.0-6-llvmlinux-amd64 #1
> [   22.604595] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   22.604597]  8800d31ce948 0092 
> 880116e2fbb8
> [   22.604601]  8149289d 880116e2fbe8 810cbf8a
> 81c51a34
> [   22.604605]  8800c65d5140  0ad3
> 880116e2fc28
> [   22.604608] Call Trace:
> [   22.604612]  [] dump_stack+0x7d/0xa0
> [   22.604617]  [] ___might_sleep+0x28a/0x2a0
> [   22.604620]  [] __might_sleep+0x4f/0xc0
> [   22.604623]  [] __cancel_work_timer+0x2e/0x270
> [   22.604626]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
> [   22.604630]  [] ? try_to_del_timer_sync+0xad/0xc0
> [   22.604632]  [] cancel_work_sync+0x18/0x20
> [   22.604636]  [] usbhid_close+0x75/0xb0 [usbhid]
> [   22.604641]  [] hidinput_close+0x31/0x40 [hid]
> [   22.604645]  [] ? hidinput_open+0x40/0x40 [hid]
> [   22.604648]  [] input_close_device+0x48/0x70
> [   22.604651]  [] evdev_release+0xd6/0xf0
> [   22.604655]  [] __fput+0x107/0x240
> [   22.604659]  [] fput+0x16/0x20
> [   22.604662]  [] task_work_run+0x87/0x130
> [   22.604667]  [] do_notify_resume+0x9cf/0xa00
> [   22.604670]  [] ? trace_hardirqs_on+0xd/0x10
> [   22.604675]  [] ? context_tracking_user_enter+0x13/0x20
> [   22.604678]  [] ? syscall_trace_leave+0x111/0x340
> [   22.604681]  [] ? fput+0x76/0xd0
> [   22.604684]  [] ? filp_close+0x65/0x90
> [   22.604688]  [] ? trace_hardirqs_on_thunk+0x17/0x19
> [   22.604691]  [] int_signal+0x12/0x17
>
> Full dmesg-log and my kernel-config are attached.
>
> Hope this helps.
>

My Xorg crashed several times - so this bug is reproducible.

- Sedat -
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 4.2.0-6-llvmlinux-amd64 
(sedat.di...@gmail.com@fambox) (clang version 3.7.0 (tags/RELEASE_370/final)) 
#1 SMP Mon Sep 14 15:49:13 CEST 2015
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.2.0-6-llvmlinux-amd64 
root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] Disabled fast string operations
[0.00] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 0x340 
bytes, using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 

Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-11 Thread Sedat Dilek
On Thu, Sep 10, 2015 at 4:53 PM, Tejun Heo  wrote:
> On Thu, Sep 10, 2015 at 10:52:27AM -0400, Tejun Heo wrote:
>> Hey,
>>
>> On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
>> > I think we need to add might_sleep() on the top of __cancel_work_timer().
>> > The might_sleep() on the start_flush_work() doesn't cover all the
>> > paths of __cancel_work_timer().
>> > And it can help to narrow the area of this bug.
>>
>> Sure, can you submit a patch?
>
> That said, this one is likely not a bug in the kernel code but more
> likely llvm either miscompiling or deviating from gcc behavior on some
> corner cases.
>

An llvmlinux-patched Linux v4.2 kernel compiled with GCC v4.9 does not
show this BUG line.

- Sedat -

> Thanks.
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-11 Thread Sedat Dilek
On Thu, Sep 10, 2015 at 4:53 PM, Tejun Heo  wrote:
> On Thu, Sep 10, 2015 at 10:52:27AM -0400, Tejun Heo wrote:
>> Hey,
>>
>> On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
>> > I think we need to add might_sleep() on the top of __cancel_work_timer().
>> > The might_sleep() on the start_flush_work() doesn't cover all the
>> > paths of __cancel_work_timer().
>> > And it can help to narrow the area of this bug.
>>
>> Sure, can you submit a patch?
>
> That said, this one is likely not a bug in the kernel code but more
> likely llvm either miscompiling or deviating from gcc behavior on some
> corner cases.
>

An llvmlinux-patched Linux v4.2 kernel compiled with GCC v4.9 does not
show this BUG line.

- Sedat -

> Thanks.
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-10 Thread Tejun Heo
On Thu, Sep 10, 2015 at 10:52:27AM -0400, Tejun Heo wrote:
> Hey,
> 
> On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
> > I think we need to add might_sleep() on the top of __cancel_work_timer().
> > The might_sleep() on the start_flush_work() doesn't cover all the
> > paths of __cancel_work_timer().
> > And it can help to narrow the area of this bug.
> 
> Sure, can you submit a patch?

That said, this one is likely not a bug in the kernel code but more
likely llvm either miscompiling or deviating from gcc behavior on some
corner cases.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-10 Thread Tejun Heo
Hey,

On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
> I think we need to add might_sleep() on the top of __cancel_work_timer().
> The might_sleep() on the start_flush_work() doesn't cover all the
> paths of __cancel_work_timer().
> And it can help to narrow the area of this bug.

Sure, can you submit a patch?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-10 Thread Tejun Heo
On Thu, Sep 10, 2015 at 10:52:27AM -0400, Tejun Heo wrote:
> Hey,
> 
> On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
> > I think we need to add might_sleep() on the top of __cancel_work_timer().
> > The might_sleep() on the start_flush_work() doesn't cover all the
> > paths of __cancel_work_timer().
> > And it can help to narrow the area of this bug.
> 
> Sure, can you submit a patch?

That said, this one is likely not a bug in the kernel code but more
likely llvm either miscompiling or deviating from gcc behavior on some
corner cases.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-10 Thread Tejun Heo
Hey,

On Thu, Sep 10, 2015 at 09:04:27AM +0800, Lai Jiangshan wrote:
> I think we need to add might_sleep() on the top of __cancel_work_timer().
> The might_sleep() on the start_flush_work() doesn't cover all the
> paths of __cancel_work_timer().
> And it can help to narrow the area of this bug.

Sure, can you submit a patch?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-09 Thread Lai Jiangshan
Hi, TJ

I think we need to add might_sleep() on the top of __cancel_work_timer().
The might_sleep() on the start_flush_work() doesn't cover all the
paths of __cancel_work_timer().
And it can help to narrow the area of this bug.

Hi Sedat Dilek

[   24.705704] irq event stamp: 19968
[   24.705706] hardirqs last  enabled at (19967): []
_raw_spin_unlock_irq+0x32/0x60
[   24.705713] hardirqs last disabled at (19968): []
del_timer_sync+0x37/0x110

Is it means the irq-disabled-event is leak by del_timer_sync()? It is
impossible.

usbhid_close()
  mutex_lock(); // it has might_sleep() check. So the problem seems to be
// hidden at one of the following statements
  del_timer_sync();
  cancel_work_sync();


On Wed, Sep 9, 2015 at 6:24 PM, Sedat Dilek  wrote:
> [ TO WORKQUEUE maintainers ]
>
> Hi,
>
> I encountered a BUG with a llvmlinux compiled kernel here on
> Ubuntu/precise AMD64 touching the workqueue area.
>
> [   24.705463] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2680
> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
> [   24.705662] 3 locks held by acpid/1447:
> [   24.705664]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xb0 [usbhid]
> [   24.705704] irq event stamp: 19968
> [   24.705706] hardirqs last  enabled at (19967): []
> _raw_spin_unlock_irq+0x32/0x60
> [   24.705713] hardirqs last disabled at (19968): []
> del_timer_sync+0x37/0x110
> [   24.705720] softirqs last  enabled at (18890): []
> local_bh_enable+0x9/0x20
> [   24.705726] softirqs last disabled at (1): []
> local_bh_disable+0x9/0x20
> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
> 4.2.0-2-llvmlinux-small #2
> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   24.705741]  8800d57ea948 0092 
> 8800bb3e3aa8
> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
> 81c51a34
> [   24.705754]  8800d3095000  0a78
> 8800bb3e3b18
> [   24.705761] Call Trace:
> [   24.705767]  [] dump_stack+0x7d/0xa0
> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> [   24.705779]  [] __might_sleep+0x4f/0xc0
> [   24.705784]  [] start_flush_work+0x2f/0x290
> [   24.705789]  [] flush_work+0x5c/0x80
> [   24.705792]  [] ? flush_work+0x1a/0x80
> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
> [   24.705814]  [] __cancel_work_timer+0x179/0x260
> [   24.705820]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
> [   24.705825]  [] ? try_to_del_timer_sync+0xad/0xc0
> [   24.705829]  [] cancel_work_sync+0x18/0x20
> [   24.705835]  [] usbhid_close+0x75/0xb0 [usbhid]
> [   24.705844]  [] hidinput_close+0x31/0x40 [hid]
> [   24.705851]  [] ? hidinput_open+0x40/0x40 [hid]
> [   24.705856]  [] input_close_device+0x48/0x70
> [   24.705861]  [] evdev_release+0xd6/0xf0
> [   24.705867]  [] __fput+0x107/0x240
> [   24.705872]  [] fput+0x16/0x20
> [   24.705877]  [] task_work_run+0x87/0x130
> [   24.705883]  [] do_notify_resume+0x9cf/0xa00
> [   24.705888]  [] ? trace_hardirqs_on+0xd/0x10
> [   24.705895]  [] ? context_tracking_user_enter+0x13/0x20
> [   24.705900]  [] ? syscall_trace_leave+0x111/0x340
> [   24.705905]  [] ? fput+0x76/0xd0
> [   24.705909]  [] ? filp_close+0x65/0x90
> [   24.705915]  [] ? trace_hardirqs_on_thunk+0x17/0x19
> [   24.705920]  [] int_signal+0x12/0x17
>
> Attached are the full dmesg-log and my kernel-config.
>
> Which area in the kernel is the root cause? CC more people?
>
> Do you need more informations?
> How can I help?
>
> Regards,
> - Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

2015-09-09 Thread Lai Jiangshan
Hi, TJ

I think we need to add might_sleep() on the top of __cancel_work_timer().
The might_sleep() on the start_flush_work() doesn't cover all the
paths of __cancel_work_timer().
And it can help to narrow the area of this bug.

Hi Sedat Dilek

[   24.705704] irq event stamp: 19968
[   24.705706] hardirqs last  enabled at (19967): []
_raw_spin_unlock_irq+0x32/0x60
[   24.705713] hardirqs last disabled at (19968): []
del_timer_sync+0x37/0x110

Is it means the irq-disabled-event is leak by del_timer_sync()? It is
impossible.

usbhid_close()
  mutex_lock(); // it has might_sleep() check. So the problem seems to be
// hidden at one of the following statements
  del_timer_sync();
  cancel_work_sync();


On Wed, Sep 9, 2015 at 6:24 PM, Sedat Dilek  wrote:
> [ TO WORKQUEUE maintainers ]
>
> Hi,
>
> I encountered a BUG with a llvmlinux compiled kernel here on
> Ubuntu/precise AMD64 touching the workqueue area.
>
> [   24.705463] BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2680
> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
> [   24.705662] 3 locks held by acpid/1447:
> [   24.705664]  #0:  (>mutex){+.+...}, at: []
> evdev_release+0xbc/0xf0
> [   24.705678]  #1:  (>mutex#2){+.+...}, at: []
> input_close_device+0x27/0x70
> [   24.705691]  #2:  (hid_open_mut){+.+...}, at: []
> usbhid_close+0x28/0xb0 [usbhid]
> [   24.705704] irq event stamp: 19968
> [   24.705706] hardirqs last  enabled at (19967): []
> _raw_spin_unlock_irq+0x32/0x60
> [   24.705713] hardirqs last disabled at (19968): []
> del_timer_sync+0x37/0x110
> [   24.705720] softirqs last  enabled at (18890): []
> local_bh_enable+0x9/0x20
> [   24.705726] softirqs last disabled at (1): []
> local_bh_disable+0x9/0x20
> [   24.705734] CPU: 2 PID: 1447 Comm: acpid Not tainted
> 4.2.0-2-llvmlinux-small #2
> [   24.705737] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   24.705741]  8800d57ea948 0092 
> 8800bb3e3aa8
> [   24.705748]  8149287d 8800bb3e3ad8 810cbf7a
> 81c51a34
> [   24.705754]  8800d3095000  0a78
> 8800bb3e3b18
> [   24.705761] Call Trace:
> [   24.705767]  [] dump_stack+0x7d/0xa0
> [   24.705774]  [] ___might_sleep+0x28a/0x2a0
> [   24.705779]  [] __might_sleep+0x4f/0xc0
> [   24.705784]  [] start_flush_work+0x2f/0x290
> [   24.705789]  [] flush_work+0x5c/0x80
> [   24.705792]  [] ? flush_work+0x1a/0x80
> [   24.705799]  [] ? trace_hardirqs_off+0xd/0x10
> [   24.705804]  [] ? try_to_grab_pending+0x48/0x360
> [   24.705810]  [] ? _raw_spin_lock_irqsave+0x73/0x80
> [   24.705814]  [] __cancel_work_timer+0x179/0x260
> [   24.705820]  [] ? _raw_spin_unlock_irqrestore+0x52/0x80
> [   24.705825]  [] ? try_to_del_timer_sync+0xad/0xc0
> [   24.705829]  [] cancel_work_sync+0x18/0x20
> [   24.705835]  [] usbhid_close+0x75/0xb0 [usbhid]
> [   24.705844]  [] hidinput_close+0x31/0x40 [hid]
> [   24.705851]  [] ? hidinput_open+0x40/0x40 [hid]
> [   24.705856]  [] input_close_device+0x48/0x70
> [   24.705861]  [] evdev_release+0xd6/0xf0
> [   24.705867]  [] __fput+0x107/0x240
> [   24.705872]  [] fput+0x16/0x20
> [   24.705877]  [] task_work_run+0x87/0x130
> [   24.705883]  [] do_notify_resume+0x9cf/0xa00
> [   24.705888]  [] ? trace_hardirqs_on+0xd/0x10
> [   24.705895]  [] ? context_tracking_user_enter+0x13/0x20
> [   24.705900]  [] ? syscall_trace_leave+0x111/0x340
> [   24.705905]  [] ? fput+0x76/0xd0
> [   24.705909]  [] ? filp_close+0x65/0x90
> [   24.705915]  [] ? trace_hardirqs_on_thunk+0x17/0x19
> [   24.705920]  [] int_signal+0x12/0x17
>
> Attached are the full dmesg-log and my kernel-config.
>
> Which area in the kernel is the root cause? CC more people?
>
> Do you need more informations?
> How can I help?
>
> Regards,
> - Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/