Re: 3.16-rc1 kernel BUG triggered at mutex.c:586

2014-06-16 Thread Mike Galbraith
On Mon, 2014-06-16 at 14:28 -0400, Vlad Yasevich wrote: 
> Trying to boot 3.16.0-rc1+ (the + is a small vlan related patch) on a
> tests system triggers the following BUG:

It's easier for developers to read if you dump dmesg, or at least tell
your mail client to please not make a line-wrapped mess of your report.

 

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/3
1 lock held by swapper/3/0:
#0: (&(>counts.lock)->rlock){..-...}, at: [] 
bitmap_endwrite+0x58/0x230 211959
hardirqs last  enabled at (211958): [] 
_raw_spin_unlock_irqrestore+0x36/0x70
hardirqs last disabled at (211959): [] 
_raw_spin_lock_irqsave+0x29/0x70
softirqs last  enabled at (211952): [] 
_local_bh_enable+0x22/0x50
softirqs last disabled at (211953): [] irq_exit+0xc5/0xd0
swapper/3 Not tainted 3.16.0-rc1+ #35 Hewlett-Packard HP xw8400 
Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
 88047fcc3bd0
81706ef2 81c857e0
88047fcc3be0 810b4dba
88047fcc3c50 8170c79c
 81c85938
0046 0092
Call Trace:

[] dump_stack+0x4d/0x66
[] __might_sleep+0xfa/0x130
[] mutex_lock_nested+0x3c/0x3a0
[] kernfs_notify+0x8a/0x140
[] bitmap_endwrite+0xb9/0x230
[] close_write+0x93/0xb0 [raid1]
[] r1_bio_write_done+0x29/0x50 [raid1]
[] raid1_end_write_request+0xcf/0x220 [raid1]
[] bio_endio+0x5b/0xa0
[] blk_update_request+0x90/0x330
[] blk_update_bidi_request+0x1c/0x80
[] blk_end_bidi_request+0x1f/0x60
[] blk_end_request+0x10/0x20
[] scsi_io_completion+0xf4/0x6e0
[] scsi_finish_command+0xb3/0x110
[] scsi_softirq_done+0x137/0x160
[] blk_done_softirq+0x90/0xb0
[] __do_softirq+0x125/0x300
[] irq_exit+0xc5/0xd0
[] smp_call_function_single_interrupt+0x35/0x40
[] call_function_single_interrupt+0x72/0x80

[] ?  native_safe_halt+0x6/0x10
[] ?  trace_hardirqs_on+0xd/0x10
[] default_idle+0x24/0xe0
[] arch_cpu_idle+0xf/0x20
[] cpu_startup_entry+0x2cb/0x450
[] ?  clockevents_register_device+0xbc/0x120
[] start_secondary+0x1de/0x290
=

inconsistent lock state ]
3.16.0-rc1+ #35 Not tainted
-
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/3/0 [HC0[0]:SC1[1]:HE0:SE0] takes: (kernfs_mutex){+.?.+.}, at: 
[] kernfs_notify+0x8a/0x140
{SOFTIRQ-ON-W} state was registered at:
 [] __lock_acquire+0x316/0x1a60
 [] lock_acquire+0xa2/0x130
 [] mutex_lock_nested+0x6a/0x3a0
 [] kernfs_activate+0x1f/0xf0
 [] kernfs_create_root+0xe8/0x110
 [] sysfs_init+0x13/0x51
 [] mnt_init+0x118/0x225
 [] vfs_caches_init+0x99/0x115
 [] start_kernel+0x3ec/0x443
 [] x86_64_start_reservations+0x2a/0x2c
 [] x86_64_start_kernel+0x13e/0x14d
211959 hardirqs last  enabled at (211958): [] 
_raw_spin_unlock_irqrestore+0x36/0x70
hardirqs last disabled at (211959): [] 
_raw_spin_lock_irqsave+0x29/0x70
softirqs last  enabled at (211952): [] 
_local_bh_enable+0x22/0x50
softirqs last disabled at (211953): [] irq_exit+0xc5/0xd0

other info that might help us debug this:
Possible unsafe locking scenario:

  CPU0
  
 lock(kernfs_mutex);
 
 lock(kernfs_mutex);

 *** DEADLOCK ***

1 lock held by swapper/3/0:
#0: (&(>counts.lock)->rlock){..-...}, at: [] 
bitmap_endwrite+0x58/0x230
stack backtrace: swapper/3 Not tainted 3.16.0-rc1+ #35 Hewlett-Packard HP 
xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
82638c50 88047fcc3a20
81706ef2 8804695edb40
88047fcc3a70 81702a16
0001 8804
8804 0006
8804695edb40 810d7c20
Call Trace:

[] dump_stack+0x4d/0x66
[] print_usage_bug+0x1f3/0x204
[] ?  check_usage_backwards+0x140/0x140
[] mark_lock+0x222/0x2b0
[] __lock_acquire+0x2b8/0x1a60
[] ?  show_trace_log_lvl+0x4d/0x60
[] ?  show_stack_log_lvl+0xa4/0x190
[] lock_acquire+0xa2/0x130
[] ?  kernfs_notify+0x8a/0x140
[] mutex_lock_nested+0x6a/0x3a0
[] ?  kernfs_notify+0x8a/0x140
[] kernfs_notify+0x8a/0x140
[] bitmap_endwrite+0xb9/0x230
[] close_write+0x93/0xb0 [raid1]
[] r1_bio_write_done+0x29/0x50 [raid1]
[] raid1_end_write_request+0xcf/0x220 [raid1]
[] bio_endio+0x5b/0xa0
[] blk_update_request+0x90/0x330
[] blk_update_bidi_request+0x1c/0x80
[] blk_end_bidi_request+0x1f/0x60
[] blk_end_request+0x10/0x20
[] scsi_io_completion+0xf4/0x6e0
[] scsi_finish_command+0xb3/0x110
[] scsi_softirq_done+0x137/0x160
[] blk_done_softirq+0x90/0xb0
[] __do_softirq+0x125/0x300
[] irq_exit+0xc5/0xd0
[] smp_call_function_single_interrupt+0x35/0x40
[] call_function_single_interrupt+0x72/0x80

[] ?  native_safe_halt+0x6/0x10
[] ?  trace_hardirqs_on+0xd/0x10
[] default_idle+0x24/0xe0
[] arch_cpu_idle+0xf/0x20
[] cpu_startup_entry+0x2cb/0x450
[] ?  clockevents_register_device+0xbc/0x120
[] start_secondary+0x1de/0x290


> Is this a known issue?  Are there any patches I can help
> test?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to 

Re: 3.16-rc1 kernel BUG triggered at mutex.c:586

2014-06-16 Thread Mike Galbraith
On Mon, 2014-06-16 at 14:28 -0400, Vlad Yasevich wrote: 
 Trying to boot 3.16.0-rc1+ (the + is a small vlan related patch) on a
 tests system triggers the following BUG:

It's easier for developers to read if you dump dmesg, or at least tell
your mail client to please not make a line-wrapped mess of your report.

replaces the gawd awful mess your mailer made 

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/3
1 lock held by swapper/3/0:
#0: ((bitmap-counts.lock)-rlock){..-...}, at: [815729e8] 
bitmap_endwrite+0x58/0x230 211959
hardirqs last  enabled at (211958): [8170f4d6] 
_raw_spin_unlock_irqrestore+0x36/0x70
hardirqs last disabled at (211959): [8170fba9] 
_raw_spin_lock_irqsave+0x29/0x70
softirqs last  enabled at (211952): [8108bab2] 
_local_bh_enable+0x22/0x50
softirqs last disabled at (211953): [8108cc15] irq_exit+0xc5/0xd0
swapper/3 Not tainted 3.16.0-rc1+ #35 Hewlett-Packard HP xw8400 
Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
 88047fcc3bd0
81706ef2 81c857e0
88047fcc3be0 810b4dba
88047fcc3c50 8170c79c
 81c85938
0046 0092
Call Trace:
IRQ
[81706ef2] dump_stack+0x4d/0x66
[810b4dba] __might_sleep+0xfa/0x130
[8170c79c] mutex_lock_nested+0x3c/0x3a0
[81270f5a] kernfs_notify+0x8a/0x140
[81572a49] bitmap_endwrite+0xb9/0x230
[a027a823] close_write+0x93/0xb0 [raid1]
[a027af89] r1_bio_write_done+0x29/0x50 [raid1]
[a027c98f] raid1_end_write_request+0xcf/0x220 [raid1]
[8132f0bb] bio_endio+0x5b/0xa0
[81335f60] blk_update_request+0x90/0x330
[8133621c] blk_update_bidi_request+0x1c/0x80
[8133656f] blk_end_bidi_request+0x1f/0x60
[813365c0] blk_end_request+0x10/0x20
[81497194] scsi_io_completion+0xf4/0x6e0
[8148c613] scsi_finish_command+0xb3/0x110
[81496fb7] scsi_softirq_done+0x137/0x160
[8133cfe0] blk_done_softirq+0x90/0xb0
[8108c705] __do_softirq+0x125/0x300
[8108cc15] irq_exit+0xc5/0xd0
[810439d5] smp_call_function_single_interrupt+0x35/0x40
[81711552] call_function_single_interrupt+0x72/0x80
EOI
[81053fa6] ?  native_safe_halt+0x6/0x10
[810d891d] ?  trace_hardirqs_on+0xd/0x10
[8101e0c4] default_idle+0x24/0xe0
[8101ea6f] arch_cpu_idle+0xf/0x20
[810d0d1b] cpu_startup_entry+0x2cb/0x450
[810fed4c] ?  clockevents_register_device+0xbc/0x120
[810442ee] start_secondary+0x1de/0x290
=

inconsistent lock state ]
3.16.0-rc1+ #35 Not tainted
-
inconsistent {SOFTIRQ-ON-W} - {IN-SOFTIRQ-W} usage.
swapper/3/0 [HC0[0]:SC1[1]:HE0:SE0] takes: (kernfs_mutex){+.?.+.}, at: 
[81270f5a] kernfs_notify+0x8a/0x140
{SOFTIRQ-ON-W} state was registered at:
 [810d8f66] __lock_acquire+0x316/0x1a60
 [810dae72] lock_acquire+0xa2/0x130
 [8170c7ca] mutex_lock_nested+0x6a/0x3a0
 [8127066f] kernfs_activate+0x1f/0xf0
 [81270a08] kernfs_create_root+0xe8/0x110
 [81d73567] sysfs_init+0x13/0x51
 [81d70d06] mnt_init+0x118/0x225
 [81d70923] vfs_caches_init+0x99/0x115
 [81d43022] start_kernel+0x3ec/0x443
 [81d425ee] x86_64_start_reservations+0x2a/0x2c
 [81d4272e] x86_64_start_kernel+0x13e/0x14d
211959 hardirqs last  enabled at (211958): [8170f4d6] 
_raw_spin_unlock_irqrestore+0x36/0x70
hardirqs last disabled at (211959): [8170fba9] 
_raw_spin_lock_irqsave+0x29/0x70
softirqs last  enabled at (211952): [8108bab2] 
_local_bh_enable+0x22/0x50
softirqs last disabled at (211953): [8108cc15] irq_exit+0xc5/0xd0

other info that might help us debug this:
Possible unsafe locking scenario:

  CPU0
  
 lock(kernfs_mutex);
 Interrupt
 lock(kernfs_mutex);

 *** DEADLOCK ***

1 lock held by swapper/3/0:
#0: ((bitmap-counts.lock)-rlock){..-...}, at: [815729e8] 
bitmap_endwrite+0x58/0x230
stack backtrace: swapper/3 Not tainted 3.16.0-rc1+ #35 Hewlett-Packard HP 
xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
82638c50 88047fcc3a20
81706ef2 8804695edb40
88047fcc3a70 81702a16
0001 8804
8804 0006
8804695edb40 810d7c20
Call Trace:
IRQ
[81706ef2] dump_stack+0x4d/0x66
[81702a16] print_usage_bug+0x1f3/0x204
[810d7c20] ?  check_usage_backwards+0x140/0x140
[810d8622] mark_lock+0x222/0x2b0
[810d8f08] __lock_acquire+0x2b8/0x1a60
[8101779d] ?  show_trace_log_lvl+0x4d/0x60
[810163f4] ?  show_stack_log_lvl+0xa4/0x190
[810dae72] lock_acquire+0xa2/0x130
[81270f5a] ?  kernfs_notify+0x8a/0x140
[8170c7ca] mutex_lock_nested+0x6a/0x3a0