Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab

2015-10-22 Thread Felipe Balbi

Hi,

I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition
when accessing mtd->usecount) caused a regression at least when removing
m25p80. Wonder if you guys would know of a quick fix, other than
reverting $commit in HEAD (yes, that makes the problem go away, but
regresses on what $commit tried to fix, of course).

More info about the regression follows, together with bisection log:

# modprobe -r m25p80
[   53.419251] 
[   53.420838] ==
[   53.427300] [ INFO: possible circular locking dependency detected ]
[   53.433865] 4.3.0-rc6 #96 Not tainted
[   53.437686] ---
[   53.444220] modprobe/372 is trying to acquire lock:
[   53.449320]  (>lock){+.+...}, at: [] 
del_mtd_blktrans_dev+0x80/0xdc
[   53.457271] 
[   53.457271] but task is already holding lock:
[   53.463372]  (mtd_table_mutex){+.+.+.}, at: [] 
del_mtd_device+0x18/0x100
[   53.471321] 
[   53.471321] which lock already depends on the new lock.
[   53.471321] 
[   53.479856] 
[   53.479856] the existing dependency chain (in reverse order) is:
[   53.487660] 
-> #1 (mtd_table_mutex){+.+.+.}:
[   53.492331][] blktrans_open+0x34/0x1a4
[   53.497879][] __blkdev_get+0xc4/0x3b0
[   53.503364][] blkdev_get+0x108/0x320
[   53.508743][] do_dentry_open+0x218/0x314
[   53.514496][] path_openat+0x4c0/0xf9c
[   53.519959][] do_filp_open+0x5c/0xc0
[   53.525336][] do_sys_open+0xfc/0x1cc
[   53.530716][] ret_fast_syscall+0x0/0x1c
[   53.536375] 
-> #0 (>lock){+.+...}:
[   53.540587][] mutex_lock_nested+0x38/0x3cc
[   53.546504][] del_mtd_blktrans_dev+0x80/0xdc
[   53.552606][] blktrans_notify_remove+0x7c/0x84
[   53.558891][] del_mtd_device+0x74/0x100
[   53.564544][] del_mtd_partitions+0x80/0xc8
[   53.570451][] mtd_device_unregister+0x24/0x48
[   53.576637][] spi_drv_remove+0x1c/0x34
[   53.582207][] __device_release_driver+0x88/0x114
[   53.588663][] device_release_driver+0x20/0x2c
[   53.594843][] bus_remove_device+0xd8/0x108
[   53.600748][] device_del+0x10c/0x210
[   53.606127][] device_unregister+0xc/0x20
[   53.611849][] __unregister+0x10/0x20
[   53.617211][] device_for_each_child+0x50/0x7c
[   53.623387][] spi_unregister_master+0x58/0x8c
[   53.629578][] release_nodes+0x15c/0x1c8
[   53.635223][] __device_release_driver+0x90/0x114
[   53.641689][] driver_detach+0xb4/0xb8
[   53.647147][] bus_remove_driver+0x4c/0xa0
[   53.652970][] SyS_delete_module+0x11c/0x1e4
[   53.658976][] ret_fast_syscall+0x0/0x1c
[   53.664621] 
[   53.664621] other info that might help us debug this:
[   53.664621] 
[   53.672979]  Possible unsafe locking scenario:
[   53.672979] 
[   53.679169]CPU0CPU1
[   53.683900]
[   53.688633]   lock(mtd_table_mutex);
[   53.692383]lock(>lock);
[   53.698306]lock(mtd_table_mutex);
[   53.704658]   lock(>lock);
[   53.707946] 
[   53.707946]  *** DEADLOCK ***
[   53.707946] 
[   53.714123] 5 locks held by modprobe/372:
[   53.718305]  #0:  (>mutex){..}, at: [] 
driver_detach+0x44/0xb8
[   53.726147]  #1:  (>mutex){..}, at: [] 
driver_detach+0x50/0xb8
[   53.733985]  #2:  (>mutex){..}, at: [] 
device_release_driver+0x18/0x2c
[   53.742541]  #3:  (mtd_partitions_mutex){+.+.+.}, at: [] 
del_mtd_partitions+0x1c/0xc8
[   53.751656]  #4:  (mtd_table_mutex){+.+.+.}, at: [] 
del_mtd_device+0x18/0x100
[   53.760048] 
[   53.760048] stack backtrace:
[   53.764591] CPU: 0 PID: 372 Comm: modprobe Not tainted 4.3.0-rc6 #96
[   53.771217] Hardware name: Generic AM43 (Flattened Device Tree)
[   53.777419] [] (unwind_backtrace) from [] 
(show_stack+0x10/0x14)
[   53.785511] [] (show_stack) from [] 
(dump_stack+0x84/0x9c)
[   53.793063] [] (dump_stack) from [] 
(print_circular_bug+0x1c8/0x30c)
[   53.801500] [] (print_circular_bug) from [] 
(__lock_acquire+0x1a48/0x1cd8)
[   53.810480] [] (__lock_acquire) from [] 
(lock_acquire+0xac/0x12c)
[   53.818649] [] (lock_acquire) from [] 
(mutex_lock_nested+0x38/0x3cc)
[   53.827103] [] (mutex_lock_nested) from [] 
(del_mtd_blktrans_dev+0x80/0xdc)
[   53.836199] [] (del_mtd_blktrans_dev) from [] 
(blktrans_notify_remove+0x7c/0x84)
[   53.845735] [] (blktrans_notify_remove) from [] 
(del_mtd_device+0x74/0x100)
[   53.854833] [] (del_mtd_device) from [] 
(del_mtd_partitions+0x80/0xc8)
[   53.863469] [] (del_mtd_partitions) from [] 
(mtd_device_unregister+0x24/0x48)
[   53.872733] [] (mtd_device_unregister) from [] 
(spi_drv_remove+0x1c/0x34)
[   53.881633] [] (spi_drv_remove) from [] 
(__device_release_driver+0x88/0x114)
[   53.890788] [] (__device_release_driver) from [] 
(device_release_driver+0x20/0x2c)
[   53.900483] [] 

Re: Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab

2015-10-22 Thread Brian Norris
Hi Felipe,

First of all, this is only a quick response. I could easily be missing
something obvious.

On Thu, Oct 22, 2015 at 02:01:02PM -0500, Felipe Balbi wrote:
> 
> Hi,
> 
> I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition
> when accessing mtd->usecount) caused a regression at least when removing
> m25p80. Wonder if you guys would know of a quick fix, other than
> reverting $commit in HEAD (yes, that makes the problem go away, but
> regresses on what $commit tried to fix, of course).
> 
> More info about the regression follows, together with bisection log:

Not all deadlocks alleged by lockdep are true deadlocks. Are you able to
reproduce a real deadlock? (I know, it's not always easy to hit, so I'm
not discounting this based on lack of evidence.)

In particular, I think something like this warning was mentioned
previously, and I looked into it a bit but didn't have the time to
figure out for sure (and figure out how to squash the potential false
positive). Hence, I'm responding now, even if I'm not 100% sure. More
below.

> # modprobe -r m25p80
> [   53.419251] 
> [   53.420838] ==
> [   53.427300] [ INFO: possible circular locking dependency detected ]
> [   53.433865] 4.3.0-rc6 #96 Not tainted
> [   53.437686] ---
> [   53.444220] modprobe/372 is trying to acquire lock:
> [   53.449320]  (>lock){+.+...}, at: [] 
> del_mtd_blktrans_dev+0x80/0xdc
> [   53.457271] 
> [   53.457271] but task is already holding lock:
> [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [] 
> del_mtd_device+0x18/0x100
> [   53.471321] 
> [   53.471321] which lock already depends on the new lock.
> [   53.471321] 
> [   53.479856] 
> [   53.479856] the existing dependency chain (in reverse order) is:
> [   53.487660] 
> -> #1 (mtd_table_mutex){+.+.+.}:
> [   53.492331][] blktrans_open+0x34/0x1a4
> [   53.497879][] __blkdev_get+0xc4/0x3b0
> [   53.503364][] blkdev_get+0x108/0x320
> [   53.508743][] do_dentry_open+0x218/0x314
> [   53.514496][] path_openat+0x4c0/0xf9c
> [   53.519959][] do_filp_open+0x5c/0xc0
> [   53.525336][] do_sys_open+0xfc/0x1cc
> [   53.530716][] ret_fast_syscall+0x0/0x1c
> [   53.536375] 
> -> #0 (>lock){+.+...}:
> [   53.540587][] mutex_lock_nested+0x38/0x3cc
> [   53.546504][] del_mtd_blktrans_dev+0x80/0xdc
> [   53.552606][] blktrans_notify_remove+0x7c/0x84
> [   53.558891][] del_mtd_device+0x74/0x100
> [   53.564544][] del_mtd_partitions+0x80/0xc8
> [   53.570451][] mtd_device_unregister+0x24/0x48
> [   53.576637][] spi_drv_remove+0x1c/0x34
> [   53.582207][] __device_release_driver+0x88/0x114
> [   53.588663][] device_release_driver+0x20/0x2c
> [   53.594843][] bus_remove_device+0xd8/0x108
> [   53.600748][] device_del+0x10c/0x210
> [   53.606127][] device_unregister+0xc/0x20
> [   53.611849][] __unregister+0x10/0x20
> [   53.617211][] device_for_each_child+0x50/0x7c
> [   53.623387][] spi_unregister_master+0x58/0x8c
> [   53.629578][] release_nodes+0x15c/0x1c8
> [   53.635223][] __device_release_driver+0x90/0x114
> [   53.641689][] driver_detach+0xb4/0xb8
> [   53.647147][] bus_remove_driver+0x4c/0xa0
> [   53.652970][] SyS_delete_module+0x11c/0x1e4
> [   53.658976][] ret_fast_syscall+0x0/0x1c

Actually, the more I look at this, the more I think the warning is
probably correct. I might have been thinking of a different false
positive.

Tentatively: I think the right fix might be to reverse the ordering of
acquire/release of the mtd_table_mutex and dev->lock throughout
mtd_blkdevs.c. See below.

> [   53.664621] 
> [   53.664621] other info that might help us debug this:
> [   53.664621] 
> [   53.672979]  Possible unsafe locking scenario:
> [   53.672979] 
> [   53.679169]CPU0CPU1
> [   53.683900]
> [   53.688633]   lock(mtd_table_mutex);
> [   53.692383]lock(>lock);
> [   53.698306]lock(mtd_table_mutex);
> [   53.704658]   lock(>lock);
> [   53.707946] 
> [   53.707946]  *** DEADLOCK ***
> [   53.707946] 
> [   53.714123] 5 locks held by modprobe/372:
> [   53.718305]  #0:  (>mutex){..}, at: [] 
> driver_detach+0x44/0xb8
> [   53.726147]  #1:  (>mutex){..}, at: [] 
> driver_detach+0x50/0xb8
> [   53.733985]  #2:  (>mutex){..}, at: [] 
> device_release_driver+0x18/0x2c
> [   53.742541]  #3:  (mtd_partitions_mutex){+.+.+.}, at: [] 
> del_mtd_partitions+0x1c/0xc8
> [   53.751656]  #4:  (mtd_table_mutex){+.+.+.}, at: [] 
> del_mtd_device+0x18/0x100
> [   53.760048] 
> [   53.760048] stack backtrace:
> [   53.764591] CPU: 0 PID: 372 Comm: modprobe Not tainted 4.3.0-rc6 #96
> [   53.771217] Hardware name: Generic 

Re: Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab

2015-10-22 Thread Felipe Balbi

(fixing Tony's address)

Felipe Balbi  writes:
> Hi,
>
> I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition
> when accessing mtd->usecount) caused a regression at least when removing
> m25p80. Wonder if you guys would know of a quick fix, other than
> reverting $commit in HEAD (yes, that makes the problem go away, but
> regresses on what $commit tried to fix, of course).
>
> More info about the regression follows, together with bisection log:
>
> # modprobe -r m25p80
> [   53.419251] 
> [   53.420838] ==
> [   53.427300] [ INFO: possible circular locking dependency detected ]
> [   53.433865] 4.3.0-rc6 #96 Not tainted
> [   53.437686] ---
> [   53.444220] modprobe/372 is trying to acquire lock:
> [   53.449320]  (>lock){+.+...}, at: [] 
> del_mtd_blktrans_dev+0x80/0xdc
> [   53.457271] 
> [   53.457271] but task is already holding lock:
> [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [] 
> del_mtd_device+0x18/0x100
> [   53.471321] 
> [   53.471321] which lock already depends on the new lock.
> [   53.471321] 
> [   53.479856] 
> [   53.479856] the existing dependency chain (in reverse order) is:
> [   53.487660] 
> -> #1 (mtd_table_mutex){+.+.+.}:
> [   53.492331][] blktrans_open+0x34/0x1a4
> [   53.497879][] __blkdev_get+0xc4/0x3b0
> [   53.503364][] blkdev_get+0x108/0x320
> [   53.508743][] do_dentry_open+0x218/0x314
> [   53.514496][] path_openat+0x4c0/0xf9c
> [   53.519959][] do_filp_open+0x5c/0xc0
> [   53.525336][] do_sys_open+0xfc/0x1cc
> [   53.530716][] ret_fast_syscall+0x0/0x1c
> [   53.536375] 
> -> #0 (>lock){+.+...}:
> [   53.540587][] mutex_lock_nested+0x38/0x3cc
> [   53.546504][] del_mtd_blktrans_dev+0x80/0xdc
> [   53.552606][] blktrans_notify_remove+0x7c/0x84
> [   53.558891][] del_mtd_device+0x74/0x100
> [   53.564544][] del_mtd_partitions+0x80/0xc8
> [   53.570451][] mtd_device_unregister+0x24/0x48
> [   53.576637][] spi_drv_remove+0x1c/0x34
> [   53.582207][] __device_release_driver+0x88/0x114
> [   53.588663][] device_release_driver+0x20/0x2c
> [   53.594843][] bus_remove_device+0xd8/0x108
> [   53.600748][] device_del+0x10c/0x210
> [   53.606127][] device_unregister+0xc/0x20
> [   53.611849][] __unregister+0x10/0x20
> [   53.617211][] device_for_each_child+0x50/0x7c
> [   53.623387][] spi_unregister_master+0x58/0x8c
> [   53.629578][] release_nodes+0x15c/0x1c8
> [   53.635223][] __device_release_driver+0x90/0x114
> [   53.641689][] driver_detach+0xb4/0xb8
> [   53.647147][] bus_remove_driver+0x4c/0xa0
> [   53.652970][] SyS_delete_module+0x11c/0x1e4
> [   53.658976][] ret_fast_syscall+0x0/0x1c
> [   53.664621] 
> [   53.664621] other info that might help us debug this:
> [   53.664621] 
> [   53.672979]  Possible unsafe locking scenario:
> [   53.672979] 
> [   53.679169]CPU0CPU1
> [   53.683900]
> [   53.688633]   lock(mtd_table_mutex);
> [   53.692383]lock(>lock);
> [   53.698306]lock(mtd_table_mutex);
> [   53.704658]   lock(>lock);
> [   53.707946] 
> [   53.707946]  *** DEADLOCK ***
> [   53.707946] 
> [   53.714123] 5 locks held by modprobe/372:
> [   53.718305]  #0:  (>mutex){..}, at: [] 
> driver_detach+0x44/0xb8
> [   53.726147]  #1:  (>mutex){..}, at: [] 
> driver_detach+0x50/0xb8
> [   53.733985]  #2:  (>mutex){..}, at: [] 
> device_release_driver+0x18/0x2c
> [   53.742541]  #3:  (mtd_partitions_mutex){+.+.+.}, at: [] 
> del_mtd_partitions+0x1c/0xc8
> [   53.751656]  #4:  (mtd_table_mutex){+.+.+.}, at: [] 
> del_mtd_device+0x18/0x100
> [   53.760048] 
> [   53.760048] stack backtrace:
> [   53.764591] CPU: 0 PID: 372 Comm: modprobe Not tainted 4.3.0-rc6 #96
> [   53.771217] Hardware name: Generic AM43 (Flattened Device Tree)
> [   53.777419] [] (unwind_backtrace) from [] 
> (show_stack+0x10/0x14)
> [   53.785511] [] (show_stack) from [] 
> (dump_stack+0x84/0x9c)
> [   53.793063] [] (dump_stack) from [] 
> (print_circular_bug+0x1c8/0x30c)
> [   53.801500] [] (print_circular_bug) from [] 
> (__lock_acquire+0x1a48/0x1cd8)
> [   53.810480] [] (__lock_acquire) from [] 
> (lock_acquire+0xac/0x12c)
> [   53.818649] [] (lock_acquire) from [] 
> (mutex_lock_nested+0x38/0x3cc)
> [   53.827103] [] (mutex_lock_nested) from [] 
> (del_mtd_blktrans_dev+0x80/0xdc)
> [   53.836199] [] (del_mtd_blktrans_dev) from [] 
> (blktrans_notify_remove+0x7c/0x84)
> [   53.845735] [] (blktrans_notify_remove) from [] 
> (del_mtd_device+0x74/0x100)
> [   53.854833] [] (del_mtd_device) from [] 
> (del_mtd_partitions+0x80/0xc8)
> [   53.863469] [] (del_mtd_partitions) from [] 
> 

Re: Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab

2015-10-22 Thread Felipe Balbi

Hi,

Brian Norris  writes:
> Hi Felipe,
>
> First of all, this is only a quick response. I could easily be missing
> something obvious.
>
> On Thu, Oct 22, 2015 at 02:01:02PM -0500, Felipe Balbi wrote:
>> 
>> Hi,
>> 
>> I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition
>> when accessing mtd->usecount) caused a regression at least when removing
>> m25p80. Wonder if you guys would know of a quick fix, other than
>> reverting $commit in HEAD (yes, that makes the problem go away, but
>> regresses on what $commit tried to fix, of course).
>> 
>> More info about the regression follows, together with bisection log:
>
> Not all deadlocks alleged by lockdep are true deadlocks. Are you able to
> reproduce a real deadlock? (I know, it's not always easy to hit, so I'm
> not discounting this based on lack of evidence.)
>
> In particular, I think something like this warning was mentioned
> previously, and I looked into it a bit but didn't have the time to
> figure out for sure (and figure out how to squash the potential false
> positive). Hence, I'm responding now, even if I'm not 100% sure. More
> below.
>
>> # modprobe -r m25p80
>> [   53.419251] 
>> [   53.420838] ==
>> [   53.427300] [ INFO: possible circular locking dependency detected ]
>> [   53.433865] 4.3.0-rc6 #96 Not tainted
>> [   53.437686] ---
>> [   53.444220] modprobe/372 is trying to acquire lock:
>> [   53.449320]  (>lock){+.+...}, at: [] 
>> del_mtd_blktrans_dev+0x80/0xdc
>> [   53.457271] 
>> [   53.457271] but task is already holding lock:
>> [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [] 
>> del_mtd_device+0x18/0x100
>> [   53.471321] 
>> [   53.471321] which lock already depends on the new lock.
>> [   53.471321] 
>> [   53.479856] 
>> [   53.479856] the existing dependency chain (in reverse order) is:
>> [   53.487660] 
>> -> #1 (mtd_table_mutex){+.+.+.}:
>> [   53.492331][] blktrans_open+0x34/0x1a4
>> [   53.497879][] __blkdev_get+0xc4/0x3b0
>> [   53.503364][] blkdev_get+0x108/0x320
>> [   53.508743][] do_dentry_open+0x218/0x314
>> [   53.514496][] path_openat+0x4c0/0xf9c
>> [   53.519959][] do_filp_open+0x5c/0xc0
>> [   53.525336][] do_sys_open+0xfc/0x1cc
>> [   53.530716][] ret_fast_syscall+0x0/0x1c
>> [   53.536375] 
>> -> #0 (>lock){+.+...}:
>> [   53.540587][] mutex_lock_nested+0x38/0x3cc
>> [   53.546504][] del_mtd_blktrans_dev+0x80/0xdc
>> [   53.552606][] blktrans_notify_remove+0x7c/0x84
>> [   53.558891][] del_mtd_device+0x74/0x100
>> [   53.564544][] del_mtd_partitions+0x80/0xc8
>> [   53.570451][] mtd_device_unregister+0x24/0x48
>> [   53.576637][] spi_drv_remove+0x1c/0x34
>> [   53.582207][] __device_release_driver+0x88/0x114
>> [   53.588663][] device_release_driver+0x20/0x2c
>> [   53.594843][] bus_remove_device+0xd8/0x108
>> [   53.600748][] device_del+0x10c/0x210
>> [   53.606127][] device_unregister+0xc/0x20
>> [   53.611849][] __unregister+0x10/0x20
>> [   53.617211][] device_for_each_child+0x50/0x7c
>> [   53.623387][] spi_unregister_master+0x58/0x8c
>> [   53.629578][] release_nodes+0x15c/0x1c8
>> [   53.635223][] __device_release_driver+0x90/0x114
>> [   53.641689][] driver_detach+0xb4/0xb8
>> [   53.647147][] bus_remove_driver+0x4c/0xa0
>> [   53.652970][] SyS_delete_module+0x11c/0x1e4
>> [   53.658976][] ret_fast_syscall+0x0/0x1c
>
> Actually, the more I look at this, the more I think the warning is
> probably correct. I might have been thinking of a different false
> positive.
>
> Tentatively: I think the right fix might be to reverse the ordering of
> acquire/release of the mtd_table_mutex and dev->lock throughout
> mtd_blkdevs.c. See below.
>
>> [   53.664621] 
>> [   53.664621] other info that might help us debug this:
>> [   53.664621] 
>> [   53.672979]  Possible unsafe locking scenario:
>> [   53.672979] 
>> [   53.679169]CPU0CPU1
>> [   53.683900]
>> [   53.688633]   lock(mtd_table_mutex);
>> [   53.692383]lock(>lock);
>> [   53.698306]lock(mtd_table_mutex);
>> [   53.704658]   lock(>lock);
>> [   53.707946] 
>> [   53.707946]  *** DEADLOCK ***
>> [   53.707946] 
>> [   53.714123] 5 locks held by modprobe/372:
>> [   53.718305]  #0:  (>mutex){..}, at: [] 
>> driver_detach+0x44/0xb8
>> [   53.726147]  #1:  (>mutex){..}, at: [] 
>> driver_detach+0x50/0xb8
>> [   53.733985]  #2:  (>mutex){..}, at: [] 
>> device_release_driver+0x18/0x2c
>> [   53.742541]  #3:  (mtd_partitions_mutex){+.+.+.}, at: [] 
>> del_mtd_partitions+0x1c/0xc8
>> [   53.751656]  #4:  (mtd_table_mutex){+.+.+.}, at: [] 
>>