Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab
Hi, I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition when accessing mtd->usecount) caused a regression at least when removing m25p80. Wonder if you guys would know of a quick fix, other than reverting $commit in HEAD (yes, that makes the problem go away, but regresses on what $commit tried to fix, of course). More info about the regression follows, together with bisection log: # modprobe -r m25p80 [ 53.419251] [ 53.420838] == [ 53.427300] [ INFO: possible circular locking dependency detected ] [ 53.433865] 4.3.0-rc6 #96 Not tainted [ 53.437686] --- [ 53.444220] modprobe/372 is trying to acquire lock: [ 53.449320] (>lock){+.+...}, at: [] del_mtd_blktrans_dev+0x80/0xdc [ 53.457271] [ 53.457271] but task is already holding lock: [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [] del_mtd_device+0x18/0x100 [ 53.471321] [ 53.471321] which lock already depends on the new lock. [ 53.471321] [ 53.479856] [ 53.479856] the existing dependency chain (in reverse order) is: [ 53.487660] -> #1 (mtd_table_mutex){+.+.+.}: [ 53.492331][] blktrans_open+0x34/0x1a4 [ 53.497879][] __blkdev_get+0xc4/0x3b0 [ 53.503364][] blkdev_get+0x108/0x320 [ 53.508743][] do_dentry_open+0x218/0x314 [ 53.514496][] path_openat+0x4c0/0xf9c [ 53.519959][] do_filp_open+0x5c/0xc0 [ 53.525336][] do_sys_open+0xfc/0x1cc [ 53.530716][] ret_fast_syscall+0x0/0x1c [ 53.536375] -> #0 (>lock){+.+...}: [ 53.540587][] mutex_lock_nested+0x38/0x3cc [ 53.546504][] del_mtd_blktrans_dev+0x80/0xdc [ 53.552606][] blktrans_notify_remove+0x7c/0x84 [ 53.558891][] del_mtd_device+0x74/0x100 [ 53.564544][] del_mtd_partitions+0x80/0xc8 [ 53.570451][] mtd_device_unregister+0x24/0x48 [ 53.576637][] spi_drv_remove+0x1c/0x34 [ 53.582207][] __device_release_driver+0x88/0x114 [ 53.588663][] device_release_driver+0x20/0x2c [ 53.594843][] bus_remove_device+0xd8/0x108 [ 53.600748][] device_del+0x10c/0x210 [ 53.606127][] device_unregister+0xc/0x20 [ 53.611849][] __unregister+0x10/0x20 [ 53.617211][] device_for_each_child+0x50/0x7c [ 53.623387][] spi_unregister_master+0x58/0x8c [ 53.629578][] release_nodes+0x15c/0x1c8 [ 53.635223][] __device_release_driver+0x90/0x114 [ 53.641689][] driver_detach+0xb4/0xb8 [ 53.647147][] bus_remove_driver+0x4c/0xa0 [ 53.652970][] SyS_delete_module+0x11c/0x1e4 [ 53.658976][] ret_fast_syscall+0x0/0x1c [ 53.664621] [ 53.664621] other info that might help us debug this: [ 53.664621] [ 53.672979] Possible unsafe locking scenario: [ 53.672979] [ 53.679169]CPU0CPU1 [ 53.683900] [ 53.688633] lock(mtd_table_mutex); [ 53.692383]lock(>lock); [ 53.698306]lock(mtd_table_mutex); [ 53.704658] lock(>lock); [ 53.707946] [ 53.707946] *** DEADLOCK *** [ 53.707946] [ 53.714123] 5 locks held by modprobe/372: [ 53.718305] #0: (>mutex){..}, at: [] driver_detach+0x44/0xb8 [ 53.726147] #1: (>mutex){..}, at: [] driver_detach+0x50/0xb8 [ 53.733985] #2: (>mutex){..}, at: [] device_release_driver+0x18/0x2c [ 53.742541] #3: (mtd_partitions_mutex){+.+.+.}, at: [] del_mtd_partitions+0x1c/0xc8 [ 53.751656] #4: (mtd_table_mutex){+.+.+.}, at: [] del_mtd_device+0x18/0x100 [ 53.760048] [ 53.760048] stack backtrace: [ 53.764591] CPU: 0 PID: 372 Comm: modprobe Not tainted 4.3.0-rc6 #96 [ 53.771217] Hardware name: Generic AM43 (Flattened Device Tree) [ 53.777419] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [ 53.785511] [] (show_stack) from [] (dump_stack+0x84/0x9c) [ 53.793063] [] (dump_stack) from [] (print_circular_bug+0x1c8/0x30c) [ 53.801500] [] (print_circular_bug) from [] (__lock_acquire+0x1a48/0x1cd8) [ 53.810480] [] (__lock_acquire) from [] (lock_acquire+0xac/0x12c) [ 53.818649] [] (lock_acquire) from [] (mutex_lock_nested+0x38/0x3cc) [ 53.827103] [] (mutex_lock_nested) from [] (del_mtd_blktrans_dev+0x80/0xdc) [ 53.836199] [] (del_mtd_blktrans_dev) from [] (blktrans_notify_remove+0x7c/0x84) [ 53.845735] [] (blktrans_notify_remove) from [] (del_mtd_device+0x74/0x100) [ 53.854833] [] (del_mtd_device) from [] (del_mtd_partitions+0x80/0xc8) [ 53.863469] [] (del_mtd_partitions) from [] (mtd_device_unregister+0x24/0x48) [ 53.872733] [] (mtd_device_unregister) from [] (spi_drv_remove+0x1c/0x34) [ 53.881633] [] (spi_drv_remove) from [] (__device_release_driver+0x88/0x114) [ 53.890788] [] (__device_release_driver) from [] (device_release_driver+0x20/0x2c) [ 53.900483] []
Re: Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab
Hi Felipe, First of all, this is only a quick response. I could easily be missing something obvious. On Thu, Oct 22, 2015 at 02:01:02PM -0500, Felipe Balbi wrote: > > Hi, > > I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition > when accessing mtd->usecount) caused a regression at least when removing > m25p80. Wonder if you guys would know of a quick fix, other than > reverting $commit in HEAD (yes, that makes the problem go away, but > regresses on what $commit tried to fix, of course). > > More info about the regression follows, together with bisection log: Not all deadlocks alleged by lockdep are true deadlocks. Are you able to reproduce a real deadlock? (I know, it's not always easy to hit, so I'm not discounting this based on lack of evidence.) In particular, I think something like this warning was mentioned previously, and I looked into it a bit but didn't have the time to figure out for sure (and figure out how to squash the potential false positive). Hence, I'm responding now, even if I'm not 100% sure. More below. > # modprobe -r m25p80 > [ 53.419251] > [ 53.420838] == > [ 53.427300] [ INFO: possible circular locking dependency detected ] > [ 53.433865] 4.3.0-rc6 #96 Not tainted > [ 53.437686] --- > [ 53.444220] modprobe/372 is trying to acquire lock: > [ 53.449320] (>lock){+.+...}, at: [] > del_mtd_blktrans_dev+0x80/0xdc > [ 53.457271] > [ 53.457271] but task is already holding lock: > [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [] > del_mtd_device+0x18/0x100 > [ 53.471321] > [ 53.471321] which lock already depends on the new lock. > [ 53.471321] > [ 53.479856] > [ 53.479856] the existing dependency chain (in reverse order) is: > [ 53.487660] > -> #1 (mtd_table_mutex){+.+.+.}: > [ 53.492331][] blktrans_open+0x34/0x1a4 > [ 53.497879][] __blkdev_get+0xc4/0x3b0 > [ 53.503364][] blkdev_get+0x108/0x320 > [ 53.508743][] do_dentry_open+0x218/0x314 > [ 53.514496][] path_openat+0x4c0/0xf9c > [ 53.519959][] do_filp_open+0x5c/0xc0 > [ 53.525336][] do_sys_open+0xfc/0x1cc > [ 53.530716][] ret_fast_syscall+0x0/0x1c > [ 53.536375] > -> #0 (>lock){+.+...}: > [ 53.540587][] mutex_lock_nested+0x38/0x3cc > [ 53.546504][] del_mtd_blktrans_dev+0x80/0xdc > [ 53.552606][] blktrans_notify_remove+0x7c/0x84 > [ 53.558891][] del_mtd_device+0x74/0x100 > [ 53.564544][] del_mtd_partitions+0x80/0xc8 > [ 53.570451][] mtd_device_unregister+0x24/0x48 > [ 53.576637][] spi_drv_remove+0x1c/0x34 > [ 53.582207][] __device_release_driver+0x88/0x114 > [ 53.588663][] device_release_driver+0x20/0x2c > [ 53.594843][] bus_remove_device+0xd8/0x108 > [ 53.600748][] device_del+0x10c/0x210 > [ 53.606127][] device_unregister+0xc/0x20 > [ 53.611849][] __unregister+0x10/0x20 > [ 53.617211][] device_for_each_child+0x50/0x7c > [ 53.623387][] spi_unregister_master+0x58/0x8c > [ 53.629578][] release_nodes+0x15c/0x1c8 > [ 53.635223][] __device_release_driver+0x90/0x114 > [ 53.641689][] driver_detach+0xb4/0xb8 > [ 53.647147][] bus_remove_driver+0x4c/0xa0 > [ 53.652970][] SyS_delete_module+0x11c/0x1e4 > [ 53.658976][] ret_fast_syscall+0x0/0x1c Actually, the more I look at this, the more I think the warning is probably correct. I might have been thinking of a different false positive. Tentatively: I think the right fix might be to reverse the ordering of acquire/release of the mtd_table_mutex and dev->lock throughout mtd_blkdevs.c. See below. > [ 53.664621] > [ 53.664621] other info that might help us debug this: > [ 53.664621] > [ 53.672979] Possible unsafe locking scenario: > [ 53.672979] > [ 53.679169]CPU0CPU1 > [ 53.683900] > [ 53.688633] lock(mtd_table_mutex); > [ 53.692383]lock(>lock); > [ 53.698306]lock(mtd_table_mutex); > [ 53.704658] lock(>lock); > [ 53.707946] > [ 53.707946] *** DEADLOCK *** > [ 53.707946] > [ 53.714123] 5 locks held by modprobe/372: > [ 53.718305] #0: (>mutex){..}, at: [] > driver_detach+0x44/0xb8 > [ 53.726147] #1: (>mutex){..}, at: [] > driver_detach+0x50/0xb8 > [ 53.733985] #2: (>mutex){..}, at: [] > device_release_driver+0x18/0x2c > [ 53.742541] #3: (mtd_partitions_mutex){+.+.+.}, at: [] > del_mtd_partitions+0x1c/0xc8 > [ 53.751656] #4: (mtd_table_mutex){+.+.+.}, at: [] > del_mtd_device+0x18/0x100 > [ 53.760048] > [ 53.760048] stack backtrace: > [ 53.764591] CPU: 0 PID: 372 Comm: modprobe Not tainted 4.3.0-rc6 #96 > [ 53.771217] Hardware name: Generic
Re: Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab
(fixing Tony's address) Felipe Balbiwrites: > Hi, > > I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition > when accessing mtd->usecount) caused a regression at least when removing > m25p80. Wonder if you guys would know of a quick fix, other than > reverting $commit in HEAD (yes, that makes the problem go away, but > regresses on what $commit tried to fix, of course). > > More info about the regression follows, together with bisection log: > > # modprobe -r m25p80 > [ 53.419251] > [ 53.420838] == > [ 53.427300] [ INFO: possible circular locking dependency detected ] > [ 53.433865] 4.3.0-rc6 #96 Not tainted > [ 53.437686] --- > [ 53.444220] modprobe/372 is trying to acquire lock: > [ 53.449320] (>lock){+.+...}, at: [] > del_mtd_blktrans_dev+0x80/0xdc > [ 53.457271] > [ 53.457271] but task is already holding lock: > [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [] > del_mtd_device+0x18/0x100 > [ 53.471321] > [ 53.471321] which lock already depends on the new lock. > [ 53.471321] > [ 53.479856] > [ 53.479856] the existing dependency chain (in reverse order) is: > [ 53.487660] > -> #1 (mtd_table_mutex){+.+.+.}: > [ 53.492331][] blktrans_open+0x34/0x1a4 > [ 53.497879][] __blkdev_get+0xc4/0x3b0 > [ 53.503364][] blkdev_get+0x108/0x320 > [ 53.508743][] do_dentry_open+0x218/0x314 > [ 53.514496][] path_openat+0x4c0/0xf9c > [ 53.519959][] do_filp_open+0x5c/0xc0 > [ 53.525336][] do_sys_open+0xfc/0x1cc > [ 53.530716][] ret_fast_syscall+0x0/0x1c > [ 53.536375] > -> #0 (>lock){+.+...}: > [ 53.540587][] mutex_lock_nested+0x38/0x3cc > [ 53.546504][] del_mtd_blktrans_dev+0x80/0xdc > [ 53.552606][] blktrans_notify_remove+0x7c/0x84 > [ 53.558891][] del_mtd_device+0x74/0x100 > [ 53.564544][] del_mtd_partitions+0x80/0xc8 > [ 53.570451][] mtd_device_unregister+0x24/0x48 > [ 53.576637][] spi_drv_remove+0x1c/0x34 > [ 53.582207][] __device_release_driver+0x88/0x114 > [ 53.588663][] device_release_driver+0x20/0x2c > [ 53.594843][] bus_remove_device+0xd8/0x108 > [ 53.600748][] device_del+0x10c/0x210 > [ 53.606127][] device_unregister+0xc/0x20 > [ 53.611849][] __unregister+0x10/0x20 > [ 53.617211][] device_for_each_child+0x50/0x7c > [ 53.623387][] spi_unregister_master+0x58/0x8c > [ 53.629578][] release_nodes+0x15c/0x1c8 > [ 53.635223][] __device_release_driver+0x90/0x114 > [ 53.641689][] driver_detach+0xb4/0xb8 > [ 53.647147][] bus_remove_driver+0x4c/0xa0 > [ 53.652970][] SyS_delete_module+0x11c/0x1e4 > [ 53.658976][] ret_fast_syscall+0x0/0x1c > [ 53.664621] > [ 53.664621] other info that might help us debug this: > [ 53.664621] > [ 53.672979] Possible unsafe locking scenario: > [ 53.672979] > [ 53.679169]CPU0CPU1 > [ 53.683900] > [ 53.688633] lock(mtd_table_mutex); > [ 53.692383]lock(>lock); > [ 53.698306]lock(mtd_table_mutex); > [ 53.704658] lock(>lock); > [ 53.707946] > [ 53.707946] *** DEADLOCK *** > [ 53.707946] > [ 53.714123] 5 locks held by modprobe/372: > [ 53.718305] #0: (>mutex){..}, at: [] > driver_detach+0x44/0xb8 > [ 53.726147] #1: (>mutex){..}, at: [] > driver_detach+0x50/0xb8 > [ 53.733985] #2: (>mutex){..}, at: [] > device_release_driver+0x18/0x2c > [ 53.742541] #3: (mtd_partitions_mutex){+.+.+.}, at: [] > del_mtd_partitions+0x1c/0xc8 > [ 53.751656] #4: (mtd_table_mutex){+.+.+.}, at: [] > del_mtd_device+0x18/0x100 > [ 53.760048] > [ 53.760048] stack backtrace: > [ 53.764591] CPU: 0 PID: 372 Comm: modprobe Not tainted 4.3.0-rc6 #96 > [ 53.771217] Hardware name: Generic AM43 (Flattened Device Tree) > [ 53.777419] [] (unwind_backtrace) from [] > (show_stack+0x10/0x14) > [ 53.785511] [] (show_stack) from [] > (dump_stack+0x84/0x9c) > [ 53.793063] [] (dump_stack) from [] > (print_circular_bug+0x1c8/0x30c) > [ 53.801500] [] (print_circular_bug) from [] > (__lock_acquire+0x1a48/0x1cd8) > [ 53.810480] [] (__lock_acquire) from [] > (lock_acquire+0xac/0x12c) > [ 53.818649] [] (lock_acquire) from [] > (mutex_lock_nested+0x38/0x3cc) > [ 53.827103] [] (mutex_lock_nested) from [] > (del_mtd_blktrans_dev+0x80/0xdc) > [ 53.836199] [] (del_mtd_blktrans_dev) from [] > (blktrans_notify_remove+0x7c/0x84) > [ 53.845735] [] (blktrans_notify_remove) from [] > (del_mtd_device+0x74/0x100) > [ 53.854833] [] (del_mtd_device) from [] > (del_mtd_partitions+0x80/0xc8) > [ 53.863469] [] (del_mtd_partitions) from [] >
Re: Regression caused by 073db4a51ee43ccb827f54a4261c0583b028d5ab
Hi, Brian Norriswrites: > Hi Felipe, > > First of all, this is only a quick response. I could easily be missing > something obvious. > > On Thu, Oct 22, 2015 at 02:01:02PM -0500, Felipe Balbi wrote: >> >> Hi, >> >> I just noticed that commit 073db4a51ee4 (mtd: fix: avoid race condition >> when accessing mtd->usecount) caused a regression at least when removing >> m25p80. Wonder if you guys would know of a quick fix, other than >> reverting $commit in HEAD (yes, that makes the problem go away, but >> regresses on what $commit tried to fix, of course). >> >> More info about the regression follows, together with bisection log: > > Not all deadlocks alleged by lockdep are true deadlocks. Are you able to > reproduce a real deadlock? (I know, it's not always easy to hit, so I'm > not discounting this based on lack of evidence.) > > In particular, I think something like this warning was mentioned > previously, and I looked into it a bit but didn't have the time to > figure out for sure (and figure out how to squash the potential false > positive). Hence, I'm responding now, even if I'm not 100% sure. More > below. > >> # modprobe -r m25p80 >> [ 53.419251] >> [ 53.420838] == >> [ 53.427300] [ INFO: possible circular locking dependency detected ] >> [ 53.433865] 4.3.0-rc6 #96 Not tainted >> [ 53.437686] --- >> [ 53.444220] modprobe/372 is trying to acquire lock: >> [ 53.449320] (>lock){+.+...}, at: [] >> del_mtd_blktrans_dev+0x80/0xdc >> [ 53.457271] >> [ 53.457271] but task is already holding lock: >> [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [] >> del_mtd_device+0x18/0x100 >> [ 53.471321] >> [ 53.471321] which lock already depends on the new lock. >> [ 53.471321] >> [ 53.479856] >> [ 53.479856] the existing dependency chain (in reverse order) is: >> [ 53.487660] >> -> #1 (mtd_table_mutex){+.+.+.}: >> [ 53.492331][] blktrans_open+0x34/0x1a4 >> [ 53.497879][] __blkdev_get+0xc4/0x3b0 >> [ 53.503364][] blkdev_get+0x108/0x320 >> [ 53.508743][] do_dentry_open+0x218/0x314 >> [ 53.514496][] path_openat+0x4c0/0xf9c >> [ 53.519959][] do_filp_open+0x5c/0xc0 >> [ 53.525336][] do_sys_open+0xfc/0x1cc >> [ 53.530716][] ret_fast_syscall+0x0/0x1c >> [ 53.536375] >> -> #0 (>lock){+.+...}: >> [ 53.540587][] mutex_lock_nested+0x38/0x3cc >> [ 53.546504][] del_mtd_blktrans_dev+0x80/0xdc >> [ 53.552606][] blktrans_notify_remove+0x7c/0x84 >> [ 53.558891][] del_mtd_device+0x74/0x100 >> [ 53.564544][] del_mtd_partitions+0x80/0xc8 >> [ 53.570451][] mtd_device_unregister+0x24/0x48 >> [ 53.576637][] spi_drv_remove+0x1c/0x34 >> [ 53.582207][] __device_release_driver+0x88/0x114 >> [ 53.588663][] device_release_driver+0x20/0x2c >> [ 53.594843][] bus_remove_device+0xd8/0x108 >> [ 53.600748][] device_del+0x10c/0x210 >> [ 53.606127][] device_unregister+0xc/0x20 >> [ 53.611849][] __unregister+0x10/0x20 >> [ 53.617211][] device_for_each_child+0x50/0x7c >> [ 53.623387][] spi_unregister_master+0x58/0x8c >> [ 53.629578][] release_nodes+0x15c/0x1c8 >> [ 53.635223][] __device_release_driver+0x90/0x114 >> [ 53.641689][] driver_detach+0xb4/0xb8 >> [ 53.647147][] bus_remove_driver+0x4c/0xa0 >> [ 53.652970][] SyS_delete_module+0x11c/0x1e4 >> [ 53.658976][] ret_fast_syscall+0x0/0x1c > > Actually, the more I look at this, the more I think the warning is > probably correct. I might have been thinking of a different false > positive. > > Tentatively: I think the right fix might be to reverse the ordering of > acquire/release of the mtd_table_mutex and dev->lock throughout > mtd_blkdevs.c. See below. > >> [ 53.664621] >> [ 53.664621] other info that might help us debug this: >> [ 53.664621] >> [ 53.672979] Possible unsafe locking scenario: >> [ 53.672979] >> [ 53.679169]CPU0CPU1 >> [ 53.683900] >> [ 53.688633] lock(mtd_table_mutex); >> [ 53.692383]lock(>lock); >> [ 53.698306]lock(mtd_table_mutex); >> [ 53.704658] lock(>lock); >> [ 53.707946] >> [ 53.707946] *** DEADLOCK *** >> [ 53.707946] >> [ 53.714123] 5 locks held by modprobe/372: >> [ 53.718305] #0: (>mutex){..}, at: [] >> driver_detach+0x44/0xb8 >> [ 53.726147] #1: (>mutex){..}, at: [] >> driver_detach+0x50/0xb8 >> [ 53.733985] #2: (>mutex){..}, at: [] >> device_release_driver+0x18/0x2c >> [ 53.742541] #3: (mtd_partitions_mutex){+.+.+.}, at: [] >> del_mtd_partitions+0x1c/0xc8 >> [ 53.751656] #4: (mtd_table_mutex){+.+.+.}, at: [] >>