Re: possible deadlock shown by CONFIG_PROVE_LOCKING
Carlos Carvalho wrote:
> Lachlan McIlroy ([EMAIL PROTECTED]) wrote on 25 January 2008 13:00:
> > Carlos Carvalho wrote:
> > > I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
> > > below at boot. Is it a problem?
> > It was a problem - it has since been fixed in 2.6.23. Patch is
> > attached in case you're interested.
>
> [patch removed]
>
> Thanks. I applied it to 22.16 without rejects but then the machine
> doesn't even mount the partition:
>
> Filesystem "md0": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem md0
> Unable to handle kernel paging request at 8110e325e8d0
> RIP:

The patch shouldn't have caused that. Did you keep a stack trace from this?

> Then I ran 23.14, no errors. However the machine locks hard a few
> seconds after we start the heavy IO tests...

Not good either - again, any stack traces from the stuck threads?

> So we used .24. Now the machine stood some 16h of heavy IO load.

Good!

> However we got the log below after about 15 minutes of running the
> test. It didn't repeat. Is it still a problem?

If it didn't cause a deadlock then it's probably not an issue. We probably just need some more lockdep annotations to tell lockdep to ignore this locking scenario. I can't see where we acquire nested qh_locks in xfs_qm_dqget(). It might actually be caused by acquiring the xfs_qm_freelist lock in xfs_qm_dqget() while holding the hash lock - both locks are initialised with the same stack, so lockdep will think they are locks for the same object type.

> Jan 26 19:25:43 mumm kernel: Filesystem "md0": Disabling barriers, not supported by the underlying device
> Jan 26 19:25:43 mumm kernel: XFS mounting filesystem md0
> Jan 26 19:25:43 mumm kernel: Ending clean XFS mount for filesystem: md0
>
> 15min later...
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: =============================================
> Jan 26 19:41:40 mumm kernel: [ INFO: possible recursive locking detected ]
> Jan 26 19:41:40 mumm kernel: 2.6.24 #3
> Jan 26 19:41:40 mumm kernel: ---------------------------------------------
> Jan 26 19:41:40 mumm kernel: rm/1589 is trying to acquire lock:
> Jan 26 19:41:40 mumm kernel:  (&list->qh_lock){--..}, at: [802c58e2] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: but task is already holding lock:
> Jan 26 19:41:40 mumm kernel:  (&list->qh_lock){--..}, at: [802c5841] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
> Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
> Jan 26 19:41:40 mumm kernel:  #0:  (&type->i_mutex_dir_key#2/1){--..}, at: [8027c400] do_unlinkat+0x6b/0x133
> Jan 26 19:41:40 mumm kernel:  #1:  (&sb->s_type->i_mutex_key#2){--..}, at: [8027a59d] vfs_unlink+0x42/0x8c
> Jan 26 19:41:40 mumm kernel:  #2:  (&(&ip->i_lock)->mr_lock){----}, at: [802f4093] xfs_ilock+0x63/0x8d
> Jan 26 19:41:40 mumm kernel:  #3:  (&list->qh_lock){--..}, at: [802c5841] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: stack backtrace:
> Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: Call Trace:
> Jan 26 19:41:40 mumm kernel:  [802446b2] __lock_acquire+0x1b7/0xcd0
> Jan 26 19:41:40 mumm kernel:  [802c58e2] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel:  [802455f4] lock_acquire+0x51/0x6c
> Jan 26 19:41:40 mumm kernel:  [802c58e2] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel:  [80241ef9] debug_mutex_lock_common+0x16/0x23
> Jan 26 19:41:40 mumm kernel:  [803f8c7f] mutex_lock_nested+0xec/0x286
> Jan 26 19:41:40 mumm kernel:  [802c58e2] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel:  [802c8c2c]
Re: possible deadlock shown by CONFIG_PROVE_LOCKING
Lachlan McIlroy ([EMAIL PROTECTED]) wrote on 25 January 2008 13:00:
> Carlos Carvalho wrote:
> > I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
> > below at boot. Is it a problem?
> It was a problem - it has since been fixed in 2.6.23. Patch is
> attached in case you're interested.

[patch removed]

Thanks. I applied it to 22.16 without rejects but then the machine doesn't even mount the partition:

Filesystem "md0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem md0
Unable to handle kernel paging request at 8110e325e8d0
RIP:

Then I ran 23.14, no errors. However the machine locks hard a few seconds after we start the heavy IO tests... So we used .24. Now the machine stood some 16h of heavy IO load. However we got the log below after about 15 minutes of running the test. It didn't repeat. Is it still a problem?

Jan 26 19:25:43 mumm kernel: Filesystem "md0": Disabling barriers, not supported by the underlying device
Jan 26 19:25:43 mumm kernel: XFS mounting filesystem md0
Jan 26 19:25:43 mumm kernel: Ending clean XFS mount for filesystem: md0

15min later...
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: =============================================
Jan 26 19:41:40 mumm kernel: [ INFO: possible recursive locking detected ]
Jan 26 19:41:40 mumm kernel: 2.6.24 #3
Jan 26 19:41:40 mumm kernel: ---------------------------------------------
Jan 26 19:41:40 mumm kernel: rm/1589 is trying to acquire lock:
Jan 26 19:41:40 mumm kernel:  (&list->qh_lock){--..}, at: [802c58e2] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: but task is already holding lock:
Jan 26 19:41:40 mumm kernel:  (&list->qh_lock){--..}, at: [802c5841] xfs_qm_dqget+0x1f6/0x355
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
Jan 26 19:41:40 mumm kernel:  #0:  (&type->i_mutex_dir_key#2/1){--..}, at: [8027c400] do_unlinkat+0x6b/0x133
Jan 26 19:41:40 mumm kernel:  #1:  (&sb->s_type->i_mutex_key#2){--..}, at: [8027a59d] vfs_unlink+0x42/0x8c
Jan 26 19:41:40 mumm kernel:  #2:  (&(&ip->i_lock)->mr_lock){----}, at: [802f4093] xfs_ilock+0x63/0x8d
Jan 26 19:41:40 mumm kernel:  #3:  (&list->qh_lock){--..}, at: [802c5841] xfs_qm_dqget+0x1f6/0x355
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: stack backtrace:
Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: Call Trace:
Jan 26 19:41:40 mumm kernel:  [802446b2] __lock_acquire+0x1b7/0xcd0
Jan 26 19:41:40 mumm kernel:  [802c58e2] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel:  [802455f4] lock_acquire+0x51/0x6c
Jan 26 19:41:40 mumm kernel:  [802c58e2] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel:  [80241ef9] debug_mutex_lock_common+0x16/0x23
Jan 26 19:41:40 mumm kernel:  [803f8c7f] mutex_lock_nested+0xec/0x286
Jan 26 19:41:40 mumm kernel:  [802c58e2] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel:  [802c8c2c] xfs_qm_dqattach_one+0xe2/0x150
Jan 26 19:41:40 mumm kernel:  [802c8d51] xfs_qm_dqattach+0xb7/0x175
Jan 26 19:41:40 mumm kernel:  [80311ac5] xfs_remove+0x8f/0x364
Jan 26 19:41:40 mumm kernel:  [803f8dff] mutex_lock_nested+0x26c/0x286
Jan 26 19:41:40 mumm kernel:  [80244082] trace_hardirqs_on+0x115/0x139
Jan 26 19:41:40 mumm kernel:  [80318e6f] xfs_vn_unlink+0x17/0x36
Jan 26 19:41:40 mumm kernel:  [8027a5b7] vfs_unlink+0x5c/0x8c
Jan 26 19:41:40 mumm kernel:  [8027c449] do_unlinkat+0xb4/0x133
Jan 26 19:41:40 mumm kernel:  [803f997a] trace_hardirqs_on_thunk+0x35/0x3a
Jan 26
possible deadlock shown by CONFIG_PROVE_LOCKING
I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the below at boot. Is it a problem?

Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:  8
... MAX_LOCK_DEPTH:          30
... MAX_LOCKDEP_KEYS:        2048
... CLASSHASH_SIZE:          1024
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      16384
... CHAINHASH_SIZE:          8192
 memory used by lock dependency info: 1648 kB
 per task-struct memory footprint: 1680 bytes
| Locking API testsuite:
[removed]
-------------------------------------
Good, all 218 testcases passed! |
-------------------------------------

Further down:

md: running: <sdah1><sdag1>
raid1: raid set md3 active with 2 out of 2 mirrors
md: ... autorun DONE.
Filesystem "md1": Disabling barriers, not supported by the underlying device
XFS mounting filesystem md1
Ending clean XFS mount for filesystem: md1
VFS: Mounted root (xfs filesystem).
Freeing unused kernel memory: 284k freed
Warning: unable to open an initial console.
Filesystem "md1": Disabling barriers, not supported by the underlying device

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.22.16 #1
-------------------------------------------------------
mount/1558 is trying to acquire lock:
 (&(&ip->i_lock)->mr_lock/1){--..}, at: [80312805] xfs_ilock+0x63/0x8d

but task is already holding lock:
 (&(&ip->i_lock)->mr_lock){----}, at: [80312805] xfs_ilock+0x63/0x8d

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 (&(&ip->i_lock)->mr_lock){----}:
   [80249fa6] __lock_acquire+0xa0f/0xb9f
   [8024a50d] lock_acquire+0x48/0x63
   [80312805] xfs_ilock+0x63/0x8d
   [8023c909] down_write_nested+0x38/0x46
   [80312805] xfs_ilock+0x63/0x8d
   [803132e8] xfs_iget_core+0x3ef/0x705
   [803136a2] xfs_iget+0xa4/0x14e
   [80328364] xfs_trans_iget+0xb4/0x128
   [80316a57] xfs_ialloc+0x9b/0x4b7
   [80249fc9] __lock_acquire+0xa32/0xb9f
   [80328d87] xfs_dir_ialloc+0x84/0x2cd
   [80312805] xfs_ilock+0x63/0x8d
   [8023c909] down_write_nested+0x38/0x46
   [8032e307] xfs_create+0x331/0x65f
   [80308163] xfs_dir2_leaf_lookup+0x1d/0x96
   [80338367] xfs_vn_mknod+0x12f/0x1f2
   [8027fb0a] vfs_create+0x6e/0x9e
   [80282af3] open_namei+0x1f7/0x6a9
   [8021843d] do_page_fault+0x438/0x78f
   [8027705a] do_filp_open+0x1c/0x3d
   [8045bf56] _spin_unlock+0x17/0x20
   [80276e3d] get_unused_fd+0x11c/0x12a
   [802770bb] do_sys_open+0x40/0x7b
   [802095be] system_call+0x7e/0x83
   [] 0x

-> #0 (&(&ip->i_lock)->mr_lock/1){--..}:
   [80248896] print_circular_bug_header+0xcc/0xd3
   [80249ea2] __lock_acquire+0x90b/0xb9f
   [8024a50d] lock_acquire+0x48/0x63
   [80312805] xfs_ilock+0x63/0x8d
   [8023c909] down_write_nested+0x38/0x46
   [80312805] xfs_ilock+0x63/0x8d
   [8032bd30] xfs_lock_inodes+0x152/0x16d
   [8032e807] xfs_link+0x1d2/0x3f7
   [80249f3f] __lock_acquire+0x9a8/0xb9f
   [80337fe5] xfs_vn_link+0x3c/0x91
   [80248f4a] mark_held_locks+0x58/0x72
   [8045a9b7] __mutex_lock_slowpath+0x250/0x266
   [80249119] trace_hardirqs_on+0x115/0x139
   [8045a9c2] __mutex_lock_slowpath+0x25b/0x266
   [8027f88b] vfs_link+0xe8/0x124
   [802822d8] sys_linkat+0xcd/0x129
   [8045baaf] trace_hardirqs_on_thunk+0x35/0x37
   [80249119] trace_hardirqs_on+0x115/0x139
   [8045baaf] trace_hardirqs_on_thunk+0x35/0x37
   [802095be] system_call+0x7e/0x83
   [] 0x

other info that might help us debug this:

3 locks held by mount/1558:
 #0:  (&inode->i_mutex/1){--..}, at: [802800f5] lookup_create+0x23/0x85
 #1:  (&inode->i_mutex){--..}, at: [8027f878] vfs_link+0xd5/0x124
 #2:  (&(&ip->i_lock)->mr_lock){----}, at: [80312805] xfs_ilock+0x63/0x8d

stack backtrace:

Call Trace:
 [80248612] print_circular_bug_tail+0x69/0x72
 [80248896] print_circular_bug_header+0xcc/0xd3
 [80249ea2] __lock_acquire+0x90b/0xb9f
 [8024a50d] lock_acquire+0x48/0x63
 [80312805] xfs_ilock+0x63/0x8d
 [8023c909] down_write_nested+0x38/0x46
 [80312805] xfs_ilock+0x63/0x8d
 [8032bd30] xfs_lock_inodes+0x152/0x16d
 [8032e807] xfs_link+0x1d2/0x3f7
 [80249f3f] __lock_acquire+0x9a8/0xb9f
 [80337fe5] xfs_vn_link+0x3c/0x91
Re: possible deadlock shown by CONFIG_PROVE_LOCKING
Carlos Carvalho wrote:
> I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
> below at boot. Is it a problem?

It was a problem - it has since been fixed in 2.6.23. Patch is attached in case you're interested.

> Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> ... MAX_LOCKDEP_SUBCLASSES:  8
> ... MAX_LOCK_DEPTH:          30
> ... MAX_LOCKDEP_KEYS:        2048
> ... CLASSHASH_SIZE:          1024
> ... MAX_LOCKDEP_ENTRIES:     8192
> ... MAX_LOCKDEP_CHAINS:      16384
> ... CHAINHASH_SIZE:          8192
>  memory used by lock dependency info: 1648 kB
>  per task-struct memory footprint: 1680 bytes
> | Locking API testsuite:
> [removed]
> -------------------------------------
> Good, all 218 testcases passed! |
> -------------------------------------
>
> Further down:
>
> md: running: <sdah1><sdag1>
> raid1: raid set md3 active with 2 out of 2 mirrors
> md: ... autorun DONE.
> Filesystem "md1": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem md1
> Ending clean XFS mount for filesystem: md1
> VFS: Mounted root (xfs filesystem).
> Freeing unused kernel memory: 284k freed
> Warning: unable to open an initial console.
> Filesystem "md1": Disabling barriers, not supported by the underlying device
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.22.16 #1
> -------------------------------------------------------
> mount/1558 is trying to acquire lock:
>  (&(&ip->i_lock)->mr_lock/1){--..}, at: [80312805] xfs_ilock+0x63/0x8d
>
> but task is already holding lock:
>  (&(&ip->i_lock)->mr_lock){----}, at: [80312805] xfs_ilock+0x63/0x8d
>
> which lock already depends on the new lock.
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&(&ip->i_lock)->mr_lock){----}:
>    [80249fa6] __lock_acquire+0xa0f/0xb9f
>    [8024a50d] lock_acquire+0x48/0x63
>    [80312805] xfs_ilock+0x63/0x8d
>    [8023c909] down_write_nested+0x38/0x46
>    [80312805] xfs_ilock+0x63/0x8d
>    [803132e8] xfs_iget_core+0x3ef/0x705
>    [803136a2] xfs_iget+0xa4/0x14e
>    [80328364] xfs_trans_iget+0xb4/0x128
>    [80316a57] xfs_ialloc+0x9b/0x4b7
>    [80249fc9] __lock_acquire+0xa32/0xb9f
>    [80328d87] xfs_dir_ialloc+0x84/0x2cd
>    [80312805] xfs_ilock+0x63/0x8d
>    [8023c909] down_write_nested+0x38/0x46
>    [8032e307] xfs_create+0x331/0x65f
>    [80308163] xfs_dir2_leaf_lookup+0x1d/0x96
>    [80338367] xfs_vn_mknod+0x12f/0x1f2
>    [8027fb0a] vfs_create+0x6e/0x9e
>    [80282af3] open_namei+0x1f7/0x6a9
>    [8021843d] do_page_fault+0x438/0x78f
>    [8027705a] do_filp_open+0x1c/0x3d
>    [8045bf56] _spin_unlock+0x17/0x20
>    [80276e3d] get_unused_fd+0x11c/0x12a
>    [802770bb] do_sys_open+0x40/0x7b
>    [802095be] system_call+0x7e/0x83
>    [] 0x
>
> -> #0 (&(&ip->i_lock)->mr_lock/1){--..}:
>    [80248896] print_circular_bug_header+0xcc/0xd3
>    [80249ea2] __lock_acquire+0x90b/0xb9f
>    [8024a50d] lock_acquire+0x48/0x63
>    [80312805] xfs_ilock+0x63/0x8d
>    [8023c909] down_write_nested+0x38/0x46
>    [80312805] xfs_ilock+0x63/0x8d
>    [8032bd30] xfs_lock_inodes+0x152/0x16d
>    [8032e807] xfs_link+0x1d2/0x3f7
>    [80249f3f] __lock_acquire+0x9a8/0xb9f
>    [80337fe5] xfs_vn_link+0x3c/0x91
>    [80248f4a] mark_held_locks+0x58/0x72
>    [8045a9b7] __mutex_lock_slowpath+0x250/0x266
>    [80249119] trace_hardirqs_on+0x115/0x139
>    [8045a9c2] __mutex_lock_slowpath+0x25b/0x266
>    [8027f88b] vfs_link+0xe8/0x124
>    [802822d8] sys_linkat+0xcd/0x129
>    [8045baaf] trace_hardirqs_on_thunk+0x35/0x37
>    [80249119] trace_hardirqs_on+0x115/0x139
>    [8045baaf] trace_hardirqs_on_thunk+0x35/0x37
>    [802095be] system_call+0x7e/0x83
>    [] 0x
>
> other info that might help us debug this:
>
> 3 locks held by mount/1558:
>  #0:  (&inode->i_mutex/1){--..}, at: [802800f5] lookup_create+0x23/0x85
>  #1:  (&inode->i_mutex){--..}, at: [8027f878] vfs_link+0xd5/0x124
>  #2:  (&(&ip->i_lock)->mr_lock){----}, at: [80312805] xfs_ilock+0x63/0x8d
>
> stack backtrace:
>
> Call Trace:
>  [80248612] print_circular_bug_tail+0x69/0x72
>  [80248896] print_circular_bug_header+0xcc/0xd3
>  [80249ea2] __lock_acquire+0x90b/0xb9f
>  [8024a50d] lock_acquire+0x48/0x63
>  [80312805] xfs_ilock+0x63/0x8d
>  [8023c909] down_write_nested+0x38/0x46
>  [80312805] xfs_ilock+0x63/0x8d
>  [8032bd30] xfs_lock_inodes+0x152/0x16d
>  [8032e807]