Re: transaction commit deadlock on current rc
On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
> Hey,
>
> I'm seeing the deadlock below under a ceph-osd workload. There may be a
> subtle problem with the async transaction sequence (since nobody but ceph
> uses that that I know of), but not obvious to me why
> create_pending_snapshots would get stuck on btrfs_tree_lock...

Can you do sysrq+w when this happens so I can see everybody who's blocked?
Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
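For anyone reproducing this without a console keyboard: the sysrq+w dump requested above (tasks in uninterruptible, blocked state) can also be triggered by writing the command letter to /proc/sysrq-trigger as root, and the result shows up in the kernel log. Below is a minimal sketch in Python. The helper name trigger_sysrq and its trigger parameter are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path

# Standard procfs trigger file: writing a command letter here has the same
# effect as pressing Alt+SysRq+<letter> on the console.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def trigger_sysrq(key: str = "w", trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write one SysRq command character to the trigger file.

    Hypothetical helper. The 'trigger' parameter exists only so the function
    can be pointed at an ordinary file when experimenting without root.
    """
    if len(key) != 1 or not (key.isascii() and key.isalnum()):
        raise ValueError(f"expected a single SysRq command character, got {key!r}")
    # 'w' asks the kernel to dump tasks in uninterruptible (blocked) state.
    trigger.write_text(key)
```

Calling trigger_sysrq("w") as root on the hung machine and then reading dmesg should give the list of blocked tasks with their stack traces.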
Re: transaction commit deadlock on current rc
On Fri, 18 Oct 2013, Josef Bacik wrote:
> On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
> > Hey,
> >
> > I'm seeing the deadlock below under a ceph-osd workload. There may be a
> > subtle problem with the async transaction sequence (since nobody but ceph
> > uses that that I know of), but not obvious to me why
> > create_pending_snapshots would get stuck on btrfs_tree_lock...
>
> Can you do sysrq+w when this happens so I can see everybody who's blocked?
> Thanks,

Oops, forgot to attach the bug link. It's at

http://tracker.ceph.com/attachments/download/1035/a
http://tracker.ceph.com/issues/6451

The machine is still hung.. if there is additional info I can gather you can
ping me on irc.

Thanks!
sage
Re: transaction commit deadlock on current rc
On Fri, Oct 18, 2013 at 08:42:28AM -0700, Sage Weil wrote:
> On Fri, 18 Oct 2013, Josef Bacik wrote:
> > On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
> > > Hey,
> > >
> > > I'm seeing the deadlock below under a ceph-osd workload. There may be a
> > > subtle problem with the async transaction sequence (since nobody but ceph
> > > uses that that I know of), but not obvious to me why
> > > create_pending_snapshots would get stuck on btrfs_tree_lock...
> >
> > Can you do sysrq+w when this happens so I can see everybody who's blocked?
> > Thanks,
>
> Oops, forgot to attach the bug link. It's at
>
> http://tracker.ceph.com/attachments/download/1035/a
> http://tracker.ceph.com/issues/6451
>
> The machine is still hung.. if there is additional info I can gather you can
> ping me on irc.

Oops, I'll fix that right up, sorry about that. Thanks,

Josef
Re: transaction commit deadlock on current rc
Quoting Sage Weil (2013-10-18 11:42:28)
> On Fri, 18 Oct 2013, Josef Bacik wrote:
> > On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
> > > Hey,
> > >
> > > I'm seeing the deadlock below under a ceph-osd workload. There may be a
> > > subtle problem with the async transaction sequence (since nobody but ceph
> > > uses that that I know of), but not obvious to me why
> > > create_pending_snapshots would get stuck on btrfs_tree_lock...
> >
> > Can you do sysrq+w when this happens so I can see everybody who's blocked?
> > Thanks,
>
> Oops, forgot to attach the bug link. It's at
>
> http://tracker.ceph.com/attachments/download/1035/a
> http://tracker.ceph.com/issues/6451
>
> The machine is still hung.. if there is additional info I can gather you can
> ping me on irc.

Thanks Sage and Josef, I've got this one queued up pending an ack from Sage.
But it's obviously not harmful, so I'll probably send this afternoon either
way.

-chris
Re: transaction commit deadlock on current rc
On Fri, 18 Oct 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-10-18 11:42:28)
> > On Fri, 18 Oct 2013, Josef Bacik wrote:
> > > On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
> > > > Hey,
> > > >
> > > > I'm seeing the deadlock below under a ceph-osd workload. There may be a
> > > > subtle problem with the async transaction sequence (since nobody but ceph
> > > > uses that that I know of), but not obvious to me why
> > > > create_pending_snapshots would get stuck on btrfs_tree_lock...
> > >
> > > Can you do sysrq+w when this happens so I can see everybody who's blocked?
> > > Thanks,
> >
> > Oops, forgot to attach the bug link. It's at
> >
> > http://tracker.ceph.com/attachments/download/1035/a
> > http://tracker.ceph.com/issues/6451
> >
> > The machine is still hung.. if there is additional info I can gather you can
> > ping me on irc.
>
> Thanks Sage and Josef, I've got this one queued up pending an ack from Sage.
> But it's obviously not harmful, so I'll probably send this afternoon either
> way.

This is passing my initial tests! It'll be subjected to the full firehose
later tonight; I'll let you know if anything comes up.

Thanks!
sage
transaction commit deadlock on current rc
Hey,

I'm seeing the deadlock below under a ceph-osd workload. There may be a
subtle problem with the async transaction sequence (since nobody but ceph
uses that that I know of), but not obvious to me why
create_pending_snapshots would get stuck on btrfs_tree_lock...

[ 602.217383] INFO: task kworker/3:2:771 blocked for more than 120 seconds.
[ 602.224234] Not tainted 3.12.0-rc2-ceph-9-g53d0281 #1
[ 602.230216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 602.238121] kworker/3:2 D 88003677df10 0 771 2 0x
[ 602.245349] Workqueue: events do_async_commit [btrfs]
[ 602.250513] 8800c95c78d8 0046 0286 8800638fca08
[ 602.258192] 88003677df10 8800c95c7fd8 8800c95c7fd8 8800c95c7fd8
[ 602.265867] 880225d2df10 88003677df10 8800c95c78e8 8800638fc8e0
[ 602.273545] Call Trace:
[ 602.276049] [81665849] schedule+0x29/0x70
[ 602.281087] [a0176975] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 602.287509] [81070310] ? __init_waitqueue_head+0x60/0x60
[ 602.293840] [a01185bb] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 602.300612] [a011da67] btrfs_search_slot+0x867/0x930 [btrfs]
[ 602.307293] [a012ac62] ? run_clustered_refs+0x232/0xf30 [btrfs]
[ 602.314236] [a011f238] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 602.321393] [a01330cc] insert_with_overflow+0x3c/0x110 [btrfs]
[ 602.328287] [a013325f] btrfs_insert_dir_item+0xbf/0x200 [btrfs]
[ 602.335229] [a013f19c] create_pending_snapshot+0x81c/0xa00 [btrfs]
[ 602.342469] [a013f423] create_pending_snapshots+0xa3/0xb0 [btrfs]
[ 602.349624] [a01408fe] btrfs_commit_transaction+0x46e/0xa40 [btrfs]
[ 602.356919] [81070310] ? __init_waitqueue_head+0x60/0x60
[ 602.363291] [a0140f58] do_async_commit+0x88/0xa0 [btrfs]
[ 602.369665] [a0140ef9] ? do_async_commit+0x29/0xa0 [btrfs]
[ 602.376166] [810672fa] process_one_work+0x1da/0x540
[ 602.382099] [8106728f] ? process_one_work+0x16f/0x540
[ 602.388205] [810684dc] worker_thread+0x11c/0x370
[ 602.393834] [810683c0] ? manage_workers.isra.20+0x2e0/0x2e0
[ 602.400462] [8106fada] kthread+0xea/0xf0
[ 602.405396] [8106f9f0] ? flush_kthread_worker+0x150/0x150
[ 602.411836] [8166fdec] ret_from_fork+0x7c/0xb0
[ 602.417300] [8106f9f0] ? flush_kthread_worker+0x150/0x150
[ 602.423787] INFO: lockdep is turned off.
[ 602.427852] INFO: task btrfs-transacti:6069 blocked for more than 120 seconds.
[ 602.435155] Not tainted 3.12.0-rc2-ceph-9-g53d0281 #1
[ 602.441229] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 602.449212] btrfs-transacti D 8800c96461e8 0 6069 2 0x
[ 602.457660] 88022408fd08 0046 0286 8800b68a4578
[ 602.465350] 88022448df10 88022408ffd8 88022408ffd8 88022408ffd8
[ 602.473081] 880225d29fb0 88022448df10 88022408fd18 880082fd48a8
[ 602.480835] Call Trace:
[ 602.483342] [81665849] schedule+0x29/0x70
[ 602.488450] [a013f74f] wait_current_trans.isra.33+0xbf/0x120 [btrfs]
[ 602.495836] [81070310] ? __init_waitqueue_head+0x60/0x60
[ 602.502241] [a01416a8] start_transaction+0x348/0x540 [btrfs]
[ 602.509010] [a0141907] btrfs_attach_transaction+0x17/0x20 [btrfs]
[ 602.516124] [a0139c12] transaction_kthread+0x182/0x250 [btrfs]
[ 602.523065] [a0139a90] ? btrfs_destroy_delayed_refs+0x370/0x370 [btrfs]
[ 602.530791] [8106fada] kthread+0xea/0xf0
[ 602.535725] [8106f9f0] ? flush_kthread_worker+0x150/0x150
[ 602.542178] [8166fdec] ret_from_fork+0x7c/0xb0
[ 602.547658] [8106f9f0] ? flush_kthread_worker+0x150/0x150
[ 602.554068] INFO: lockdep is turned off.
[ 602.558154] INFO: task ceph-osd:12248 blocked for more than 120 seconds.
[ 602.558155] Not tainted 3.12.0-rc2-ceph-9-g53d0281 #1
[ 602.558156] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 602.558158] ceph-osd D 880082fd48a8 0 12248 12215 0x
[ 602.558161] 880184441b58 0046 0282 8800b68a4578
[ 602.558162] 880077fcbf60 880184441fd8 880184441fd8 880184441fd8
[ 602.558164] 88003677df10 880077fcbf60 880184441b68 880184441ba0
[ 602.558164] Call Trace:
[ 602.558166] [81665849] schedule+0x29/0x70
[ 602.558178] [a0141af7] btrfs_commit_transaction_async+0x187/0x2c0 [btrfs]
[ 602.558188] [a01413f6] ? start_transaction+0x96/0x540 [btrfs]
[ 602.558190] [81070310] ? __init_waitqueue_head+0x60/0x60
[ 602.558201] [a0171565] btrfs_mksubvol.isra.59+0x2a5/0x410 [btrfs]
[ 602.558204] [811a3d9c] ?