Re: transaction commit deadlock on current rc

2013-10-18 Thread Josef Bacik
On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
> Hey,
>
> I'm seeing the deadlock below under a ceph-osd workload.  There may be a
> subtle problem with the async transaction sequence (since nobody but ceph
> uses that that I know of), but not obvious to me why
> create_pending_snapshots would get stuck on btrfs_tree_lock...
>

Can you do sysrq+w when this happens so I can see everybody who's blocked?
Thanks,

Josef
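
For anyone following along without a console keyboard: sysrq+w can also be requested by writing the character `w` to `/proc/sysrq-trigger`, which makes the kernel dump tasks in uninterruptible (blocked) state to the kernel log. Below is a minimal sketch of that, assuming root privileges and that sysrq is enabled on the machine; the helper name `trigger_sysrq` and its parameters are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path

# Standard procfs location of the sysrq trigger on Linux.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def trigger_sysrq(key: str = "w", trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write a single sysrq command character to the trigger file.

    'w' asks the kernel to dump blocked (uninterruptible) tasks. The dump
    goes to the kernel log, not to this process, so read it afterwards
    with dmesg or from the console.
    """
    if len(key) != 1:
        raise ValueError("sysrq key must be a single character")
    trigger.write_text(key)
```

Calling `trigger_sysrq("w")` as root while the machine is hung and then saving the `dmesg` output should give the list of blocked tasks asked for above. The `trigger` parameter exists only so the helper can be pointed at a scratch file when trying it out.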
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: transaction commit deadlock on current rc

2013-10-18 Thread Sage Weil
On Fri, 18 Oct 2013, Josef Bacik wrote:
> On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
>> Hey,
>>
>> I'm seeing the deadlock below under a ceph-osd workload.  There may be a
>> subtle problem with the async transaction sequence (since nobody but ceph
>> uses that that I know of), but not obvious to me why
>> create_pending_snapshots would get stuck on btrfs_tree_lock...
>>
>
> Can you do sysrq+w when this happens so I can see everybody who's blocked?
> Thanks,

Oops, forgot to attach the bug link.  It's at

http://tracker.ceph.com/attachments/download/1035/a
http://tracker.ceph.com/issues/6451

The machine is still hung.. if there is additional info I can gather 
you can ping me on irc.  

Thanks!
sage


Re: transaction commit deadlock on current rc

2013-10-18 Thread Josef Bacik
On Fri, Oct 18, 2013 at 08:42:28AM -0700, Sage Weil wrote:
> On Fri, 18 Oct 2013, Josef Bacik wrote:
>> On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
>>> Hey,
>>>
>>> I'm seeing the deadlock below under a ceph-osd workload.  There may be a
>>> subtle problem with the async transaction sequence (since nobody but ceph
>>> uses that that I know of), but not obvious to me why
>>> create_pending_snapshots would get stuck on btrfs_tree_lock...
>>>
>>
>> Can you do sysrq+w when this happens so I can see everybody who's blocked?
>> Thanks,
>
> Oops, forgot to attach the bug link.  It's at
>
>   http://tracker.ceph.com/attachments/download/1035/a
>   http://tracker.ceph.com/issues/6451
>
> The machine is still hung.. if there is additional info I can gather
> you can ping me on irc.
>

Oops, I'll fix that right up, sorry about that.  Thanks,

Josef


Re: transaction commit deadlock on current rc

2013-10-18 Thread Chris Mason
Quoting Sage Weil (2013-10-18 11:42:28)
> On Fri, 18 Oct 2013, Josef Bacik wrote:
>> On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
>>> Hey,
>>>
>>> I'm seeing the deadlock below under a ceph-osd workload.  There may be a
>>> subtle problem with the async transaction sequence (since nobody but ceph
>>> uses that that I know of), but not obvious to me why
>>> create_pending_snapshots would get stuck on btrfs_tree_lock...
>>>
>>
>> Can you do sysrq+w when this happens so I can see everybody who's blocked?
>> Thanks,
>
> Oops, forgot to attach the bug link.  It's at
>
> http://tracker.ceph.com/attachments/download/1035/a
> http://tracker.ceph.com/issues/6451
>
> The machine is still hung.. if there is additional info I can gather
> you can ping me on irc.

Thanks Sage and Josef, I've got this one queued up pending an ack from
Sage.  But it's obviously not harmful, so I'll probably send this
afternoon either way.

-chris



Re: transaction commit deadlock on current rc

2013-10-18 Thread Sage Weil
On Fri, 18 Oct 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-10-18 11:42:28)
>> On Fri, 18 Oct 2013, Josef Bacik wrote:
>>> On Thu, Oct 17, 2013 at 12:56:14PM -0700, Sage Weil wrote:
>>>> Hey,
>>>>
>>>> I'm seeing the deadlock below under a ceph-osd workload.  There may be a
>>>> subtle problem with the async transaction sequence (since nobody but ceph
>>>> uses that that I know of), but not obvious to me why
>>>> create_pending_snapshots would get stuck on btrfs_tree_lock...
>>>>
>>>
>>> Can you do sysrq+w when this happens so I can see everybody who's blocked?
>>> Thanks,
>>
>> Oops, forgot to attach the bug link.  It's at
>>
>> http://tracker.ceph.com/attachments/download/1035/a
>> http://tracker.ceph.com/issues/6451
>>
>> The machine is still hung.. if there is additional info I can gather
>> you can ping me on irc.
>
> Thanks Sage and Josef, I've got this one queued up pending an ack from
> Sage.  But it's obviously not harmful, so I'll probably send this
> afternoon either way.

This is passing my initial tests!  It'll be subjected to the full firehose 
later tonight; I'll let you know if anything comes up.

Thanks!
sage


transaction commit deadlock on current rc

2013-10-17 Thread Sage Weil
Hey,

I'm seeing the deadlock below under a ceph-osd workload.  There may be a 
subtle problem with the async transaction sequence (since nobody but ceph 
uses that that I know of), but not obvious to me why 
create_pending_snapshots would get stuck on btrfs_tree_lock...

[  602.217383] INFO: task kworker/3:2:771 blocked for more than 120 seconds.
[  602.224234]   Not tainted 3.12.0-rc2-ceph-9-g53d0281 #1
[  602.230216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  602.238121] kworker/3:2 D 88003677df10 0   771  2 0x
[  602.245349] Workqueue: events do_async_commit [btrfs]
[  602.250513]  8800c95c78d8 0046 0286 8800638fca08
[  602.258192]  88003677df10 8800c95c7fd8 8800c95c7fd8 8800c95c7fd8
[  602.265867]  880225d2df10 88003677df10 8800c95c78e8 8800638fc8e0
[  602.273545] Call Trace:
[  602.276049]  [81665849] schedule+0x29/0x70
[  602.281087]  [a0176975] btrfs_tree_lock+0x75/0x270 [btrfs]
[  602.287509]  [81070310] ? __init_waitqueue_head+0x60/0x60
[  602.293840]  [a01185bb] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[  602.300612]  [a011da67] btrfs_search_slot+0x867/0x930 [btrfs]
[  602.307293]  [a012ac62] ? run_clustered_refs+0x232/0xf30 [btrfs]
[  602.314236]  [a011f238] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[  602.321393]  [a01330cc] insert_with_overflow+0x3c/0x110 [btrfs]
[  602.328287]  [a013325f] btrfs_insert_dir_item+0xbf/0x200 [btrfs]
[  602.335229]  [a013f19c] create_pending_snapshot+0x81c/0xa00 [btrfs]
[  602.342469]  [a013f423] create_pending_snapshots+0xa3/0xb0 [btrfs]
[  602.349624]  [a01408fe] btrfs_commit_transaction+0x46e/0xa40 [btrfs]
[  602.356919]  [81070310] ? __init_waitqueue_head+0x60/0x60
[  602.363291]  [a0140f58] do_async_commit+0x88/0xa0 [btrfs]
[  602.369665]  [a0140ef9] ? do_async_commit+0x29/0xa0 [btrfs]
[  602.376166]  [810672fa] process_one_work+0x1da/0x540
[  602.382099]  [8106728f] ? process_one_work+0x16f/0x540
[  602.388205]  [810684dc] worker_thread+0x11c/0x370
[  602.393834]  [810683c0] ? manage_workers.isra.20+0x2e0/0x2e0
[  602.400462]  [8106fada] kthread+0xea/0xf0
[  602.405396]  [8106f9f0] ? flush_kthread_worker+0x150/0x150
[  602.411836]  [8166fdec] ret_from_fork+0x7c/0xb0
[  602.417300]  [8106f9f0] ? flush_kthread_worker+0x150/0x150
[  602.423787] INFO: lockdep is turned off.

[  602.427852] INFO: task btrfs-transacti:6069 blocked for more than 120 seconds.
[  602.435155]   Not tainted 3.12.0-rc2-ceph-9-g53d0281 #1
[  602.441229] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  602.449212] btrfs-transacti D 8800c96461e8 0  6069  2 0x
[  602.457660]  88022408fd08 0046 0286 8800b68a4578
[  602.465350]  88022448df10 88022408ffd8 88022408ffd8 88022408ffd8
[  602.473081]  880225d29fb0 88022448df10 88022408fd18 880082fd48a8
[  602.480835] Call Trace:
[  602.483342]  [81665849] schedule+0x29/0x70
[  602.488450]  [a013f74f] wait_current_trans.isra.33+0xbf/0x120 [btrfs]
[  602.495836]  [81070310] ? __init_waitqueue_head+0x60/0x60
[  602.502241]  [a01416a8] start_transaction+0x348/0x540 [btrfs]
[  602.509010]  [a0141907] btrfs_attach_transaction+0x17/0x20 [btrfs]
[  602.516124]  [a0139c12] transaction_kthread+0x182/0x250 [btrfs]
[  602.523065]  [a0139a90] ? btrfs_destroy_delayed_refs+0x370/0x370 [btrfs]
[  602.530791]  [8106fada] kthread+0xea/0xf0
[  602.535725]  [8106f9f0] ? flush_kthread_worker+0x150/0x150
[  602.542178]  [8166fdec] ret_from_fork+0x7c/0xb0
[  602.547658]  [8106f9f0] ? flush_kthread_worker+0x150/0x150
[  602.554068] INFO: lockdep is turned off.

[  602.558154] INFO: task ceph-osd:12248 blocked for more than 120 seconds.
[  602.558155]   Not tainted 3.12.0-rc2-ceph-9-g53d0281 #1
[  602.558156] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  602.558158] ceph-osd        D 880082fd48a8 0 12248  12215 0x
[  602.558161]  880184441b58 0046 0282 8800b68a4578
[  602.558162]  880077fcbf60 880184441fd8 880184441fd8 880184441fd8
[  602.558164]  88003677df10 880077fcbf60 880184441b68 880184441ba0
[  602.558164] Call Trace:
[  602.558166]  [81665849] schedule+0x29/0x70
[  602.558178]  [a0141af7] btrfs_commit_transaction_async+0x187/0x2c0 [btrfs]
[  602.558188]  [a01413f6] ? start_transaction+0x96/0x540 [btrfs]
[  602.558190]  [81070310] ? __init_waitqueue_head+0x60/0x60
[  602.558201]  [a0171565] btrfs_mksubvol.isra.59+0x2a5/0x410 [btrfs]
[  602.558204]  [811a3d9c] ?