Re: btrfs on 3.14rc5 stuck on btrfs_tree_read_lock sync
On Mon, Apr 07, 2014 at 01:00:02PM -0700, Marc MERLIN wrote:
> On Mon, Apr 07, 2014 at 03:32:13PM -0400, Chris Mason wrote:
> > > You're recommending that I try btrfs-next on a 3.15 pre kernel, correct?
> > > If so would it be likely to fix my filesystem and let me go back to a
> > > stable 3.14? (I'm a bit wary about running some unstable 3.15 on it :).
> >
> > Right now the fixes for this are in the integration branch on my git
> > tree. I think we've shaken out all the problems, but if you want to wait
> > until tomorrow I'll have it in my next branch (for linux-next).
>
> I can wait, even a few more days if needed.
>
> But just to be clear: will this new kernel be something that will be
> required for me to run from there on to avoid all those deadlocks and very
> poor performance I'm seeing, or the new kernel will fix things up, and then
> if other stuff isn't quite stable, I can downgrade back to 3.14 stable?
>
> By the way, I think I know which filesystem is causing this, and one
> unusual thing is that it uses a lot of hardlinks.
> In case that helps, there are only 40 snapshots on it, but many inodes, of
> which many are hardlinked together:
>
> gargamel:/mnt/btrfs_pool2# btrfs filesystem df `pwd`
> Data, single: total=3.28TiB, used=2.30TiB
> System, DUP: total=8.00MiB, used=384.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=74.50GiB, used=70.11GiB
> Metadata, single: total=8.00MiB, used=0.00
> gargamel:/mnt/btrfs_pool2# btrfs filesystem show `pwd`
> Label: btrfs_pool2  uuid: cb9df6d3-a528-4afc-9a45-4fed5ec358d6
>         Total devices 1 FS bytes used 2.37TiB
>         devid 1 size 7.28TiB used 3.43TiB path /dev/mapper/dshelf2

Back on that front, while debugging the other problem I sent you, I've been
having more issues with this device too.

I've been getting several of these after boot:

gargamel login: [ 1328.241302] INFO: task btrfs-cleaner:3571 blocked for more than 120 seconds.
[ 1328.264046] Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[ 1328.284413] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1328.309394] btrfs-cleaner D 88020d5ea800 0 3571 2 0x
[ 1328.331996] 8800c8985d00 0046 8800c8985fd8 88020d5ea2d0
[ 1328.355774] 000141c0 88020d5ea2d0 8801d9a7ee50 880213bfc9e8
[ 1328.379408] 880213bfc800 88020c5b7ce0 8800c8985d10
[ 1328.403654] Call Trace:
[ 1328.412617] [8160d2a1] schedule+0x73/0x75
[ 1328.429189] [8122aa77] wait_current_trans.isra.15+0x98/0xf4
[ 1328.450402] [81085116] ? finish_wait+0x65/0x65
[ 1328.467957] [8122bf1c] start_transaction+0x48e/0x4f2
[ 1328.487315] [8122c2ff] ? __btrfs_end_transaction+0x2a1/0x2c6
[ 1328.508614] [8122bf9b] btrfs_start_transaction+0x1b/0x1d
[ 1328.528842] [8121cc7d] btrfs_drop_snapshot+0x443/0x610
[ 1328.548481] [8122c73d] btrfs_clean_one_deleted_snapshot+0x103/0x10f
[ 1328.571518] [81224f09] cleaner_kthread+0x103/0x136
[ 1328.590436] [81224e06] ? btrfs_alloc_root+0x26/0x26
[ 1328.609348] [8106bc62] kthread+0xae/0xb6
[ 1328.625275] [8106bbb4] ? __kthread_parkme+0x61/0x61
[ 1328.644406] [8161637c] ret_from_fork+0x7c/0xb0
[ 1328.662075] [8106bbb4] ? __kthread_parkme+0x61/0x61

But more annoyingly, accessing the mountpoint was hanging, so I've now
mounted it with recovery,ro, and am backing up all the data to another
device so that I can destroy/recreate this device, which clearly has severe
performance issues.

Do you want btrfsck output and an image of that one too?
(this one is not raid0, it's on top of a dm encrypted md raid5 array)

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs on 3.14rc5 stuck on btrfs_tree_read_lock sync
I was debugging why my backup failed to run, and eventually found it was
stuck on sync:
14080 18:18 btrfs_tree_read_lock sync

This was hung for hours on this lock.

Strangely, it looks like taking my sysrq-w hung the machine pretty hard for
close to 30sec, but this seems to have unhung sync and in the end btrfs send
completed after that.

Sysrq-w is here:
http://marc.merlins.org/tmp/sysrq-btrfs-sync-hang.txt

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
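The "14080 18:18 btrfs_tree_read_lock sync" line has the shape of ps output with PID, elapsed time, wait channel and command columns. That reading is an assumption; the message does not say which command produced it. Under that assumption, a minimal Python sketch for picking out tasks sleeping in a given kernel function could look like this. The names parse_tasks and blocked_in are hypothetical, and the sshd row in the sample is invented for contrast.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    pid: int
    elapsed: str
    wchan: str
    command: str


def parse_tasks(text: str) -> List[Task]:
    """Parse lines of the assumed form 'PID ELAPSED WCHAN COMMAND...'."""
    tasks = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        # Skip blank or short lines and a header row such as 'PID ELAPSED ...'.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, elapsed, wchan, command = fields
        tasks.append(Task(int(pid), elapsed, wchan, command.strip()))
    return tasks


def blocked_in(text: str, wchan_prefix: str) -> List[Task]:
    """Return tasks whose wait channel starts with the given prefix."""
    return [t for t in parse_tasks(text) if t.wchan.startswith(wchan_prefix)]


# First data row is taken from the message; the second is made up.
SAMPLE = """\
  PID ELAPSED WCHAN COMMAND
14080 18:18 btrfs_tree_read_lock sync
 2211 02:03 poll_schedule_timeout sshd: marc
"""

for task in blocked_in(SAMPLE, "btrfs_"):
    print(task.pid, task.elapsed, task.wchan, task.command)
```

On a live system the input might come from something like `ps -eo pid,etime,wchan:32,args`, though the exact column set used in the report is not known.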
Re: btrfs on 3.14rc5 stuck on btrfs_tree_read_lock sync
On 04/07/2014 12:05 PM, Marc MERLIN wrote:
> I was debugging why my backup failed to run, and eventually found it was
> stuck on sync:
> 14080 18:18 btrfs_tree_read_lock sync
>
> This was hung for hours on this lock.
>
> Strangely, it looks like taking my sysrq-w hung the machine pretty hard
> for close to 30sec, but this seems to have unhung sync and in the end
> btrfs send completed after that.
>
> Sysrq-w is here:
> http://marc.merlins.org/tmp/sysrq-btrfs-sync-hang.txt

Try Chris's integration branch in a few hours and see if that fixes it.
Thanks,

Josef
Re: btrfs on 3.14rc5 stuck on btrfs_tree_read_lock sync
On Mon, Apr 07, 2014 at 12:10:52PM -0400, Josef Bacik wrote:
> On 04/07/2014 12:05 PM, Marc MERLIN wrote:
> > I was debugging why my backup failed to run, and eventually found it was
> > stuck on sync:
> > 14080 18:18 btrfs_tree_read_lock sync
> >
> > This was hung for hours on this lock.
> >
> > Strangely, it looks like taking my sysrq-w hung the machine pretty hard
> > for close to 30sec, but this seems to have unhung sync and in the end
> > btrfs send completed after that.
> >
> > Sysrq-w is here:
> > http://marc.merlins.org/tmp/sysrq-btrfs-sync-hang.txt
>
> Try Chris's integration branch in a few hours and see if that fixes it.
> Thanks,

Mmmh, so I rebooted that server with 3.14.0 (no rc), and it was deadlocked
for a long time during boot (about 10 min) before it unlocked itself and
finished booting.

This is a bit vexing: I don't yet know which of my 3 btrfs filesystems is
causing this, or how to fix it.
After boot, it seems ok enough.

You're recommending that I try btrfs-next on a 3.15 pre kernel, correct?
If so would it be likely to fix my filesystem and let me go back to a stable
3.14? (I'm a bit wary about running some unstable 3.15 on it :).

Is there a chance balance or some file system cleaning will fix this?

For now, during boot, I get:

INFO: task btrfs-transacti:3633 blocked for more than 120 seconds.
Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 88020d762680 0 3633 2 0x
88020c6c7dc0 0046 88020c6c7fd8 88020d762150
000141c0 88020d762150 88020e11be90 8802106271e8
880210627000 8800c5c82740 88020c6c7dd0
Call Trace:
[8160c331] schedule+0x73/0x75
[8122a5f9] wait_current_trans.isra.15+0x98/0xf4
[810850c9] ? finish_wait+0x65/0x65
[8122b812] start_transaction+0x202/0x4f2
[8122bb9e] btrfs_attach_transaction+0x17/0x19
[812277a8] transaction_kthread+0xd6/0x1ab
[812276d2] ? btrfs_cleanup_transaction+0x43f/0x43f
[8106bc56] kthread+0xae/0xb6
[8106bba8] ? __kthread_parkme+0x61/0x61
[816153fc] ret_from_fork+0x7c/0xb0
[8106bba8] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:3633 blocked for more than 120 seconds.
Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 88020d762680 0 3633 2 0x
88020c6c7dc0 0046 88020c6c7fd8 88020d762150
000141c0 88020d762150 88020e11be90 8802106271e8
880210627000 8800c5c82740 88020c6c7dd0
Call Trace:
[8160c331] schedule+0x73/0x75
[8122a5f9] wait_current_trans.isra.15+0x98/0xf4
[810850c9] ? finish_wait+0x65/0x65
[8122b812] start_transaction+0x202/0x4f2
[8122bb9e] btrfs_attach_transaction+0x17/0x19
[812277a8] transaction_kthread+0xd6/0x1ab
[812276d2] ? btrfs_cleanup_transaction+0x43f/0x43f
[8106bc56] kthread+0xae/0xb6
[8106bba8] ? __kthread_parkme+0x61/0x61
[816153fc] ret_from_fork+0x7c/0xb0
[8106bba8] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:3633 blocked for more than 120 seconds.
Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 88020d762680 0 3633 2 0x
88020c6c7dc0 0046 88020c6c7fd8 88020d762150
000141c0 88020d762150 88020e11be90 8802106271e8
880210627000 8800c5c82740 88020c6c7dd0
Call Trace:
[8160c331] schedule+0x73/0x75
[8122a5f9] wait_current_trans.isra.15+0x98/0xf4
[810850c9] ? finish_wait+0x65/0x65
[8122b812] start_transaction+0x202/0x4f2
[8122bb9e] btrfs_attach_transaction+0x17/0x19
[812277a8] transaction_kthread+0xd6/0x1ab
[812276d2] ? btrfs_cleanup_transaction+0x43f/0x43f
[8106bc56] kthread+0xae/0xb6
[8106bba8] ? __kthread_parkme+0x61/0x61
[816153fc] ret_from_fork+0x7c/0xb0
[8106bba8] ? __kthread_parkme+0x61/0x61

Eventually the boot finishes, but it hangs way too long.

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
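The hung-task reports in this message follow a fixed layout: an "INFO: task NAME:PID blocked for more than N seconds" line, then a call trace whose frames look like "[address] function+0xoffset/0xsize". Frames prefixed with "?" are ones the kernel's unwinder could not confirm as part of the call chain. Below is a minimal Python sketch for extracting and grouping such reports from saved console text. The function names parse_hung_tasks and summarize are hypothetical, and the regular expressions are written against the text as it appears in this thread, not against any formal log specification.

```python
import re
from collections import Counter
from typing import Dict, List

HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Accepts both '[8160c331] schedule+0x73/0x75' and '[<ffffffff8160c331>] ...'.
FRAME_RE = re.compile(
    r"\[<?[0-9a-f]+>?\]\s+(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log: str) -> List[Dict]:
    """Return one dict per hung-task report, with reliable frames only."""
    reports = []
    matches = list(HUNG_RE.finditer(log))
    for i, m in enumerate(matches):
        # A report runs until the next 'INFO: task' line or the end of the log.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log)
        frames = [
            f.group("func")
            for f in FRAME_RE.finditer(log[m.end():end])
            if not f.group("unreliable")
        ]
        reports.append({
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
            "timeout_s": int(m.group("secs")),
            "frames": frames,
        })
    return reports


def summarize(reports: List[Dict]) -> Counter:
    """Count how often each (task, pid, call chain) combination occurs."""
    return Counter((r["comm"], r["pid"], tuple(r["frames"])) for r in reports)


# One report, abridged from the message above.
SAMPLE = """\
INFO: task btrfs-transacti:3633 blocked for more than 120 seconds.
Call Trace:
[8160c331] schedule+0x73/0x75
[8122a5f9] wait_current_trans.isra.15+0x98/0xf4
[810850c9] ? finish_wait+0x65/0x65
[8122b812] start_transaction+0x202/0x4f2
[8122bb9e] btrfs_attach_transaction+0x17/0x19
[812277a8] transaction_kthread+0xd6/0x1ab
[8106bc56] kthread+0xae/0xb6
[816153fc] ret_from_fork+0x7c/0xb0
"""

for (comm, pid, frames), count in summarize(parse_hung_tasks(SAMPLE * 3)).items():
    print(f"{count}x {comm}:{pid} " + " <- ".join(frames))
```

Grouping by call chain makes it easy to see that repeated reports, like the three in this message, describe the same blocked task at the same place rather than three separate problems.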
Re: btrfs on 3.14rc5 stuck on btrfs_tree_read_lock sync
On 04/07/2014 02:51 PM, Marc MERLIN wrote:
> On Mon, Apr 07, 2014 at 12:10:52PM -0400, Josef Bacik wrote:
> > On 04/07/2014 12:05 PM, Marc MERLIN wrote:
> > > I was debugging why my backup failed to run, and eventually found it
> > > was stuck on sync:
> > > 14080 18:18 btrfs_tree_read_lock sync
> > >
> > > This was hung for hours on this lock.
> > >
> > > Strangely, it looks like taking my sysrq-w hung the machine pretty hard
> > > for close to 30sec, but this seems to have unhung sync and in the end
> > > btrfs send completed after that.
> > >
> > > Sysrq-w is here:
> > > http://marc.merlins.org/tmp/sysrq-btrfs-sync-hang.txt
> >
> > Try Chris's integration branch in a few hours and see if that fixes it.
> > Thanks,
>
> Mmmh, so I rebooted that server with 3.14.0 (no rc), and it was deadlocked
> for a long time during boot (about 10 min) before it unlocked itself and
> finished booting.
>
> This is a bit vexing: I don't yet know which of my 3 btrfs filesystems is
> causing this, or how to fix it.
> After boot, it seems ok enough.
>
> You're recommending that I try btrfs-next on a 3.15 pre kernel, correct?
> If so would it be likely to fix my filesystem and let me go back to a
> stable 3.14? (I'm a bit wary about running some unstable 3.15 on it :).

Right now the fixes for this are in the integration branch on my git tree.
I think we've shaken out all the problems, but if you want to wait until
tomorrow I'll have it in my next branch (for linux-next).

-chris
Re: btrfs on 3.14rc5 stuck on btrfs_tree_read_lock sync
On Mon, Apr 07, 2014 at 03:32:13PM -0400, Chris Mason wrote:
> > You're recommending that I try btrfs-next on a 3.15 pre kernel, correct?
> > If so would it be likely to fix my filesystem and let me go back to a
> > stable 3.14? (I'm a bit wary about running some unstable 3.15 on it :).
>
> Right now the fixes for this are in the integration branch on my git tree.
> I think we've shaken out all the problems, but if you want to wait until
> tomorrow I'll have it in my next branch (for linux-next).

I can wait, even a few more days if needed.

But just to be clear: will this new kernel be something that will be
required for me to run from there on to avoid all those deadlocks and very
poor performance I'm seeing, or the new kernel will fix things up, and then
if other stuff isn't quite stable, I can downgrade back to 3.14 stable?

By the way, I think I know which filesystem is causing this, and one unusual
thing is that it uses a lot of hardlinks.
In case that helps, there are only 40 snapshots on it, but many inodes, of
which many are hardlinked together:

gargamel:/mnt/btrfs_pool2# btrfs filesystem df `pwd`
Data, single: total=3.28TiB, used=2.30TiB
System, DUP: total=8.00MiB, used=384.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=74.50GiB, used=70.11GiB
Metadata, single: total=8.00MiB, used=0.00
gargamel:/mnt/btrfs_pool2# btrfs filesystem show `pwd`
Label: btrfs_pool2  uuid: cb9df6d3-a528-4afc-9a45-4fed5ec358d6
        Total devices 1 FS bytes used 2.37TiB
        devid 1 size 7.28TiB used 3.43TiB path /dev/mapper/dshelf2

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
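The "btrfs filesystem df" output above gives allocated ("total") and used space per chunk type, so the fill ratio of each can be computed directly; for example, Metadata DUP is 70.11 GiB used out of 74.50 GiB allocated. Below is a minimal Python sketch that parses output in the format shown in this message. The helper names to_bytes and parse_df are hypothetical, and the pattern is written only against the lines quoted here, so other btrfs-progs versions may print something it does not match. Whether the metadata fill level is related to the hangs is not established anywhere in this thread.

```python
import re
from typing import Dict, Optional, Tuple

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches e.g. 'Metadata, DUP: total=74.50GiB, used=70.11GiB'.
# The unit is optional because zero is printed as a bare '0.00' above.
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)?$"
)


def to_bytes(number: str, unit: Optional[str]) -> float:
    """Convert a value such as ('74.50', 'GiB') to bytes."""
    return float(number) * UNITS[unit or "B"]


def parse_df(text: str) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map (chunk type, profile) to (allocated bytes, used bytes)."""
    usage = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            usage[(m.group("kind"), m.group("profile"))] = (
                to_bytes(m.group("total"), m.group("tunit")),
                to_bytes(m.group("used"), m.group("uunit")),
            )
    return usage


# Copied from the message above.
SAMPLE = """\
Data, single: total=3.28TiB, used=2.30TiB
System, DUP: total=8.00MiB, used=384.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=74.50GiB, used=70.11GiB
Metadata, single: total=8.00MiB, used=0.00
"""

for (kind, profile), (total, used) in parse_df(SAMPLE).items():
    print(f"{kind:8s} {profile:6s} {100 * used / total:5.1f}% of allocated space used")
```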