Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5
On Fri, Aug 7, 2015 at 5:01 AM, Liu Bowrote: > Hi, > > On Wed, Aug 05, 2015 at 10:28:05AM +0200, Elias Probst wrote: >> I can reproduce a hard btrfs lockup (process issuing the ioctl() is in >> D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5. >> >> I had the same issue on 4.1, so it's unlikely a regression introduced in >> 4.2. >> >> ## With the following steps, I can reproduce the problem: >> >> 1. Create a new clean btrfs volume for /var/lib/machines >> machinectl set-limit 6G >> >> 2. Paste this to /tmp/yum.conf >> [main] >> reposdir=/dev/null >> gpgcheck=0 >> logfile=/var/log/yum.log >> installroot=/var/lib/machines/centos7.1-base >> assumeyes=1 >> >> [base] >> name=CentOS 7.1.1503 - x86_64 >> baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/ >> enabled=1 >> >> 3. Bootstrap a CentOS 7.1 base image >> /usr/bin/yum -c /tmp/yum.conf groupinstall Base >> >> 4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base' >> strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD >> /var/lib/machines/centos7.1-base/ >> >> >> `systemd-nspawn` will now just hang forever. >> I couldn't come up yet with a shorter/more low-level way to reproduce this >> as I lack quite a bit of btrfs experience. > > Thank you for reporting this. > > Could you do 'echo w > /proc/sysrq-trigger' to gather the whole hang call > stack? > > Here's a quick patch that may address your problem, can you give it a shot > after > getting sysrq-w output? Hi Liu, Can you please send the patch git formatted, in a dedicated thread, with a change log, signed-off-by tag, etc, so that it hopefully gets in to 4.4? You can also add my Reviewed-by: Filipe Manana Thanks > > Thanks, > > -liubo > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 0770c91..b52bd66 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -3478,6 +3478,22 @@ process_slot: > drop_start = new_key.offset; > > /* > +* We need to look up the roots that point at > +* this bytenr and see if the new root does. If > +* it does not we need to make sure we update > +* quotas appropriately. > +*/ > + if (disko && root != BTRFS_I(src)->root && > + disko != last_disko) { > + no_quota = check_ref(trans, root, > +disko); > + if (no_quota < 0) { > + ret = no_quota; > + goto out; > + } > + } > + > + /* > * 1 - adjusting old extent (we may have to * split > it) > * 1 - add new extent > * 1 - inode update > @@ -3544,27 +3560,6 @@ process_slot: > btrfs_set_file_extent_num_bytes(leaf, extent, > datal); > > - /* > -* We need to look up the roots that point at > -* this bytenr and see if the new root does. > If > -* it does not we need to make sure we update > -* quotas appropriately. - >*/ > - if (disko && root != BTRFS_I(src)->root && > - disko != last_disko) { > - no_quota = check_ref(trans, root, > -disko); > - if (no_quota < 0) { > - btrfs_abort_transaction(trans, > - root, > - ret); > - btrfs_end_transaction(trans, > - root); > - ret = no_quota; > - goto out; > - } > - } > - > if (disko) { > inode_add_bytes(inode, datal); > ret = btrfs_inc_extent_ref(trans, > root, > > >> >> ## Results: >> >> - Last 'strace' lines >> 6095 fchown(16, 0, 0) = 0 >> 6095 fchmod(16, 0755) = 0 >> 6095 utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0 >>
Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5
On 08/08/2015 01:29 AM, Elias Probst wrote: On 08/07/2015 06:01 AM, Liu Bo wrote: Could you do 'echo w /proc/sysrq-trigger' to gather the whole hang call stack? Thanks a lot for the feedback. Full call stack output is attached (pasting inline makes no sense due to the size of 2423 lines/135k). In case VGER will strip attachments of this size, I made it available as a pastebin here: https://bpaste.net/show/e5e0fd4bbb9f Here's a quick patch that may address your problem, can you give it a shot after getting sysrq-w output? Will rebuild now with your patch and keep you updated about the outcome. So far I've been unable to reproduce the lockup using your patch on 4.2.0-rc6. Thanks a lot! - Elias signature.asc Description: OpenPGP digital signature
Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5
On 08/07/2015 06:01 AM, Liu Bo wrote: Could you do 'echo w /proc/sysrq-trigger' to gather the whole hang call stack? Thanks a lot for the feedback. Full call stack output is attached (pasting inline makes no sense due to the size of 2423 lines/135k). In case VGER will strip attachments of this size, I made it available as a pastebin here: https://bpaste.net/show/e5e0fd4bbb9f Here's a quick patch that may address your problem, can you give it a shot after getting sysrq-w output? Will rebuild now with your patch and keep you updated about the outcome. Thanks a lot again! - Elias signature.asc Description: OpenPGP digital signature
Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5
Hi, On Wed, Aug 05, 2015 at 10:28:05AM +0200, Elias Probst wrote: I can reproduce a hard btrfs lockup (process issuing the ioctl() is in D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5. I had the same issue on 4.1, so it's unlikely a regression introduced in 4.2. ## With the following steps, I can reproduce the problem: 1. Create a new clean btrfs volume for /var/lib/machines machinectl set-limit 6G 2. Paste this to /tmp/yum.conf [main] reposdir=/dev/null gpgcheck=0 logfile=/var/log/yum.log installroot=/var/lib/machines/centos7.1-base assumeyes=1 [base] name=CentOS 7.1.1503 - x86_64 baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/ enabled=1 3. Bootstrap a CentOS 7.1 base image /usr/bin/yum -c /tmp/yum.conf groupinstall Base 4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base' strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD /var/lib/machines/centos7.1-base/ `systemd-nspawn` will now just hang forever. I couldn't come up yet with a shorter/more low-level way to reproduce this as I lack quite a bit of btrfs experience. Thank you for reporting this. Could you do 'echo w /proc/sysrq-trigger' to gather the whole hang call stack? Here's a quick patch that may address your problem, can you give it a shot after getting sysrq-w output? Thanks, -liubo diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 0770c91..b52bd66 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3478,6 +3478,22 @@ process_slot: drop_start = new_key.offset; /* +* We need to look up the roots that point at +* this bytenr and see if the new root does. If +* it does not we need to make sure we update +* quotas appropriately. +*/ + if (disko root != BTRFS_I(src)-root + disko != last_disko) { + no_quota = check_ref(trans, root, +disko); + if (no_quota 0) { + ret = no_quota; + goto out; + } + } + + /* * 1 - adjusting old extent (we may have to * split it) * 1 - add new extent * 1 - inode update @@ -3544,27 +3560,6 @@ process_slot: btrfs_set_file_extent_num_bytes(leaf, extent, datal); - /* -* We need to look up the roots that point at -* this bytenr and see if the new root does. If -* it does not we need to make sure we update -* quotas appropriately. - */ - if (disko root != BTRFS_I(src)-root - disko != last_disko) { - no_quota = check_ref(trans, root, -disko); - if (no_quota 0) { - btrfs_abort_transaction(trans, - root, - ret); - btrfs_end_transaction(trans, - root); - ret = no_quota; - goto out; - } - } - if (disko) { inode_add_bytes(inode, datal); ret = btrfs_inc_extent_ref(trans, root, ## Results: - Last 'strace' lines 6095 fchown(16, 0, 0) = 0 6095 fchmod(16, 0755) = 0 6095 utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0 6095 flistxattr(15, , 100) = 0 6095 getdents(15, /* 3 entries */, 32768) = 80 6095 newfstatat(15, coreutils.mo, {st_mode=S_IFREG|0644, st_size=357263, ...}, AT_SYMLINK_NOFOLLOW) = 0 6095 openat(15, coreutils.mo, O_RDONLY|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC) = 17 6095 openat(16, coreutils.mo, O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0644) = 18 6095 fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 6095 ioctl(18, BTRFS_IOC_CLONE - call trace in Kernel journal: Aug 05 10:10:03 moria kernel:
Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5
I can reproduce a hard btrfs lockup (process issuing the ioctl() is in D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5. I had the same issue on 4.1, so it's unlikely a regression introduced in 4.2. ## With the following steps, I can reproduce the problem: 1. Create a new clean btrfs volume for /var/lib/machines machinectl set-limit 6G 2. Paste this to /tmp/yum.conf [main] reposdir=/dev/null gpgcheck=0 logfile=/var/log/yum.log installroot=/var/lib/machines/centos7.1-base assumeyes=1 [base] name=CentOS 7.1.1503 - x86_64 baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/ enabled=1 3. Bootstrap a CentOS 7.1 base image /usr/bin/yum -c /tmp/yum.conf groupinstall Base 4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base' strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD /var/lib/machines/centos7.1-base/ `systemd-nspawn` will now just hang forever. I couldn't come up yet with a shorter/more low-level way to reproduce this as I lack quite a bit of btrfs experience. ## Results: - Last 'strace' lines 6095 fchown(16, 0, 0) = 0 6095 fchmod(16, 0755) = 0 6095 utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0 6095 flistxattr(15, , 100) = 0 6095 getdents(15, /* 3 entries */, 32768) = 80 6095 newfstatat(15, coreutils.mo, {st_mode=S_IFREG|0644, st_size=357263, ...}, AT_SYMLINK_NOFOLLOW) = 0 6095 openat(15, coreutils.mo, O_RDONLY|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC) = 17 6095 openat(16, coreutils.mo, O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0644) = 18 6095 fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 6095 ioctl(18, BTRFS_IOC_CLONE - call trace in Kernel journal: Aug 05 10:10:03 moria kernel: INFO: task btrfs-transacti:4175 blocked for more than 120 seconds. Aug 05 10:10:03 moria kernel: Tainted: G O4.2.0-rc5 #2 Aug 05 10:10:03 moria kernel: echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. Aug 05 10:10:03 moria kernel: btrfs-transacti D 8800b13279f8 0 4175 2 0x00080080 Aug 05 10:10:03 moria kernel: 8800b13279f8 88018fd3a380 8800ab4521c0 0246 Aug 05 10:10:03 moria kernel: 8800b1328000 88018d5c8518 88018debdba0 880232d64990 Aug 05 10:10:03 moria kernel: 0197 8800b1327a18 86999201 Aug 05 10:10:03 moria kernel: Call Trace: Aug 05 10:10:03 moria kernel: [86999201] schedule+0x74/0x83 Aug 05 10:10:03 moria kernel: [863ef8f0] btrfs_tree_lock+0xa7/0x1b7 Aug 05 10:10:03 moria kernel: [86137ed7] ? wait_woken+0x74/0x74 Aug 05 10:10:03 moria kernel: [8639d30f] push_leaf_right+0x9a/0x19f Aug 05 10:10:03 moria kernel: [8639dd9b] split_leaf+0x100/0x63f Aug 05 10:10:03 moria kernel: [86398f09] ? leaf_space_used+0xbb/0xea Aug 05 10:10:03 moria kernel: [863efa61] ? btrfs_set_lock_blocking_rw+0x52/0x95 Aug 05 10:10:03 moria kernel: [8639ea46] btrfs_search_slot+0x76c/0x8b3 Aug 05 10:10:03 moria kernel: [863a0107] btrfs_insert_empty_items+0x58/0xa3 Aug 05 10:10:03 moria kernel: [8640805a] btrfs_insert_delayed_items+0x7f/0x3bb Aug 05 10:10:03 moria kernel: [8640842e] __btrfs_run_delayed_items+0x98/0x1c0 Aug 05 10:10:03 moria kernel: [86408739] btrfs_run_delayed_items+0xc/0xe Aug 05 10:10:03 moria kernel: [863bdc50] btrfs_commit_transaction+0x298/0xb66 Aug 05 10:10:03 moria kernel: [863be8d0] ? start_transaction+0x3b2/0x535 Aug 05 10:10:03 moria kernel: [863b9cd9] transaction_kthread+0x100/0x1d6 Aug 05 10:10:03 moria kernel: [863b9bd9] ? btrfs_cleanup_transaction+0x49f/0x49f Aug 05 10:10:03 moria kernel: [8611eca9] kthread+0xcd/0xd5 Aug 05 10:10:03 moria kernel: [8611ebdc] ? kthread_create_on_node+0x17d/0x17d Aug 05 10:10:03 moria kernel: [8699d29f] ret_from_fork+0x3f/0x70 Aug 05 10:10:03 moria kernel: [8611ebdc] ? kthread_create_on_node+0x17d/0x17d Aug 05 10:10:03 moria kernel: INFO: task systemd-nspawn:6095 blocked for more than 120 seconds. Aug 05 10:10:03 moria kernel: Tainted: G O4.2.0-rc5 #2 Aug 05 10:10:03 moria kernel: echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. Aug 05 10:10:03 moria kernel: systemd-nspawn D 88019f3e3668 0 6095 6090 0x00080083 Aug 05 10:10:03 moria kernel: 88019f3e3668 86e5d480 88018fd3a380 0246 Aug 05 10:10:03 moria kernel: 88019f3e4000 88018debdc08 88018fd3a380 88019f3e36b8 Aug 05 10:10:03 moria kernel: 88018fd3a380 88019f3e3688 86999201 Aug 05 10:10:03 moria kernel: Call Trace: Aug 05 10:10:03 moria kernel: [86999201] schedule+0x74/0x83 Aug 05 10:10:03 moria kernel: [863ef64c] btrfs_tree_read_lock+0xc0/0xea Aug 05 10:10:03 moria kernel: [86137ed7] ?