Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5

2015-10-21 Thread Filipe Manana
On Fri, Aug 7, 2015 at 5:01 AM, Liu Bo  wrote:
> Hi,
>
> On Wed, Aug 05, 2015 at 10:28:05AM +0200, Elias Probst wrote:
>> I can reproduce a hard btrfs lockup (process issuing the ioctl() is in
>> D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5.
>>
>> I had the same issue on 4.1, so it's unlikely a regression introduced in
>> 4.2.
>>
>> ## With the following steps, I can reproduce the problem:
>>
>> 1. Create a new clean btrfs volume for /var/lib/machines
>> machinectl set-limit 6G
>>
>> 2. Paste this to /tmp/yum.conf
>> [main]
>> reposdir=/dev/null
>> gpgcheck=0
>> logfile=/var/log/yum.log
>> installroot=/var/lib/machines/centos7.1-base
>> assumeyes=1
>>
>> [base]
>> name=CentOS 7.1.1503 - x86_64
>> baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/
>> enabled=1
>>
>> 3. Bootstrap a CentOS 7.1 base image
>> /usr/bin/yum -c /tmp/yum.conf groupinstall Base
>>
>> 4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base'
>> strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD
>> /var/lib/machines/centos7.1-base/
>>
>>
>> `systemd-nspawn` will now just hang forever.
>> I couldn't come up yet with a shorter/more low-level way to reproduce this 
>> as I lack quite a bit of btrfs experience.
>
> Thank you for reporting this.
>
> Could you do 'echo w > /proc/sysrq-trigger' to gather the whole hang call 
> stack?
>
> Here's a quick patch that may address your problem, can you give it a shot 
> after
> getting sysrq-w output?

Hi Liu,

Can you please send the patch git formatted, in a dedicated thread,
with a change log, signed-off-by tag, etc, so that it hopefully gets
in to 4.4? You can also add my Reviewed-by: Filipe Manana


Thanks

>
> Thanks,
>
> -liubo
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 0770c91..b52bd66 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3478,6 +3478,22 @@ process_slot:
> drop_start = new_key.offset;
>
> /*
> +* We need to look up the roots that point at
> +* this bytenr and see if the new root does.  If
> +* it does not we need to make sure we update
> +* quotas appropriately.
> +*/
> +   if (disko && root != BTRFS_I(src)->root &&
> +   disko != last_disko) {
> +   no_quota = check_ref(trans, root,
> +disko);
> +   if (no_quota < 0) {
> +   ret = no_quota;
> +   goto out;
> +   }
> +   }
> +
> +   /*
>  * 1 - adjusting old extent (we may have to * split 
> it)
>  * 1 - add new extent
>  * 1 - inode update
> @@ -3544,27 +3560,6 @@ process_slot:
> btrfs_set_file_extent_num_bytes(leaf, extent,
> datal);
>
> -   /*
> -* We need to look up the roots that point at
> -* this bytenr and see if the new root does.  
> If
> -* it does not we need to make sure we update
> -* quotas appropriately.  -   
>*/
> -   if (disko && root != BTRFS_I(src)->root &&
> -   disko != last_disko) {
> -   no_quota = check_ref(trans, root,
> -disko);
> -   if (no_quota < 0) {
> -   btrfs_abort_transaction(trans,
> -   root,
> -   ret);
> -   btrfs_end_transaction(trans,
> - root);
> -   ret = no_quota;
> -   goto out;
> -   }
> -   }
> -
> if (disko) {
> inode_add_bytes(inode, datal);
> ret = btrfs_inc_extent_ref(trans, 
> root,
>
>
>>
>> ## Results:
>>
>> - Last 'strace' lines
>> 6095  fchown(16, 0, 0)  = 0
>> 6095  fchmod(16, 0755)  = 0
>> 6095  utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0
>> 

Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5

2015-08-15 Thread Elias Probst
On 08/08/2015 01:29 AM, Elias Probst wrote:
 On 08/07/2015 06:01 AM, Liu Bo wrote:
 Could you do 'echo w  /proc/sysrq-trigger' to gather the whole hang call 
 stack?
 
 Thanks a lot for the feedback. Full call stack output is attached
 (pasting inline makes no sense due to the size of 2423 lines/135k).
 
 In case VGER will strip attachments of this size, I made it available as
 a pastebin here: https://bpaste.net/show/e5e0fd4bbb9f
 
 
 Here's a quick patch that may address your problem, can you give it a shot 
 after
 getting sysrq-w output?
 
 Will rebuild now with your patch and keep you updated about the outcome.

So far I've been unable to reproduce the lockup using your patch on
4.2.0-rc6.

Thanks a lot!

- Elias




signature.asc
Description: OpenPGP digital signature


Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5

2015-08-07 Thread Elias Probst
On 08/07/2015 06:01 AM, Liu Bo wrote:
 Could you do 'echo w  /proc/sysrq-trigger' to gather the whole hang call 
 stack?

Thanks a lot for the feedback. Full call stack output is attached
(pasting inline makes no sense due to the size of 2423 lines/135k).

In case VGER will strip attachments of this size, I made it available as
a pastebin here: https://bpaste.net/show/e5e0fd4bbb9f


 Here's a quick patch that may address your problem, can you give it a shot 
 after
 getting sysrq-w output?

Will rebuild now with your patch and keep you updated about the outcome.

Thanks a lot again!

- Elias





signature.asc
Description: OpenPGP digital signature


Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5

2015-08-06 Thread Liu Bo
Hi,

On Wed, Aug 05, 2015 at 10:28:05AM +0200, Elias Probst wrote:
 I can reproduce a hard btrfs lockup (process issuing the ioctl() is in
 D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5.
 
 I had the same issue on 4.1, so it's unlikely a regression introduced in
 4.2.
 
 ## With the following steps, I can reproduce the problem:
 
 1. Create a new clean btrfs volume for /var/lib/machines
 machinectl set-limit 6G
 
 2. Paste this to /tmp/yum.conf
 [main]
 reposdir=/dev/null
 gpgcheck=0
 logfile=/var/log/yum.log
 installroot=/var/lib/machines/centos7.1-base
 assumeyes=1
 
 [base]
 name=CentOS 7.1.1503 - x86_64
 baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/
 enabled=1
 
 3. Bootstrap a CentOS 7.1 base image
 /usr/bin/yum -c /tmp/yum.conf groupinstall Base
 
 4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base'
 strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD
 /var/lib/machines/centos7.1-base/
 
 
 `systemd-nspawn` will now just hang forever.
 I couldn't come up yet with a shorter/more low-level way to reproduce this as 
 I lack quite a bit of btrfs experience.

Thank you for reporting this.

Could you do 'echo w  /proc/sysrq-trigger' to gather the whole hang call stack?

Here's a quick patch that may address your problem, can you give it a shot after
getting sysrq-w output?

Thanks,

-liubo

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0770c91..b52bd66 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3478,6 +3478,22 @@ process_slot:
drop_start = new_key.offset;
 
/*
+* We need to look up the roots that point at
+* this bytenr and see if the new root does.  If
+* it does not we need to make sure we update
+* quotas appropriately.
+*/
+   if (disko  root != BTRFS_I(src)-root 
+   disko != last_disko) {
+   no_quota = check_ref(trans, root,
+disko);
+   if (no_quota  0) {
+   ret = no_quota;
+   goto out;
+   }
+   }
+
+   /*
 * 1 - adjusting old extent (we may have to * split it)
 * 1 - add new extent
 * 1 - inode update
@@ -3544,27 +3560,6 @@ process_slot:
btrfs_set_file_extent_num_bytes(leaf, extent,
datal);
 
-   /*
-* We need to look up the roots that point at
-* this bytenr and see if the new root does.  If
-* it does not we need to make sure we update
-* quotas appropriately.  - 
 */
-   if (disko  root != BTRFS_I(src)-root 
-   disko != last_disko) {
-   no_quota = check_ref(trans, root,
-disko);
-   if (no_quota  0) {
-   btrfs_abort_transaction(trans,
-   root,
-   ret);
-   btrfs_end_transaction(trans,
- root);
-   ret = no_quota;
-   goto out;
-   }
-   }
-
if (disko) {
inode_add_bytes(inode, datal);
ret = btrfs_inc_extent_ref(trans, root,


 
 ## Results:
 
 - Last 'strace' lines
 6095  fchown(16, 0, 0)  = 0
 6095  fchmod(16, 0755)  = 0
 6095  utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0
 6095  flistxattr(15, , 100)   = 0
 6095  getdents(15, /* 3 entries */, 32768) = 80
 6095  newfstatat(15, coreutils.mo, {st_mode=S_IFREG|0644, st_size=357263, 
 ...}, AT_SYMLINK_NOFOLLOW) = 0
 6095  openat(15, coreutils.mo, O_RDONLY|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC) = 17
 6095  openat(16, coreutils.mo, 
 O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0644) = 18
 6095  fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
 6095  ioctl(18, BTRFS_IOC_CLONE
 
 - call trace in Kernel journal:
 Aug 05 10:10:03 moria kernel: 

Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5

2015-08-05 Thread Elias Probst
I can reproduce a hard btrfs lockup (process issuing the ioctl() is in
D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5.

I had the same issue on 4.1, so it's unlikely a regression introduced in
4.2.

## With the following steps, I can reproduce the problem:

1. Create a new clean btrfs volume for /var/lib/machines
machinectl set-limit 6G

2. Paste this to /tmp/yum.conf
[main]
reposdir=/dev/null
gpgcheck=0
logfile=/var/log/yum.log
installroot=/var/lib/machines/centos7.1-base
assumeyes=1

[base]
name=CentOS 7.1.1503 - x86_64
baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/
enabled=1

3. Bootstrap a CentOS 7.1 base image
/usr/bin/yum -c /tmp/yum.conf groupinstall Base

4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base'
strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD
/var/lib/machines/centos7.1-base/


`systemd-nspawn` will now just hang forever.
I couldn't come up yet with a shorter/more low-level way to reproduce this as I 
lack quite a bit of btrfs experience.

## Results:

- Last 'strace' lines
6095  fchown(16, 0, 0)  = 0
6095  fchmod(16, 0755)  = 0
6095  utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0
6095  flistxattr(15, , 100)   = 0
6095  getdents(15, /* 3 entries */, 32768) = 80
6095  newfstatat(15, coreutils.mo, {st_mode=S_IFREG|0644, st_size=357263, 
...}, AT_SYMLINK_NOFOLLOW) = 0
6095  openat(15, coreutils.mo, O_RDONLY|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC) = 17
6095  openat(16, coreutils.mo, 
O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0644) = 18
6095  fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
6095  ioctl(18, BTRFS_IOC_CLONE

- call trace in Kernel journal:
Aug 05 10:10:03 moria kernel: INFO: task btrfs-transacti:4175 blocked for more 
than 120 seconds.
Aug 05 10:10:03 moria kernel:   Tainted: G   O4.2.0-rc5 #2
Aug 05 10:10:03 moria kernel: echo 0  
/proc/sys/kernel/hung_task_timeout_secs disables this message.
Aug 05 10:10:03 moria kernel: btrfs-transacti D 8800b13279f8 0  4175
  2 0x00080080
Aug 05 10:10:03 moria kernel:  8800b13279f8 88018fd3a380 
8800ab4521c0 0246
Aug 05 10:10:03 moria kernel:  8800b1328000 88018d5c8518 
88018debdba0 880232d64990
Aug 05 10:10:03 moria kernel:  0197 8800b1327a18 
86999201 
Aug 05 10:10:03 moria kernel: Call Trace:
Aug 05 10:10:03 moria kernel:  [86999201] schedule+0x74/0x83
Aug 05 10:10:03 moria kernel:  [863ef8f0] btrfs_tree_lock+0xa7/0x1b7
Aug 05 10:10:03 moria kernel:  [86137ed7] ? wait_woken+0x74/0x74
Aug 05 10:10:03 moria kernel:  [8639d30f] push_leaf_right+0x9a/0x19f
Aug 05 10:10:03 moria kernel:  [8639dd9b] split_leaf+0x100/0x63f
Aug 05 10:10:03 moria kernel:  [86398f09] ? leaf_space_used+0xbb/0xea
Aug 05 10:10:03 moria kernel:  [863efa61] ? 
btrfs_set_lock_blocking_rw+0x52/0x95
Aug 05 10:10:03 moria kernel:  [8639ea46] 
btrfs_search_slot+0x76c/0x8b3
Aug 05 10:10:03 moria kernel:  [863a0107] 
btrfs_insert_empty_items+0x58/0xa3
Aug 05 10:10:03 moria kernel:  [8640805a] 
btrfs_insert_delayed_items+0x7f/0x3bb
Aug 05 10:10:03 moria kernel:  [8640842e] 
__btrfs_run_delayed_items+0x98/0x1c0
Aug 05 10:10:03 moria kernel:  [86408739] 
btrfs_run_delayed_items+0xc/0xe
Aug 05 10:10:03 moria kernel:  [863bdc50] 
btrfs_commit_transaction+0x298/0xb66
Aug 05 10:10:03 moria kernel:  [863be8d0] ? 
start_transaction+0x3b2/0x535
Aug 05 10:10:03 moria kernel:  [863b9cd9] 
transaction_kthread+0x100/0x1d6
Aug 05 10:10:03 moria kernel:  [863b9bd9] ? 
btrfs_cleanup_transaction+0x49f/0x49f
Aug 05 10:10:03 moria kernel:  [8611eca9] kthread+0xcd/0xd5
Aug 05 10:10:03 moria kernel:  [8611ebdc] ? 
kthread_create_on_node+0x17d/0x17d
Aug 05 10:10:03 moria kernel:  [8699d29f] ret_from_fork+0x3f/0x70
Aug 05 10:10:03 moria kernel:  [8611ebdc] ? 
kthread_create_on_node+0x17d/0x17d
Aug 05 10:10:03 moria kernel: INFO: task systemd-nspawn:6095 blocked for more 
than 120 seconds.
Aug 05 10:10:03 moria kernel:   Tainted: G   O4.2.0-rc5 #2
Aug 05 10:10:03 moria kernel: echo 0  
/proc/sys/kernel/hung_task_timeout_secs disables this message.
Aug 05 10:10:03 moria kernel: systemd-nspawn  D 88019f3e3668 0  6095   
6090 0x00080083
Aug 05 10:10:03 moria kernel:  88019f3e3668 86e5d480 
88018fd3a380 0246
Aug 05 10:10:03 moria kernel:  88019f3e4000 88018debdc08 
88018fd3a380 88019f3e36b8
Aug 05 10:10:03 moria kernel:  88018fd3a380 88019f3e3688 
86999201 
Aug 05 10:10:03 moria kernel: Call Trace:
Aug 05 10:10:03 moria kernel:  [86999201] schedule+0x74/0x83
Aug 05 10:10:03 moria kernel:  [863ef64c] 
btrfs_tree_read_lock+0xc0/0xea
Aug 05 10:10:03 moria kernel:  [86137ed7] ?