Re: [PATCH v3 1/2] btrfs: scrub: fix circular locking dependency warning

2018-12-04 Thread David Sterba
On Fri, Nov 30, 2018 at 01:15:23PM +0800, Anand Jain wrote:
> @@ -3757,10 +3757,13 @@ static noinline_for_stack int 
> scrub_workers_get(struct btrfs_fs_info *fs_info,
>  
>  static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info 
> *fs_info)
>  {
> + lockdep_assert_held(_info->scrub_lock);
>   if (--fs_info->scrub_workers_refcnt == 0) {
> + mutex_unlock(_info->scrub_lock);
>   btrfs_destroy_workqueue(fs_info->scrub_workers);
>   btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
>   btrfs_destroy_workqueue(fs_info->scrub_parity_workers);
> + mutex_lock(_info->scrub_lock);
>   }
>   WARN_ON(fs_info->scrub_workers_refcnt < 0);
>  }

btrfs/011 lockdep warning is gone, but now there's a list corruption
reported by btrfs/073. I'm testing the 2 patches on top of current
master to avoid interference with misc-next patches.

btrfs/073   [11:07:19]
[ 3580.466293] run fstests btrfs/073at 2018-12-04 11:07:19
[ 3580.610367] BTRFS info (device vda): disk space caching is enabled
[ 3580.612809] BTRFS info (device vda): has skinny extents
[ 3580.876639] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 1 
transid 5 /dev/vdb
[ 3580.880569] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 2 
transid 5 /dev/vdc
[ 3580.882947] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 3 
transid 5 /dev/vdd
[ 3580.885499] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 4 
transid 5 /dev/vde
[ 3580.887972] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 5 
transid 5 /dev/vdf
[ 3580.890394] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 6 
transid 5 /dev/vdg
[ 3580.893729] BTRFS: device fsid d452261d-c956-4b54-aab9-8318c3c211fcdevid 7 
transid 5 /dev/vdh
[ 3580.903180] BTRFS info (device vdb): disk space caching is enabled
[ 3580.904538] BTRFS info (device vdb): has skinny extents
[ 3580.905322] BTRFS info (device vdb): flagging fs with big metadatafeature
[ 3580.908555] BTRFS info (device vdb): checking UUID tree
[ 3580.951440] [ cut here ]
[ 3580.954122] list_add corruption. prev->next should be 
next(a189faa9acc8), but was a189faa9adc0. (prev=a189faa9a480).
[ 3580.960061] WARNING: CPU: 0 PID: 24578 at 
lib/list_debug.c:28__list_add_valid+0x4d/0x70
[ 3580.962346] BTRFS info (device vdb): use no compression, level 0
[ 3580.963694] Modules linked in: dm_flakey dm_mod btrfs libcrc32c 
xorzstd_decompress zstd_compress xxhash raid6_pq loop
[ 3580.963702] CPU: 0 PID: 24578 Comm: btrfs Not tainted4.20.0-rc5-default+ #366
[ 3580.963703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 
rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 3580.963706] RIP: 0010:__list_add_valid+0x4d/0x70
[ 3580.963708] RSP: 0018:b72d88817b90 EFLAGS: 00010286
[ 3580.963709] RAX:  RBX: b72d88817c10 RCX:
[ 3580.963710] RDX: 0002 RSI: 0001 RDI:860c4c1d
[ 3580.963710] RBP: a189faa9acc8 R08: 0001 R09:
[ 3580.963714] R10:  R11: 882a0a2d R12:a189faa9ac70
[ 3580.966020] BTRFS info (device vdb): disk space caching is enabled
[ 3580.967721] R13: a189faa9a480 R14: a189faa9ac70 R15:a189faa9ac78
[ 3580.967724] FS:  7f04289d1700() 
GS:a189fd40()knlGS:
[ 3580.967725] CS:  0010 DS:  ES:  CR0: 80050033
[ 3580.967725] CR2: 7f7b58b034ac CR3: 699d9000 CR4:06f0
[ 3580.967729] Call Trace:
[ 3580.967736]  __mutex_add_waiter+0x34/0x70
[ 3580.967743]  ? drain_workqueue+0x1e/0x180
[ 3580.994465]  __mutex_lock+0x134/0x9d0
[ 3580.995526]  ? __schedule+0x2eb/0xb20
[ 3580.996584]  ? drain_workqueue+0x1e/0x180
[ 3580.997727]  drain_workqueue+0x1e/0x180
[ 3580.998793]  destroy_workqueue+0x17/0x240
[ 3580.999879]  btrfs_destroy_workqueue+0x57/0x200 [btrfs]
[ 3581.001148]  scrub_workers_put+0x6c/0x90 [btrfs]
[ 3581.002257]  btrfs_scrub_dev+0x2f6/0x590 [btrfs]
[ 3581.003370]  ? __sb_start_write+0x12c/0x1d0
[ 3581.004450]  ? mnt_want_write_file+0x24/0x60
[ 3581.005613]  btrfs_ioctl+0xfc7/0x2f00 [btrfs]
[ 3581.006361]  ? get_task_io_context+0x2d/0x90
[ 3581.007025]  ? do_vfs_ioctl+0xa2/0x6d0
[ 3581.007699]  do_vfs_ioctl+0xa2/0x6d0
[ 3581.008603]  ? __fget+0x109/0x1e0
[ 3581.009413]  ksys_ioctl+0x3a/0x70
[ 3581.010326]  __x64_sys_ioctl+0x16/0x20
[ 3581.011234]  do_syscall_64+0x54/0x180
[ 3581.012143]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3581.013213] RIP: 0033:0x7f042bac9aa7
[ 3581.014133] Code: 00 00 90 48 8b 05 f1 83 2c 00 64 c7 00 26 00 00 0048 c7 c0 
ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f05 <48> 3d 01 f0 
ff ff 73 01 c3 48 8b 0d c1 83 2c 00 f7 d8 64 89 01 48
[ 3581.018486] RSP: 002b:7f04289d0d38 EFLAGS: 0246 
ORIG_RAX:0010
[ 3581.019703] RAX: ffda RBX: 55c05475de10 

[PATCH v3 1/2] btrfs: scrub: fix circular locking dependency warning

2018-11-29 Thread Anand Jain
Circular locking dependency check reports warning[1], that's because
the btrfs_scrub_dev() calls the stack #0 below with, the
fs_info::scrub_lock held. The test case leading to this warning..

  mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs
  btrfs scrub start -B /btrfs

In fact we have fs_info::scrub_workers_refcnt to tack if the init and
destroy of the scrub workers are needed. So once we have incremented
and decremented the fs_info::scrub_workers_refcnt value in the thread,
its ok to drop the scrub_lock, and then actually do the
btrfs_destroy_workqueue() part. So this patch drops the scrub_lock
before calling btrfs_destroy_workqueue().

[1]
[   76.146826] ==
[   76.147086] WARNING: possible circular locking dependency detected
[   76.147316] 4.20.0-rc3+ #41 Not tainted
[   76.147489] --
[   76.147722] btrfs/4065 is trying to acquire lock:
[   76.147984] 38593bc0 ((wq_completion)"%s-%s""btrfs",
name){+.+.}, at: flush_workqueue+0x70/0x4d0
[   76.148337]
but task is already holding lock:
[   76.148594] 62392ab7 (_info->scrub_lock){+.+.}, at:
btrfs_scrub_dev+0x316/0x5d0 [btrfs]
[   76.148909]
which lock already depends on the new lock.

[   76.149191]
the existing dependency chain (in reverse order) is:
[   76.149446]
-> #3 (_info->scrub_lock){+.+.}:
[   76.149707]btrfs_scrub_dev+0x11f/0x5d0 [btrfs]
[   76.149924]btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
[   76.150216]do_vfs_ioctl+0xa9/0x6d0
[   76.150468]ksys_ioctl+0x60/0x90
[   76.150716]__x64_sys_ioctl+0x16/0x20
[   76.150911]do_syscall_64+0x50/0x180
[   76.151182]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   76.151469]
-> #2 (_devs->device_list_mutex){+.+.}:
[   76.151851]reada_start_machine_worker+0xca/0x3f0 [btrfs]
[   76.152195]normal_work_helper+0xf0/0x4c0 [btrfs]
[   76.152489]process_one_work+0x1f4/0x520
[   76.152751]worker_thread+0x46/0x3d0
[   76.153715]kthread+0xf8/0x130
[   76.153912]ret_from_fork+0x3a/0x50
[   76.154178]
-> #1 ((work_completion)(>normal_work)){+.+.}:
[   76.154575]worker_thread+0x46/0x3d0
[   76.154828]kthread+0xf8/0x130
[   76.155108]ret_from_fork+0x3a/0x50
[   76.155357]
-> #0 ((wq_completion)"%s-%s""btrfs", name){+.+.}:
[   76.155751]flush_workqueue+0x9a/0x4d0
[   76.155911]drain_workqueue+0xca/0x1a0
[   76.156182]destroy_workqueue+0x17/0x230
[   76.156455]btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
[   76.156756]scrub_workers_put+0x2e/0x60 [btrfs]
[   76.156931]btrfs_scrub_dev+0x329/0x5d0 [btrfs]
[   76.157219]btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
[   76.157491]do_vfs_ioctl+0xa9/0x6d0
[   76.157742]ksys_ioctl+0x60/0x90
[   76.157910]__x64_sys_ioctl+0x16/0x20
[   76.158177]do_syscall_64+0x50/0x180
[   76.158429]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   76.158716]
other info that might help us debug this:

[   76.158908] Chain exists of:
  (wq_completion)"%s-%s""btrfs", name --> _devs->device_list_mutex
--> _info->scrub_lock

[   76.159629]  Possible unsafe locking scenario:

[   76.160607]CPU0CPU1
[   76.160934]
[   76.161210]   lock(_info->scrub_lock);
[   76.161458]
lock(_devs->device_list_mutex);
[   76.161805]
lock(_info->scrub_lock);
[   76.161909]   lock((wq_completion)"%s-%s""btrfs", name);
[   76.162201]
 *** DEADLOCK ***

[   76.162627] 2 locks held by btrfs/4065:
[   76.162897]  #0: bef2775b (sb_writers#12){.+.+}, at:
mnt_want_write_file+0x24/0x50
[   76.163335]  #1: 62392ab7 (_info->scrub_lock){+.+.}, at:
btrfs_scrub_dev+0x316/0x5d0 [btrfs]
[   76.163796]
stack backtrace:
[   76.163911] CPU: 1 PID: 4065 Comm: btrfs Not tainted 4.20.0-rc3+ #41
[   76.164228] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[   76.164646] Call Trace:
[   76.164872]  dump_stack+0x5e/0x8b
[   76.165128]  print_circular_bug.isra.37+0x1f1/0x1fe
[   76.165398]  __lock_acquire+0x14aa/0x1620
[   76.165652]  lock_acquire+0xb0/0x190
[   76.165910]  ? flush_workqueue+0x70/0x4d0
[   76.166175]  flush_workqueue+0x9a/0x4d0
[   76.166420]  ? flush_workqueue+0x70/0x4d0
[   76.166671]  ? drain_workqueue+0x52/0x1a0
[   76.166911]  drain_workqueue+0xca/0x1a0
[   76.167167]  destroy_workqueue+0x17/0x230
[   76.167428]  btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
[   76.167720]  scrub_workers_put+0x2e/0x60 [btrfs]
[   76.168233]  btrfs_scrub_dev+0x329/0x5d0 [btrfs]
[   76.168504]  ? __sb_start_write+0x121/0x1b0
[   76.168759]  ? mnt_want_write_file+0x24/0x50
[   76.169654]  btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
[   76.169934]  ? find_held_lock+0x2d/0x90
[   76.170204]  ? find_held_lock+0x2d/0x90
[   76.170450]  do_vfs_ioctl+0xa9/0x6d0
[   76.170690]  ? __fget+0x101/0x1f0
[   76.170910]  ? __fget+0x5/0x1f0
[