** Changed in: zfs-linux (Ubuntu Hirsute)
Status: In Progress => Fix Released
** Changed in: zfs-linux (Ubuntu Hirsute)
Assignee: Heitor Alves de Siqueira (halves) => (unassigned)
** Description changed:
[Impact]
TXG sync stalls, causing ZFS workloads to hang indefinitely
[Description]
For certain ZFS workloads, we can see hung task timeouts in the kernel logs
due to a transaction group deadlock. Userspace process will hang and display
stack traces similar to the one below:
[49181.619711] clnt_server D 0 21699 28868 0x00000320
[49181.619715] Call Trace:
[49181.619725] __schedule+0x24e/0x880
[49181.619730] schedule+0x2c/0x80
[49181.619750] cv_wait_common+0x11e/0x140 [spl]
[49181.619763] ? wait_woken+0x80/0x80
[49181.619775] __cv_wait+0x15/0x20 [spl]
[49181.619872] zil_commit.part.14+0x80/0x8c0 [zfs]
[49181.619884] ? _cond_resched+0x19/0x40
[49181.619887] ? mutex_lock+0x12/0x40
[49181.619959] zil_commit+0x17/0x20 [zfs]
[49181.620026] zfs_fsync+0x77/0xe0 [zfs]
[49181.620093] zpl_fsync+0x68/0xa0 [zfs]
[49181.620100] vfs_fsync_range+0x51/0xb0
[49181.620105] do_fsync+0x3d/0x70
[49181.620109] SyS_fsync+0x10/0x20
[49181.620114] do_syscall_64+0x73/0x130
[49181.620119] entry_SYSCALL_64_after_hwframe+0x41/0xa6
We also might see a kworker thread blocking in the zfs writeback/evict path:
[49181.881570] kworker/u17:3 D 0 4915 2 0x80000000
[49181.881576] Workqueue: writeback wb_workfn (flush-zfs-10)
[49181.881577] Call Trace:
[49181.881580] __schedule+0x24e/0x880
[49181.881582] ? atomic_t_wait+0x60/0x60
[49181.881584] schedule+0x2c/0x80
[49181.881588] bit_wait+0x11/0x60
[49181.881592] __wait_on_bit+0x4c/0x90
[49181.881596] ? atomic_t_wait+0x60/0x60
[49181.881599] __inode_wait_for_writeback+0xb9/0xf0
[49181.881601] ? bit_waitqueue+0x40/0x40
[49181.881605] inode_wait_for_writeback+0x26/0x40
[49181.881609] evict+0xb5/0x1a0
[49181.881611] iput+0x19c/0x230
[49181.881648] zfs_iput_async+0x1d/0x80 [zfs]
[49181.881682] zfs_get_data+0x1d4/0x2a0 [zfs]
[49181.881718] zil_commit.part.14+0x640/0x8c0 [zfs]
[49181.881752] zil_commit+0x17/0x20 [zfs]
[49181.881784] zpl_writepages+0xd5/0x160 [zfs]
[49181.881787] do_writepages+0x4b/0xe0
[49181.881790] __writeback_single_inode+0x45/0x350
[49181.881792] ? __writeback_single_inode+0x45/0x350
[49181.881794] writeback_sb_inodes+0x1d7/0x530
[49181.881796] wb_writeback+0xfb/0x300
[49181.881799] wb_workfn+0xad/0x400
[49181.881800] ? wb_workfn+0xad/0x400
[49181.881803] ? __switch_to_asm+0x35/0x70
[49181.881809] process_one_work+0x1de/0x420
[49181.881811] worker_thread+0x32/0x410
[49181.881813] kthread+0x121/0x140
[49181.881815] ? process_one_work+0x420/0x420
[49181.881817] ? kthread_create_worker_on_cpu+0x70/0x70
[49181.881819] ret_from_fork+0x35/0x40
This is caused by a race between ZFS writeback and evict threads,
usually during a transaction group sync operation. It's possible to have
two iput() threads racing for the same inode: one of them scheduled
async and the other executed synchronously as part of the writeback
path. If the writeback thread tries to evict the inode while the async
thread is running, it might re-enter the block layer for the same inode
due to ZFS counters being in an inconsistent state. This then causes the
kworker thread to stall the writeback, which in turn prevents the
transaction group sync to complete and locks other ZFS threads.
This is fixed by the upstream commit:
- - Fix zrele race in zrele_async that can cause hang (2921ad6cba54) [0]
+ - Fix zrele race in zrele_async that can cause hang (43eaef6de817) [0]
[Test Case]
Being a race condition, this issue has been hard to reproduce consistently.
This has been reported on heavy I/O workloads, mixing file creation and
deletion. We have some reports both from upstream and from Ubuntu users that
this is usually reproducible on e.g. heavy SQL workloads or on complex
ccache-enabled builds [1].
[0] https://github.com/openzfs/zfs/pull/11530
[1] https://github.com/openzfs/zfs/issues/11527
[Regression Potential]
The patch has been tested in the ZFS test suite and in production
environments, so the potential for further regressions should be fairly
controlled. Potential regressions might arise in the ZFS writeback path, so we
should monitor heavy I/O workloads that put a lot of stress in the sync and
evict paths.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1916486
Title:
zfs_zrele_async can cause txg sync deadlocks
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1916486/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs