** Description changed:
SRU Justification:
- Impact: In pre-production at a mutual "hyperscaler" customer that is
- using the Ubuntu jammy kernel's NFS client with Hammerspace's pNFS
- flexfiles: NFS client deadlock occurred due to upstream commit
- 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on
- rescheduled writes"). Which was later fixed with upstream commit
- b1a28f2eb9ea7 ("NFS: nfs_async_write_reschedule_io must not recurse into
- the writeback code") in August 2022. But it unfortunately wasn't marked
- for stable@ at that time. That has since been rectified and Greg
- Kroah-Hartman has now picked it up for the next stable/linux-5.15.y
- kernel (but it hasn't yet appeared in the stable repo yet), please see:
+ Impact: In production at a mutual "hyperscaler" customer that is using
+ the Ubuntu jammy kernel's NFS client with Hammerspace's pNFS flexfiles,
+ an NFS client deadlock occurred due to upstream commit 7be7b3ca16a59
+ ("NFS: Ensure we immediately start writeback on rescheduled writes").
+ This was later fixed by upstream commit b1a28f2eb9ea7 ("NFS:
+ nfs_async_write_reschedule_io must not recurse into the writeback code")
+ in August 2022, but unfortunately it wasn't marked for stable@ at that
+ time. That has since been rectified and Greg Kroah-Hartman has now
+ picked it up for the next stable/linux-5.15.y kernel (but it hasn't yet
+ appeared in the stable repo), please see:
https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
Fix: Apply upstream commit b1a28f2eb9ea7 ("NFS:
nfs_async_write_reschedule_io must not recurse into the writeback
code"). That commit was developed by Trond Myklebust, the upstream
Linux NFS client maintainer.
Testcase: Cause buffered IO issued by an NFS client using pNFS flexfiles to
hit error paths. In heavy enterprise use with container limits imposed, OOM
within a container makes memory allocation errors particularly likely, and
volume down/up bounces are an additional reason for NFS IO to be
retransmitted. This can lead to deadlock in NFS due to recursion with page
locks already held, e.g.:
[<0>] wait_on_page_bit_common+0x10c/0x3d0
[<0>] wait_on_page_bit+0x3f/0x50
[<0>] wait_on_page_writeback+0x26/0x80
[<0>] write_cache_pages+0x138/0x460
[<0>] nfs_writepages+0x10d/0x200 [nfs]
[<0>] do_writepages+0xd4/0x200
[<0>] filemap_fdatawrite_wbc+0x89/0xe0
[<0>] filemap_fdatawrite_range+0x54/0x70
[<0>] nfs_async_write_reschedule_io+0x69/0x80 [nfs]
[<0>] ff_layout_reset_write+0x73/0xe0 [nfs_layout_flexfiles]
[<0>] ff_layout_write_release+0x7a/0x90 [nfs_layout_flexfiles]
[<0>] rpc_free_task+0x3d/0x70 [sunrpc]
[<0>] rpc_async_release+0x30/0x50 [sunrpc]
[<0>] process_one_work+0x228/0x3d0
[<0>] worker_thread+0x53/0x420
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x1f/0x30
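
When checking whether a hung worker matches this report, it can help to compare its stack against the frames in the trace above. The following is a minimal sketch, not part of the fix: the helper names (frame_names, matches_deadlock_signature) and the choice of signature frames are assumptions made for illustration, and the parser only assumes the "[<0>] symbol+0xoff/0xlen" line format shown in the trace.

```python
import re

# Frames taken from the trace above, ordered innermost (top) to outermost.
# Which frames to require is an assumption of this sketch.
SIGNATURE = (
    "wait_on_page_writeback",
    "nfs_writepages",
    "nfs_async_write_reschedule_io",
    "ff_layout_write_release",
    "rpc_async_release",
)

# Matches lines such as "[<0>] nfs_writepages+0x10d/0x200 [nfs]".
FRAME_RE = re.compile(
    r"^\[<[0-9a-fx]+>\]\s+([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_names(stack_text):
    """Return the symbol names found in a kernel stack dump, top first."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def matches_deadlock_signature(stack_text, signature=SIGNATURE):
    """True if every signature frame appears in order (gaps are allowed)."""
    remaining = iter(frame_names(stack_text))
    # Each any() consumes the shared iterator, so order is enforced.
    return all(any(name == wanted for name in remaining) for wanted in signature)
```

On a live system the input text would typically come from reading /proc/<pid>/stack as root for a stuck kworker thread. A match only indicates that the stack resembles the one in this report; it does not prove the task is deadlocked.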
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2089410
Title:
NFS: fix deadlock with pNFS flexfiles IO retry error path