** Description changed:

- Hi,
+ NFSv4 client stuck during state recovery on Ubuntu 22.04 (kernel 5.15)
- We are seeing an NFSv4.0 client hang on Linux kernel 5.15 (Ubuntu
- 22.04).
+ 1. Environment
- The issue starts when the server returns NFS4ERR_EXPIRED. The client
- then enters recovery, but reclaim never completes.
+ - Client OS: Ubuntu 22.04
+ - Kernel version: 5.15.0-113-generic
+ - NFS protocol: NFSv4.0
- The state manager thread is stuck with the following stack:
+ Mount options: 10.59.62.51:/ on /nfs type nfs4
+ (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.59.254.244,local_lock=none,addr=10.59.62.51)
+
+ ------------------------------------------------------------------------
+
+ 2. Problem Description
+
+ We observed two types of abnormal behavior related to NFSv4 client
+ state recovery.
+
+ ------------------------------------------------------------------------
+
+ Case 1: No recovery after NFS4ERR_STALE_CLIENTID
+
+ Expected behavior (per RFC 7530):
+ - Client should re-establish its client identity via SETCLIENTID / SETCLIENTID_CONFIRM
+ - Client should reclaim state (open/lock)
+
+ Actual behavior:
+ - Client keeps retrying normal requests
+ - No recovery process is triggered
+ - No SETCLIENTID observed
+
+ ------------------------------------------------------------------------
+
+ Case 2: Client stuck during reclaim after lease expiration
+
+ Scenario:
+
+ 1. Client stops sending RENEW due to a network issue
+ 2. Server considers the lease expired
+ 3. After network recovery:
+    - Client sends RENEW
+    - Server responds with NFS4ERR_EXPIRED
+ 4. Client starts recovery:
+    - SETCLIENTID succeeds
+    - SETCLIENTID_CONFIRM succeeds
+ 5. Client enters the reclaim phase (with OPEN rpc reclaim=false)
+
+ The client gets stuck during the reclaim phase.
+
+ Stack trace:
+ [<0>] rpc_wait_bit_killable
+ [<0>] __rpc_wait_for_completion_task
+ [<0>] nfs4_run_open_task
+ [<0>] nfs4_open_recover_helper
+ [<0>] nfs4_open_recover
+ [<0>] nfs4_do_open_expired
+ [<0>] nfs40_open_expired
+ [<0>] __nfs4_reclaim_open_state
+ [<0>] nfs4_reclaim_open_state
+ [<0>] nfs4_do_reclaim
+ [<0>] nfs4_state_manager
+
+ ------------------------------------------------------------------------
+
+ 3. Reproduction Steps
+
+ 1. Mount the NFS filesystem (see above)
+ 2. Run the workload scripts:
+    - create_and_open.sh
+    - flock_test.py
+ 3. Restart the NFS server during the workload to cause the client lease to expire
+ 4. The issue reproduces reliably
+
+ ------------------------------------------------------------------------
+
+ Additional stack traces
+
+ create_and_open.sh
  ```
- rpc_wait_bit_killable
- __rpc_wait_for_completion_task
- nfs4_run_open_task
- nfs4_open_recover_helper
- nfs4_open_recover
- nfs4_do_open_expired
- nfs40_open_expired
- __nfs4_reclaim_open_state
- nfs4_reclaim_open_state
- nfs4_do_reclaim
- nfs4_state_manager
+ [<0>] rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
+ [<0>] __rpc_wait_for_completion_task+0x2d/0x40 [sunrpc]
+ [<0>] nfs4_do_close+0x2d7/0x380 [nfsv4]
+ [<0>] __nfs4_close.constprop.0+0x11f/0x1f0 [nfsv4]
+ [<0>] nfs4_close_sync+0x13/0x20 [nfsv4]
+ [<0>] nfs4_close_context+0x35/0x60 [nfsv4]
+ [<0>] __put_nfs_open_context+0xc7/0x150 [nfs]
+ [<0>] nfs_file_clear_open_context+0x4c/0x60 [nfs]
+ [<0>] nfs_file_release+0x3e/0x50 [nfs]
+ [<0>] __fput+0x9c/0x280
+ [<0>] ____fput+0xe/0x20
+ [<0>] task_work_run+0x6a/0xb0
+ [<0>] exit_to_user_mode_loop+0x157/0x160
+ [<0>] exit_to_user_mode_prepare+0xa0/0xb0
+ [<0>] syscall_exit_to_user_mode+0x27/0x50
+ [<0>] do_syscall_64+0x63/0xb0
+ [<0>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
  ```
- Meanwhile:
- - The server repeatedly returns NFS4ERR_EXPIRED
- - The client does not successfully reclaim state
- - IO continues and repeatedly fails
- RPC stats show:
- - ~30M calls
- - very low retransmissions (94)
+
+ ------------------------------------------------------------------------
- This suggests the issue is unlikely to be caused by network loss or
- server unresponsiveness.
+
+ flock_test.py
+ ```
+ [<0>] rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
+ [<0>] __rpc_wait_for_completion_task+0x2d/0x40 [sunrpc]
+ [<0>] _nfs4_do_setlk+0x290/0x410 [nfsv4]
+ [<0>] nfs4_proc_setlk+0x78/0x160 [nfsv4]
+ [<0>] nfs4_retry_setlk+0x1dd/0x250 [nfsv4]
+ [<0>] nfs4_proc_lock+0x9d/0x1b0 [nfsv4]
+ [<0>] do_setlk+0x64/0x100 [nfs]
+ [<0>] nfs_lock+0xb3/0x180 [nfs]
+ [<0>] do_lock_file_wait+0x4f/0x120
+ [<0>] fcntl_setlk+0x127/0x2e0
+ [<0>] do_fcntl+0x4ce/0x5a0
+ [<0>] __x64_sys_fcntl+0xa9/0xd0
+ [<0>] x64_sys_call+0x1f5c/0x1fa0
+ [<0>] do_syscall_64+0x56/0xb0
+ [<0>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
+ ```
- Additionally, we have verified that:
- - Network connectivity is stable
- - The NFS server is operating normally (no restart or failover observed)
- Importantly:
- - We do observe that RENEW/SEQUENCE-related traffic is being sent from the client
- - However, the client still ends up with an expired lease (NFS4ERR_EXPIRED)
+
+ ------------------------------------------------------------------------
- This raises the question whether the lease renewal is not being properly
- processed or completed on the client side.
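The flock_test.py script from the reproduction steps is attached to the bug rather than quoted in the report. Judging from the fcntl_setlk frames in its stack trace, it is presumably a POSIX-lock loop along the following lines. This is a hypothetical sketch only: the file path, function name, and iteration counts are illustrative assumptions, not taken from the actual attachment.

```python
# Hypothetical reconstruction of a flock_test.py-style workload.
# Despite the script's name, the trace shows fcntl(F_SETLK*) locks:
# fcntl.lockf() goes through fcntl_setlk -> nfs_lock -> nfs4_proc_setlk
# on an NFSv4 mount, matching the stack above.
import fcntl
import os
import tempfile
import time

def lock_loop(path, iterations=100, hold=0.01):
    """Repeatedly take and drop an exclusive POSIX byte-range lock on path."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for _ in range(iterations):
            fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until granted (LOCK on the wire)
            time.sleep(hold)                # hold the lock briefly
            fcntl.lockf(fd, fcntl.LOCK_UN)  # release (LOCKU on the wire)
    finally:
        os.close(fd)

if __name__ == "__main__":
    # In the real reproducer this would target a file on the NFS mount,
    # e.g. /nfs/lock_target; a local temp file lets the sketch run anywhere.
    tmp_fd, target = tempfile.mkstemp()
    os.close(tmp_fd)
    lock_loop(target)
    os.unlink(target)
```

Run two instances against the same file on the mount to also exercise lock contention while the server is restarted.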
- Given that we are using NFSv4.1 (where lease renewal is implicit via
- SEQUENCE), we would like to understand:
+ [10.59.62.51-man]
+ ```
+ [<0>] rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
+ [<0>] __rpc_wait_for_completion_task+0x2d/0x40 [sunrpc]
+ [<0>] nfs4_run_open_task+0x152/0x1e0 [nfsv4]
+ [<0>] nfs4_open_recover_helper+0x155/0x210 [nfsv4]
+ [<0>] nfs4_open_recover+0x22/0xd0 [nfsv4]
+ [<0>] nfs4_do_open_reclaim+0x128/0x220 [nfsv4]
+ [<0>] nfs4_open_reclaim+0x42/0xa0 [nfsv4]
+ [<0>] __nfs4_reclaim_open_state+0x25/0x110 [nfsv4]
+ [<0>] nfs4_reclaim_open_state+0xd1/0x2c0 [nfsv4]
+ [<0>] nfs4_do_reclaim+0x12f/0x230 [nfsv4]
+ [<0>] nfs4_state_manager+0x6d9/0x870 [nfsv4]
+ [<0>] nfs4_run_state_manager+0xa8/0x1a0 [nfsv4]
+ [<0>] kthread+0x127/0x150
+ [<0>] ret_from_fork+0x1f/0x30
+ ```
- 1. Under what conditions could the client still hit NFS4ERR_EXPIRED despite ongoing renew/SEQUENCE activity and a healthy server/network?
- 2. Is it possible that RPC completion, session slot handling, or sequence handling issues could prevent the lease from being effectively renewed?
- 3. Could this be a known issue in the NFSv4.1 recovery or session handling path in 5.15?
+
+ ------------------------------------------------------------------------
- It appears the client is stuck in the OPEN reclaim path waiting for RPC
- completion, and recovery cannot make forward progress.
+
+ 4. Kernel Version Comparison
- Are there known fixes or patches in newer kernels (e.g., 5.19 or 6.x)
- that address this class of issue?
+
+ Affected:
+ Ubuntu 22.04    5.15.0-113-generic
- Any pointers or suggestions would be greatly appreciated.
+
+ Not affected:
+ Ubuntu 20.04    5.4.0-48-generic
+ Ubuntu 22.04    6.8.0-60-generic
+ Ubuntu 24.04    6.8.0-31-generic
+ CentOS 7.9      4.19.188-10.el7.ucloud.x86_64
+ CentOS 7.9      3.10.0-1062.9.1.el7.x86_64
+ CentOS 8.3      4.18.0-240.1.1.el8_3.x86_64
- Thanks
+
+ ------------------------------------------------------------------------
+
+ 5. Questions
+
+ 1. Is it expected that no recovery is triggered after
+    NFS4ERR_STALE_CLIENTID?
+ 2. During reclaim, should OPEN be sent with reclaim=true?
+ 3. Could reclaim=false cause the reclaim to fail?
+ 4. Why is the client stuck in rpc_wait_bit_killable?
+ 5. Is this a known issue in kernel 5.15?
+ 6. Are there any related patches or fixes?
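The attached create_and_open.sh is likewise not quoted in the report. A rough Python equivalent of that kind of OPEN/CLOSE churn might look like the sketch below; the behavior is inferred from the script's name and the nfs4_do_close trace, so directory layout, file names, and counts are assumptions rather than the attachment's contents.

```python
# Hypothetical sketch of a create_and_open.sh-style workload: churn
# NFSv4 OPEN/CLOSE state by creating, reopening, and removing files.
# Releasing the last open context is what drives nfs_file_release ->
# nfs4_do_close, the path shown in the create_and_open.sh trace above.
import os
import tempfile

def create_and_open(directory, count=100):
    for i in range(count):
        path = os.path.join(directory, f"probe_{i}.tmp")
        with open(path, "w") as f:   # OPEN(CREATE) + WRITE on the wire
            f.write("x" * 128)
        with open(path) as f:        # plain OPEN + READ
            f.read()                 # CLOSE when the context is released
        os.unlink(path)              # REMOVE

if __name__ == "__main__":
    # In the real reproducer this would point at the NFS mount (e.g. /nfs);
    # a local temp directory lets the sketch run anywhere.
    create_and_open(tempfile.mkdtemp())
```

Restarting the NFS server while such a loop runs leaves open state to reclaim, which is what triggers the hang described in Case 2.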
** Attachment added: "create_and_open.sh"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2146310/+attachment/5956856/+files/create_and_open.sh

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146310

Title:
  NFSv4 client hang in OPEN reclaim path waiting for RPC completion

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2146310/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
