** Attachment added: "flock_test.py"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2146310/+attachment/5956857/+files/flock_test.py

** Description changed:

  NFSv4 client stuck during state recovery on Ubuntu 22.04 (kernel 5.15)
  
  1. Environment
  
  -   Client OS: Ubuntu 22.04
  -   Kernel version: 5.15.0-113-generic
  -   NFS protocol: NFSv4.0
  
  Mount options:
  10.59.62.51:/ on /nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.59.254.244,local_lock=none,addr=10.59.62.51)
  
  ------------------------------------------------------------------------
  
  2. Problem Description
  
  We observed two kinds of abnormal behavior related to NFSv4 client
  state recovery.
  
  ------------------------------------------------------------------------
  
  Case 1: No recovery after NFS4ERR_STALE_CLIENTID
  
  Expected behavior (per RFC 7530):
  -   Client should re-establish client identity via SETCLIENTID /
      SETCLIENTID_CONFIRM
  -   Client should reclaim state (open/lock)
  
  Actual behavior:
  -   Client keeps retrying normal requests
  -   No recovery process is triggered
  -   No SETCLIENTID observed
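
The "No SETCLIENTID observed" check in Case 1 can be expressed as a small helper. This is only a sketch: it assumes the NFSv4 operation names have already been extracted, in order, from a packet capture taken after the error was returned. The function name `recovery_observed` and the list-of-strings input format are hypothetical and not part of the report. The numeric status codes are the values assigned in RFC 7530.

```python
# Status codes as assigned in RFC 7530.
NFS4ERR_EXPIRED = 10011
NFS4ERR_STALE_CLIENTID = 10022


def recovery_observed(ops):
    """Return True if SETCLIENTID is later followed by SETCLIENTID_CONFIRM.

    `ops` is a hypothetical, ordered list of NFSv4 operation names seen on
    the wire after the server returned NFS4ERR_STALE_CLIENTID or
    NFS4ERR_EXPIRED.
    """
    try:
        start = ops.index("SETCLIENTID")
    except ValueError:
        # The client never tried to re-establish its identity (Case 1).
        return False
    return "SETCLIENTID_CONFIRM" in ops[start + 1:]
```

For Case 1 the helper would return False for a capture containing only retried normal requests. For Case 2 it would return True, since both operations succeed before the client hangs.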
  
  ------------------------------------------------------------------------
  
  Case 2: Client stuck during reclaim after lease expiration
  
  Scenario
  
  1.  Client stops sending RENEW due to a network issue
  2.  Server considers the lease expired
  3.  After network recovery:
-     -   Client sends RENEW
-     -   Server responds with NFS4ERR_EXPIRED
+     -   Client sends RENEW
+     -   Server responds with NFS4ERR_EXPIRED
  4.  Client starts recovery:
-     -   SETCLIENTID succeeds
-     -   SETCLIENTID_CONFIRM succeeds
+     -   SETCLIENTID succeeds
+     -   SETCLIENTID_CONFIRM succeeds
  5.  Client enters reclaim phase (with OPEN RPC reclaim=false)
- 
  
  The client gets stuck during the reclaim phase.
  
- Stack trace: 
- [<0>] rpc_wait_bit_killable 
+ Stack trace:
+ [<0>] rpc_wait_bit_killable
  [<0>] __rpc_wait_for_completion_task
- [<0>] nfs4_run_open_task 
- [<0>] nfs4_open_recover_helper 
+ [<0>] nfs4_run_open_task
+ [<0>] nfs4_open_recover_helper
  [<0>] nfs4_open_recover
- [<0>] nfs4_do_open_expired 
- [<0>] nfs40_open_expired 
+ [<0>] nfs4_do_open_expired
+ [<0>] nfs40_open_expired
  [<0>] __nfs4_reclaim_open_state
- [<0>] nfs4_reclaim_open_state 
- [<0>] nfs4_do_reclaim 
+ [<0>] nfs4_reclaim_open_state
+ [<0>] nfs4_do_reclaim
  [<0>] nfs4_state_manager
- 
  
  ------------------------------------------------------------------------
  
  3. Reproduction Steps
  
  1.  Mount NFS filesystem (see above)
- 2.  Run workload scripts:
-     -   create_and_open.sh
-     -   flock_test.py
+ 2.  Run workload scripts (attachments below):
+     -   create_and_open.sh
+     -   flock_test.py
  3.  Restart the NFS server during the workload to cause the client lease to expire
  4.  Issue reproduces reliably
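
For readers without access to the attachments, a minimal stand-in for the lock workload might look like the sketch below. It is not the attached flock_test.py. It assumes that a loop of blocking POSIX byte-range locks is enough to exercise the same path, since the flock_test.py trace in this report goes through fcntl_setlk. The function name `lock_cycle` and its parameters are hypothetical.

```python
import fcntl
import os
import tempfile
import time


def lock_cycle(path, iterations=5, hold_seconds=0.0):
    """Repeatedly take and release an exclusive POSIX lock on `path`.

    Returns the number of completed lock/unlock cycles.
    """
    completed = 0
    with open(path, "a+") as f:
        for _ in range(iterations):
            # Blocking exclusive lock; lockf() is implemented with fcntl().
            fcntl.lockf(f, fcntl.LOCK_EX)
            try:
                f.write("x")
                f.flush()
                time.sleep(hold_seconds)
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)
            completed += 1
    return completed
```

Calling `lock_cycle("/nfs/lockfile", iterations=100000, hold_seconds=0.01)` while the server is restarted would keep lock requests in flight across the lease expiry. The path and the numbers are placeholders.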
  
  ------------------------------------------------------------------------
  
  Additional stack traces
  
  create_and_open.sh
  ```
  [<0>] rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
  [<0>] __rpc_wait_for_completion_task+0x2d/0x40 [sunrpc]
  [<0>] nfs4_do_close+0x2d7/0x380 [nfsv4]
  [<0>] __nfs4_close.constprop.0+0x11f/0x1f0 [nfsv4]
  [<0>] nfs4_close_sync+0x13/0x20 [nfsv4]
  [<0>] nfs4_close_context+0x35/0x60 [nfsv4]
  [<0>] __put_nfs_open_context+0xc7/0x150 [nfs]
  [<0>] nfs_file_clear_open_context+0x4c/0x60 [nfs]
  [<0>] nfs_file_release+0x3e/0x50 [nfs]
  [<0>] __fput+0x9c/0x280
  [<0>] ____fput+0xe/0x20
  [<0>] task_work_run+0x6a/0xb0
  [<0>] exit_to_user_mode_loop+0x157/0x160
  [<0>] exit_to_user_mode_prepare+0xa0/0xb0
  [<0>] syscall_exit_to_user_mode+0x27/0x50
  [<0>] do_syscall_64+0x63/0xb0
  [<0>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
  ```
  
- 
  ------------------------------------------------------------------------
  
  flock_test.py
  ```
  [<0>] rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
  [<0>] __rpc_wait_for_completion_task+0x2d/0x40 [sunrpc]
  [<0>] _nfs4_do_setlk+0x290/0x410 [nfsv4]
  [<0>] nfs4_proc_setlk+0x78/0x160 [nfsv4]
  [<0>] nfs4_retry_setlk+0x1dd/0x250 [nfsv4]
  [<0>] nfs4_proc_lock+0x9d/0x1b0 [nfsv4]
  [<0>] do_setlk+0x64/0x100 [nfs]
  [<0>] nfs_lock+0xb3/0x180 [nfs]
  [<0>] do_lock_file_wait+0x4f/0x120
  [<0>] fcntl_setlk+0x127/0x2e0
  [<0>] do_fcntl+0x4ce/0x5a0
  [<0>] __x64_sys_fcntl+0xa9/0xd0
  [<0>] x64_sys_call+0x1f5c/0x1fa0
  [<0>] do_syscall_64+0x56/0xb0
  [<0>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
  ```
  
- 
  ------------------------------------------------------------------------
- 
  
  [10.59.62.51-man]
  ```
  [<0>] rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
  [<0>] __rpc_wait_for_completion_task+0x2d/0x40 [sunrpc]
  [<0>] nfs4_run_open_task+0x152/0x1e0 [nfsv4]
  [<0>] nfs4_open_recover_helper+0x155/0x210 [nfsv4]
  [<0>] nfs4_open_recover+0x22/0xd0 [nfsv4]
  [<0>] nfs4_do_open_reclaim+0x128/0x220 [nfsv4]
  [<0>] nfs4_open_reclaim+0x42/0xa0 [nfsv4]
  [<0>] __nfs4_reclaim_open_state+0x25/0x110 [nfsv4]
  [<0>] nfs4_reclaim_open_state+0xd1/0x2c0 [nfsv4]
  [<0>] nfs4_do_reclaim+0x12f/0x230 [nfsv4]
  [<0>] nfs4_state_manager+0x6d9/0x870 [nfsv4]
  [<0>] nfs4_run_state_manager+0xa8/0x1a0 [nfsv4]
  [<0>] kthread+0x127/0x150
  [<0>] ret_from_fork+0x1f/0x30
  ```
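
Traces like the ones above can be gathered in bulk by reading /proc/<pid>/stack for every task, which normally requires root. The sketch below lists the tasks whose innermost frame is rpc_wait_bit_killable. The helper names `frame_names`, `waiting_in` and `read_proc_stacks` are hypothetical, and the parser assumes the `[<0>] symbol+0xOFF/0xLEN [module]` line format shown in this report.

```python
import os
import re

# Matches "[<0>] symbol+0x25/0xb0 [module]" and captures the symbol name.
FRAME_RE = re.compile(r"^\s*\[<[0-9a-f]+>\]\s+([^\s+]+)")


def frame_names(stack_text):
    """Return the function names of a kernel stack dump, innermost first."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def waiting_in(stacks, symbol="rpc_wait_bit_killable"):
    """Filter {pid: stack_text} down to tasks whose innermost frame is `symbol`."""
    result = {}
    for pid, text in stacks.items():
        frames = frame_names(text)
        if frames and frames[0] == symbol:
            result[pid] = frames
    return result


def read_proc_stacks(proc_root="/proc"):
    """Read /proc/<pid>/stack for every visible task; skip unreadable ones."""
    stacks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stack")) as f:
                stacks[int(entry)] = f.read()
        except OSError:
            # Task exited, or the caller lacks permission.
            continue
    return stacks
```

Running `waiting_in(read_proc_stacks())` as root on an affected client should list the workload processes and the state manager kernel thread together. Only the main thread of each process is covered; other threads expose their stacks under /proc/<pid>/task/<tid>/stack.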
  
  ------------------------------------------------------------------------
  
  4. Kernel Version Comparison
  
  Affected:
  Ubuntu 22.04 5.15.0-113-generic
  
  Not affected:
  Ubuntu 20.04 5.4.0-48-generic
  Ubuntu 22.04 6.8.0-60-generic
  Ubuntu 24.04 6.8.0-31-generic
  CentOS 7.9 4.19.188-10.el7.ucloud.x86_64
  CentOS 7.9 3.10.0-1062.9.1.el7.x86_64
  CentOS 8.3 4.18.0-240.1.1.el8_3.x86_64
  
- 
  ------------------------------------------------------------------------
  
  5. Questions
  
  1.  Is it expected that no recovery is triggered after
-     NFS4ERR_STALE_CLIENTID?
+     NFS4ERR_STALE_CLIENTID?
  2.  During reclaim, should OPEN be sent with reclaim=true?
  3.  Could reclaim=false cause reclaim failure?
  4.  Why is the client stuck in rpc_wait_bit_killable?
  5.  Is this a known issue in kernel 5.15?
  6.  Are there any related patches or fixes?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146310

Title:
  NFSv4 client hang in OPEN reclaim path waiting for RPC completion

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2146310/+subscriptions

