** Description changed:

+ [ Impact ]
+ 
+ The `ocf:heartbeat:nfsserver` agent fails to stop an NFS server resource
+ configured with `nfs_shared_infodir`.
+ 
+ The nfs-utils package now brings `fsidd`, which starts together with
+ nfs-server. When the agent stops the server, it does not stop `fsidd`,
+ which holds files in `/var/lib/nfs`, and then the operation fails with a
+ "Failed to unmount a bind mount" / "target is busy" error.
+ 
+ This is particularly bad for HA clusters, as the resource ends up stuck
+ (failed-stop). The report comes from a 24.04 machine, where the bug is
+ reproducible.
+ 
+ There is a fix upstream which can be backported to the current versions
+ in stonking, resolute, questing and noble.
+ 
+ [ Test Plan ]
+ 
+ - Launch a fresh Ubuntu machine.
+ - Install nfs-kernel-server and resource-agents-extra.
+ - Verify fsidd.service is present.
+ - Create the shared_infodir folder.
+ - Start a one-node cluster with corosync+pacemaker.
+ - Create the nfsserver resource with:
+   - sudo pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
+     nfs_shared_infodir=/srv/nfs_shared_infodir nfs_ip=127.0.0.1
+ - Start the agent:
+   - sudo pcs resource debug-start nfs-daemon
+ - Stop the agent:
+   - sudo pcs resource debug-stop nfs-daemon
+ 
+ Using the version from the Ubuntu release, the last stop fails with the
+ unmount error.
+ Using the version from proposed, all works without errors.
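
The steps above can be sketched as a small script. This is only an
illustration under stated assumptions: the function names `run` and
`reproduce_stop_failure` and the `DRY_RUN` switch are hypothetical, the
one-node corosync+pacemaker cluster is assumed to be set up separately,
and the stop step is taken to be `pcs resource debug-stop`. By default
it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the test plan steps.
reproduce_stop_failure() {
    run sudo apt-get install -y nfs-kernel-server resource-agents-extra || return 1
    # Fails if the fsidd unit is not installed.
    run systemctl cat fsidd.service || return 1
    run sudo mkdir -p /srv/nfs_shared_infodir || return 1
    # Assumption: the one-node cluster is already running at this point.
    run sudo pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
        nfs_shared_infodir=/srv/nfs_shared_infodir nfs_ip=127.0.0.1 || return 1
    run sudo pcs resource debug-start nfs-daemon || return 1
    # With the unfixed package this is the step expected to fail.
    run sudo pcs resource debug-stop nfs-daemon
}
```

Calling `reproduce_stop_failure` prints the commands; calling it with
`DRY_RUN=0` set executes them on a disposable test machine.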
+ 
+ [ Where problems could occur ]
+ 
+ The change comes from upstream, which lowers the risk - we don't expect
+ any obvious failures. The change only affects the `nfsserver` agent and
+ is guarded by a systemctl check, so it only tries to stop `fsidd` when
+ the unit is present.
+  
+ The patch stops `fsidd` when present, and returns an error when that is
+ not possible. If stopping `fsidd` is slow or fails for some reason, the
+ resource stop operation fails - but that is not a regression, as the
+ stop was already broken before.
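
The guard described above can be pictured roughly as follows. This is a
minimal sketch, not the upstream patch: the function name
`stop_fsidd_if_present` and the `SYSTEMCTL` override are hypothetical,
and the real agent may structure the check and its return codes
differently.

```shell
# Hypothetical illustration of "stop fsidd only when present, fail if
# the stop does not succeed". Not the upstream implementation.
stop_fsidd_if_present() {
    # SYSTEMCTL is overridable only so the sketch can be exercised
    # on a machine without systemd.
    sc="${SYSTEMCTL:-systemctl}"

    # Guard: if the unit file is not installed, there is nothing to do.
    if ! $sc cat fsidd.service >/dev/null 2>&1; then
        return 0
    fi

    # The unit exists but is not running: nothing to stop.
    if ! $sc is-active --quiet fsidd.service; then
        return 0
    fi

    # The unit is running: a failed stop must fail the whole operation,
    # because the later unmount would hit "target is busy" anyway.
    if ! $sc stop fsidd.service; then
        echo "ERROR: failed to stop fsidd.service" >&2
        return 1
    fi
    return 0
}
```

On a system without `fsidd.service` the function returns success
without touching anything, which is the property the guard is meant to
provide.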
+ 
+ Misapplying the patch would generate different problems which could be
+ found in the test phase.
+ 
+ [ Other Info ]
+ 
+ Upstream references:
+ https://github.com/ClusterLabs/resource-agents/commit/64b16e5741a582d0553e118d4df6136b363f46c9
+ https://github.com/ClusterLabs/resource-agents/commit/57d74e911f5f94bf298ca1b1e3fe58fce84e58e8
+ 
+ [ Original Description ]
+ 
  This is related to and sounds very similar to
  https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848,
  but differs enough to warrant another bug.
  
  # lsb_release -rd
  Description: Ubuntu 24.04.4 LTS
  Release: 24.04
  
  Package version:
  
  # apt-cache policy resource-agents-extra
  resource-agents-extra:
-   Installed: 1:4.13.0-1ubuntu4
-   Candidate: 1:4.13.0-1ubuntu4
-   Version table:
-  *** 1:4.13.0-1ubuntu4 500
-         500 http://archive.ubuntu.com/ubuntu noble/universe amd64 Packages
-         100 /var/lib/dpkg/status
+   Installed: 1:4.13.0-1ubuntu4
+   Candidate: 1:4.13.0-1ubuntu4
+   Version table:
+  *** 1:4.13.0-1ubuntu4 500
+         500 http://archive.ubuntu.com/ubuntu noble/universe amd64 Packages
+         100 /var/lib/dpkg/status
  
  Expected behavior:
  
  Stopping the NFS server succeeds
  
  Actual behavior:
  
  The NFS server is stopped but the nfsserver resource's stop process
  reports a failure, because /var/lib/nfs filesystem fails to unmount,
  because fsidd is holding files open and is never stopped before
  attempting to unmount.
  
  Analysis:
  
  At some point, the "fsidd" service was added to the nfs-kernel-server
  package. The systemd unit supplied in this package starts fsidd before
  nfs-server:
  
-     [Unit]
-     Description=NFS FSID Daemon
-     After=local-fs.target
-     Before=nfs-mountd.service nfs-server.service
-     
-     [Service]
-     ExecStart=/usr/sbin/fsidd
-     
-     [Install]
-     RequiredBy=nfs-mountd.service nfs-server.service
+     [Unit]
+     Description=NFS FSID Daemon
+     After=local-fs.target
+     Before=nfs-mountd.service nfs-server.service
+ 
+     [Service]
+     ExecStart=/usr/sbin/fsidd
+ 
+     [Install]
+     RequiredBy=nfs-mountd.service nfs-server.service
  
  The fsidd process holds open a file in the nfs info dir:
  
-     # lsof -p 57130 | grep /var/lib/nfs
-     fsidd   57130 root    3u      REG              147,0    16384 16777352 /var/lib/nfs/reexpdb.sqlite3
+     # lsof -p 57130 | grep /var/lib/nfs
+     fsidd   57130 root    3u      REG              147,0    16384 16777352 /var/lib/nfs/reexpdb.sqlite3
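
A check like the one above can be made reusable with a small filter.
This is a sketch only: the function name `nfs_infodir_holders` is
hypothetical, and it assumes lsof's default column layout, where the
file name is the last field.

```shell
# Hypothetical helper: reads lsof-style lines on stdin and prints
# "COMMAND PID PATH" for every open file below the given directory
# (default: /var/lib/nfs).
nfs_infodir_holders() {
    dir="${1:-/var/lib/nfs}"
    # index(...) == 1 means the last field starts with "<dir>/".
    awk -v dir="$dir" 'index($NF, dir "/") == 1 { print $1, $2, $NF }'
}
```

For example, `lsof -c fsidd | nfs_infodir_holders` lists the files that
a running fsidd keeps open under /var/lib/nfs; any output just before
the unmount means the unmount is likely to fail as busy.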
  
  A combination of factors causes this service not to exit, which
  prevents the resource-agent script from reporting success in stopping
  the nfs server.
  
  If the nfs info dir is mounted from somewhere - when using
  nfs_shared_infodir - this process needs to exit before the
  nfs_shared_infodir is unmounted.
  
  When stopping nfs-server, the resource-agent script either uses
  /etc/init.d/nfs-kernel-server if present or systemctl commands, as
  mentioned in
  https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848/comments/10.
  Leaving /etc/init.d/nfs-kernel-server in place causes additional
  problems, as mentioned in
  https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848,
  so I have removed it on my test system.
  
  Regardless of how nfs-server is stopped, fsidd is not stopped. This
  causes /var/lib/nfs to fail to unmount, and the error to bubble up.
  
  For my use case - an NFS server managed by corosync/pacemaker - this
  means failover to other hosts does not work as the existing NFS server
  cannot be stopped.
  
  This issue can be worked around by providing a systemd unit drop-in,
  that declares fsidd PartOf nfs-server:
  
  # cat > /etc/systemd/system/fsidd.service.d/nfs-server-dependency.conf << 'EOF'
  [Unit]
  PartOf=nfs-server.service
  EOF
  
  # systemctl daemon-reload
  
  With this in place, fsidd is automatically stopped when nfs-server is
  stopped, which means /var/lib/nfs can then be unmounted successfully.
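
The workaround above can also be wrapped in a function that creates the
drop-in directory first, since the redirect fails if
fsidd.service.d does not exist yet. This is a sketch: the function name
`install_fsidd_dropin` and its optional staging-prefix argument are
hypothetical and exist only so the file can be generated somewhere
other than the live /etc.

```shell
# Hypothetical helper: write the PartOf= drop-in for fsidd.service.
# $1 is an optional root prefix (empty means the real /etc).
install_fsidd_dropin() {
    root="${1:-}"
    dir="$root/etc/systemd/system/fsidd.service.d"
    # Create the drop-in directory if it is missing.
    mkdir -p "$dir" || return 1
    cat > "$dir/nfs-server-dependency.conf" << 'EOF'
[Unit]
PartOf=nfs-server.service
EOF
}
```

Run as root without an argument, followed by `systemctl daemon-reload`,
this has the same effect as the commands above. Afterwards
`systemctl show -p PartOf fsidd.service` should list nfs-server.service.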
  
  I suspect nfs-kernel-server was updated adding this fsidd service, and
  resource-agents was not updated to match. Or perhaps leaving fsidd
  running even when nfs-server exits is sloppy. It's not my call.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2145790

Title:
  ocf:heartbeat:nfsserver resource's stop operation fails due to
  /var/lib/nfs filesystem failing to unmount


