** Description changed:

  [ Impact ]

  The `ocf:heartbeat:nfsserver` agent fails to stop an NFS server
  resource configured with `nfs_shared_infodir`.

  The nfs-utils package now ships `fsidd`, which starts together with
  nfs-server. When the agent stops the server, it does not stop `fsidd`,
  which holds files in `/var/lib/nfs`, and then the operation fails with
  a "Failed to unmount a bind mount" / "target is busy" error.

  This is particularly bad for HA clusters as a resource ends up stuck
  (failed-stop).

  The report comes from a 24.04 machine, where the bug is reproducible.
  There is a fix upstream which can be backported to the current
  versions in stonking, resolute, questing and noble.

  [ Test Plan ]

  - Launch a fresh Ubuntu machine.
  - Install nfs-kernel-server and resource-agents-extra.
  - Verify fsidd.service is present.
  - Create the shared_infodir folder.
  - Start a one-node cluster with corosync+pacemaker.
  - Create the nfsserver resource with:

- - sudo pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
-     nfs_shared_infodir=/srv/nfs_shared_infodir nfs_ip=127.0.0.1
+   - sudo pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
+       nfs_shared_infodir=/srv/nfs_shared_infodir nfs_ip=127.0.0.1

  - Start the agent:

- - sudo pcs resource debug-start nfs-daemon
+   - sudo pcs resource debug-start nfs-daemon

  - Stop the agent:

- - sudo pcs resource debug-start nfs-daemon
+   - sudo pcs resource debug-stop nfs-daemon

  Using the version from the Ubuntu release, the last stop fails with
  the unmount error. Using the version from proposed, all works without
  errors.

  [ Where problems could occur ]

- The fact that the change comes from upstream adds reliability - we don't expect any obvious failures. The change only affects the `nfsserver` agent and is bound by a systemctl guard to make sure it only tries to stop `fsidd` when it's present.
-
- The patch stops `fsidd` when present, but throws an error when that is not possible. If the stop operation is slow or fails for some reason, then it will break the operation - but that is not a regression, as it was broken before anyway.
+ The fact that the change comes from upstream adds reliability - we don't
+ expect any obvious failures. The change only affects the `nfsserver`
+ agent and is bound by a systemctl guard to make sure it only tries to
+ stop `fsidd` when it's present.
+
+ The patch stops `fsidd` when present, but throws an error when that is
+ not possible. If the stop operation is slow or fails for some reason,
+ then it will break the operation - but that is not a regression, as it
+ was broken before anyway.

  Misapplying the patch would generate different problems which could be
  found in the test phase.

  [ Other Info ]

  Upstream references:
  https://github.com/ClusterLabs/resource-agents/commit/64b16e5741a582d0553e118d4df6136b363f46c9
  https://github.com/ClusterLabs/resource-agents/commit/57d74e911f5f94bf298ca1b1e3fe58fce84e58e8

  [ Original Description ]

  This is related to and sounds very similar to
  https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848,
  but differs enough to warrant another bug.

  # lsb_release -rd
  Description:    Ubuntu 24.04.4 LTS
  Release:        24.04

  Package version:

  # apt-cache policy resource-agents-extra
  resource-agents-extra:
    Installed: 1:4.13.0-1ubuntu4
    Candidate: 1:4.13.0-1ubuntu4
    Version table:
   *** 1:4.13.0-1ubuntu4 500
          500 http://archive.ubuntu.com/ubuntu noble/universe amd64 Packages
          100 /var/lib/dpkg/status

  Expected behavior:
  Stopping the NFS server succeeds

  Actual behavior:
  The NFS server is stopped but the nfsserver resource's stop process
  reports a failure, because the /var/lib/nfs filesystem fails to
  unmount, because fsidd is holding files open and is never stopped
  before attempting to unmount.

  Analysis:
  At some point, the "fsidd" service was added to the nfs-kernel-server
  package.
  The systemd unit supplied in this package starts fsidd before
  nfs-server:

  [Unit]
  Description=NFS FSID Daemon
  After=local-fs.target
  Before=nfs-mountd.service nfs-server.service

  [Service]
  ExecStart=/usr/sbin/fsidd

  [Install]
  RequiredBy=nfs-mountd.service nfs-server.service

  The fsidd process holds open a file in the nfs info dir:

  # lsof -p 57130 | grep /var/lib/nfs
  fsidd   57130 root    3u   REG  147,0    16384 16777352 /var/lib/nfs/reexpdb.sqlite3

  A combination of factors causes this service not to exit, and prevents
  the resource-agent script from reporting success in stopping the nfs
  server.

  If the nfs info dir is mounted from somewhere - when using
  nfs_shared_infodir - this process needs to exit before the
  nfs_shared_infodir is unmounted.

  When stopping nfs-server, the resource-agent script either uses
  /etc/init.d/nfs-kernel-server if present or systemctl commands, as
  mentioned in
  https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848/comments/10.
  Leaving /etc/init.d/nfs-kernel-server in place causes additional
  problems, as mentioned in
  https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848,
  so I have removed it on my test system.

  Regardless of how nfs-server is stopped, fsidd is not stopped. This
  causes /var/lib/nfs to fail to unmount, and the error to bubble up.

  For my use case - an NFS server managed by corosync/pacemaker - this
  means failover to other hosts does not work as the existing NFS server
  cannot be stopped.

  This issue can be worked around by providing a systemd unit drop-in
  that declares fsidd PartOf nfs-server:

  # cat > /etc/systemd/system/fsidd.service.d/nfs-server-dependency.conf << 'EOF'
  [Unit]
  PartOf=nfs-server.service
  EOF
  # systemctl daemon-reload

  With this in place, fsidd is automatically stopped when nfs-server is
  stopped, which means /var/lib/nfs can then be unmounted successfully.

  I suspect nfs-kernel-server was updated adding this fsidd service, and
  resource-agents was not updated to match. Or perhaps leaving fsidd
  running even when nfs-server exits is sloppy. It's not my call.
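
  For illustration only, the ordering the stop path needs (stop fsidd
  before /var/lib/nfs is unmounted) might look roughly like the shell
  sketch below. This is not the upstream patch: the function name
  `stop_fsidd_if_active` and the `FSIDD_UNIT` variable are hypothetical,
  and using `systemctl is-active` as the guard is an assumption made for
  this sketch.

```shell
#!/bin/sh
# Illustrative sketch only, NOT the upstream resource-agents patch.
# FSIDD_UNIT and stop_fsidd_if_active are hypothetical names.
FSIDD_UNIT="fsidd.service"

# Stop fsidd only when systemd reports the unit as active, so hosts
# without a running fsidd are left untouched.
# Returns non-zero only if the stop itself fails.
stop_fsidd_if_active() {
    if systemctl is-active --quiet "$FSIDD_UNIT"; then
        systemctl stop "$FSIDD_UNIT" || return 1
    fi
    return 0
}
```

  Such a helper would presumably be called after nfs-server has been
  stopped and before the bind mount on /var/lib/nfs is unmounted, with
  a non-zero return failing the stop operation.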
-- 
https://bugs.launchpad.net/bugs/2145790

Title:
  ocf:heartbeat:nfsserver resource's stop operation fails due to
  /var/lib/nfs filesystem failing to unmount
