** Description changed:

+ [ Impact ]
+
+ The `ocf:heartbeat:nfsserver` agent fails to stop an NFS server resource
+ configured with `nfs_shared_infodir`.
+
+ The nfs-utils package now ships `fsidd`, which starts together with
+ nfs-server. When the agent stops the server, it does not stop `fsidd`,
+ which holds files open in `/var/lib/nfs`, and the operation then fails with a
+ "Failed to unmount a bind mount" / "target is busy" error.
+
+ This is particularly bad for HA clusters, as the resource ends up stuck
+ (failed-stop). The report comes from a 24.04 machine, where the bug is
+ reproducible.
+
+ There is a fix upstream which can be backported to the current versions
+ in stonking, resolute, questing and noble.
+
+ [ Test Plan ]
+
+ - Launch a fresh Ubuntu machine.
+ - Install nfs-kernel-server and resource-agents-extra.
+ - Verify fsidd.service is present.
+ - Create the shared_infodir folder.
+ - Start a one-node cluster with corosync+pacemaker.
+ - Create the nfsserver resource with:
+   - sudo pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
+     nfs_shared_infodir=/srv/nfs_shared_infodir nfs_ip=127.0.0.1
+ - Start the agent:
+   - sudo pcs resource debug-start nfs-daemon
+ - Stop the agent:
+   - sudo pcs resource debug-stop nfs-daemon
+
+ Using the version from the Ubuntu release, the last stop fails with the unmount error.
+ Using the version from proposed, all works without errors.
+
+ [ Where problems could occur ]
+
+ The fact that the change comes from upstream adds reliability - we don't expect any obvious failures. The change only affects the `nfsserver` agent and is bound by a systemctl guard to make sure it only tries to stop `fsidd` when it's present.
+
+ The patch stops `fsidd` when present, but throws an error when that is not possible. If stopping `fsidd` is slow or fails for some reason, the resource stop operation fails too - but that is not a regression, as the stop was already failing before.
+
+ Misapplying the patch would generate different problems which could be
+ found in the test phase.
+
+ [ Other Info ]
+
+ Upstream references:
+ https://github.com/ClusterLabs/resource-agents/commit/64b16e5741a582d0553e118d4df6136b363f46c9
+ https://github.com/ClusterLabs/resource-agents/commit/57d74e911f5f94bf298ca1b1e3fe58fce84e58e8
+
+ [ Original Description ]
+
  This is related to and sounds very similar to https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848, but differs enough to warrant another bug.

  # lsb_release -rd
  Description: Ubuntu 24.04.4 LTS
  Release: 24.04

  Package version:

  # apt-cache policy resource-agents-extra
  resource-agents-extra:
-   Installed: 1:4.13.0-1ubuntu4
-   Candidate: 1:4.13.0-1ubuntu4
-   Version table:
-  *** 1:4.13.0-1ubuntu4 500
-         500 http://archive.ubuntu.com/ubuntu noble/universe amd64 Packages
-         100 /var/lib/dpkg/status
+   Installed: 1:4.13.0-1ubuntu4
+   Candidate: 1:4.13.0-1ubuntu4
+   Version table:
+  *** 1:4.13.0-1ubuntu4 500
+         500 http://archive.ubuntu.com/ubuntu noble/universe amd64 Packages
+         100 /var/lib/dpkg/status

  Expected behavior:
  Stopping the NFS server succeeds.

  Actual behavior:
  The NFS server is stopped, but the nfsserver resource's stop operation reports a failure: the /var/lib/nfs filesystem fails to unmount because fsidd is holding files open and is never stopped before the unmount is attempted.

  Analysis:
  At some point, the "fsidd" service was added to the nfs-kernel-server package.
  The systemd unit supplied in this package starts fsidd before nfs-server:

- [Unit]
- Description=NFS FSID Daemon
- After=local-fs.target
- Before=nfs-mountd.service nfs-server.service
-
- [Service]
- ExecStart=/usr/sbin/fsidd
-
- [Install]
- RequiredBy=nfs-mountd.service nfs-server.service
+   [Unit]
+   Description=NFS FSID Daemon
+   After=local-fs.target
+   Before=nfs-mountd.service nfs-server.service
+
+   [Service]
+   ExecStart=/usr/sbin/fsidd
+
+   [Install]
+   RequiredBy=nfs-mountd.service nfs-server.service

  The fsidd process holds open a file in the nfs info dir:

- # lsof -p 57130 | grep /var/lib/nfs
- fsidd 57130 root 3u REG 147,0 16384 16777352 /var/lib/nfs/reexpdb.sqlite3
+   # lsof -p 57130 | grep /var/lib/nfs
+   fsidd 57130 root 3u REG 147,0 16384 16777352 /var/lib/nfs/reexpdb.sqlite3

  A combination of factors causes this service not to exit, which prevents the resource-agent script from reporting success in stopping the nfs server.

  If the nfs info dir is mounted from somewhere - when using nfs_shared_infodir - this process needs to exit before the nfs_shared_infodir is unmounted.

  When stopping nfs-server, the resource-agent script either uses /etc/init.d/nfs-kernel-server if present or systemctl commands, as mentioned in https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848/comments/10.

  Leaving /etc/init.d/nfs-kernel-server in place causes additional problems, as mentioned in https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/2065848, so I have removed it on my test system.

  Regardless of how nfs-server is stopped, fsidd is not stopped. This causes /var/lib/nfs to fail to unmount, and the error to bubble up.

  For my use case - an NFS server managed by corosync/pacemaker - this means failover to other hosts does not work, as the existing NFS server cannot be stopped.

  This issue can be worked around by providing a systemd unit drop-in that declares fsidd PartOf nfs-server:

  # cat > /etc/systemd/system/fsidd.service.d/nfs-server-dependency.conf << 'EOF'
  [Unit]
  PartOf=nfs-server.service
  EOF
  # systemctl daemon-reload

  With this in place, fsidd is automatically stopped when nfs-server is stopped, which means /var/lib/nfs can then be unmounted successfully.

  I suspect nfs-kernel-server was updated adding this fsidd service, and resource-agents was not updated to match. Or perhaps leaving fsidd running even when nfs-server exits is sloppy. It's not my call.
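
The workaround above can be wrapped in a small helper. This is only a sketch of the reporter's drop-in, not the upstream fix: the function name `install_fsidd_dropin` and its optional directory argument are hypothetical, added so the helper can be tried against a scratch directory. It also creates the `fsidd.service.d` directory first, since that directory may not exist yet on a fresh system and the `cat >` redirect would then fail.

```shell
#!/bin/sh
# Sketch only: installs the PartOf drop-in described in the workaround.
# install_fsidd_dropin and its optional argument are illustrative names,
# not part of resource-agents or nfs-utils.
install_fsidd_dropin() {
    # Base unit directory; defaults to the path used in the report.
    unit_dir="${1:-/etc/systemd/system}"
    dropin_dir="$unit_dir/fsidd.service.d"

    # The drop-in directory may be missing, so create it before writing.
    mkdir -p "$dropin_dir" || return 1

    cat > "$dropin_dir/nfs-server-dependency.conf" << 'EOF'
[Unit]
PartOf=nfs-server.service
EOF
}

# Usage (as root), followed by a reload so systemd picks up the drop-in:
#   install_fsidd_dropin && systemctl daemon-reload
```

After the reload, `systemctl show -p PartOf fsidd.service` should list `nfs-server.service` if the drop-in was picked up.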

--
https://bugs.launchpad.net/bugs/2145790

Title:
  ocf:heartbeat:nfsserver resource's stop operation fails due to /var/lib/nfs filesystem failing to unmount
