Public bug reported:

== Summary ==

automount leaks Unix socketpairs per mount helper invocation. The parent
daemon does not close its end of the socketpair after the mount helper
subprocess exits. On active systems this causes steady fd accumulation
over days to weeks, eventually exhausting the per-process fd limit
(default 20,480 on Ubuntu 24.04), at which point autofs stops servicing
all NFS mount requests.

== Environment ==

Fleet of 500+ Ubuntu 24.04 and SLES compute hosts running a batch job
scheduler. Autofs uses SSSD as the automount backend:

  $ grep ^automount /etc/nsswitch.conf
  automount:  files sss

  $ grep autofs_provider /etc/sssd/sssd.conf
  autofs_provider = ldap

Automount maps are LDAP-backed. The fd leak is observed regardless of
SSSD/LDAP activity (see "What was ruled out" below).

== Symptoms ==

- ls /proc/$(pidof automount)/fd | wc -l  grows monotonically over time
- Rate: 0.45-10 fds/hour depending on NFS mount workload
- After exhaustion, autofs stops mounting NFS paths entirely
- systemctl restart autofs clears all accumulated fds immediately, confirming
  fds are held by automount, not the kernel
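
The growth rate can be reproduced from two fd-count samples. A minimal
sketch, assuming a POSIX shell with awk on Linux; the function names
(count_fds, fd_rate_per_hour, hours_to_limit) are illustrative and not
part of autofs:

```shell
# Count open fds of a process: one entry per fd in /proc/<pid>/fd.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# fds/hour between two samples taken <seconds> apart, two decimals.
# Arguments: first count, second count, seconds between samples.
fd_rate_per_hour() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * 3600 / s }'
}

# Whole hours until <limit> is reached from <current> at <rate> fds/hour,
# assuming the rate stays constant (integer division).
# Arguments: current count, limit, rate.
hours_to_limit() {
    echo $(( ($2 - $1) / $3 ))
}

# Example (not run here):
#   before=$(count_fds "$(pidof automount)"); sleep 3600
#   after=$(count_fds "$(pidof automount)")
#   fd_rate_per_hour "$before" "$after" 3600
```

At the upper rate of 10 fds/hour, a process holding 17510 of 20480 fds
would have roughly 297 hours left (hours_to_limit 17510 20480 10).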

== Evidence ==

--- 1. lsof on affected host (202-day uptime, no restarts) ---

  Package: 5.1.9-1ubuntu4
  Kernel:  6.14.0-33-generic

  $ lsof -p $(pidof automount) | awk 'NR>1 {print $5}' | sort | uniq -c | sort -rn
    2082 unix
      57 REG
      51 FIFO
      27 DIR
       4 CHR

  Total fds: 2165 / 20480 (10%)
  Unix sockets: 2082 of 2165 (96%) — all anonymous CONNECTED, no bound path.
  Fresh-restart baseline: 80 fds. The 2,082 are the accumulated leak.
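
The same breakdown can be cross-checked without lsof by reading the
symlinks under /proc/<pid>/fd. A sketch, assuming Linux procfs;
summarise_fds is a hypothetical helper name. It groups every socket type
under "socket", so it does not distinguish Unix sockets from others:

```shell
# Summarise what each fd in a /proc/<pid>/fd-style directory points to.
# Prints "<count> <kind>" lines, most frequent first.
summarise_fds() {
    local fd_dir link target
    fd_dir="$1"
    for link in "$fd_dir"/*; do
        # Skip entries that vanish or are not symlinks.
        target=$(readlink "$link") || continue
        case $target in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example (not run here):
#   summarise_fds "/proc/$(pidof automount)/fd"
```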

--- 2. Dead-peer verification ---

Sampled 20 socket inodes from automount's fd table:

  for inode in <sample>; do grep " $inode " /proc/net/unix; done

All 20 inodes absent from /proc/net/unix.

In Linux, both ends of a connected Unix socketpair appear in
/proc/net/unix while both are open. Absence proves the peer (mount
helper) has exited and closed its end. Automount holds 2,082 orphaned
socket ends with dead peers.
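
One caveat on the check above: lines in /proc/net/unix for unbound
sockets typically end at the inode, with no path and no trailing space,
so a pattern that requires a space after the inode may fail to match
them. Comparing the Inode column (field 7) exactly avoids that. A minimal
sketch; inodes_missing_from is a hypothetical helper, and the table path
is a parameter so the check can also be run against a saved copy:

```shell
# Print each given inode that does NOT appear in the Inode column
# (field 7) of a /proc/net/unix-style table.
# Arguments: table file, then one or more inode numbers.
inodes_missing_from() {
    local table inode
    table="$1"
    shift
    for inode in "$@"; do
        awk -v ino="$inode" '$7 == ino { found = 1 } END { exit !found }' \
            "$table" || echo "$inode"
    done
}

# Example (not run here):
#   inodes_missing_from /proc/net/unix 92517884 92517885
```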

--- 3. strace — mount helper call chain ---

Captured with:

  strace -ff -yy -e trace=socketpair,close,clone,fork,execve,exit_group

Each mount request triggers a 3-process chain:

  automount dispatch thread (PID A)
    close(4<UNIX-STREAM:[inode1]>) = 0   <- closes inherited socket pair ends
    close(4<UNIX-STREAM:[inode2]>) = 0
    close(4<UNIX-STREAM:[inode3]>) = 0
    close(4<UNIX-STREAM:[inode4]>) = 0
    clone() = PID B  (/bin/mount)
      clone() = PID C  (/sbin/mount.nfs)
        close(3<UNIX-STREAM:[inode5]>) = 0   <- grandchild closes its ends
        close(3<UNIX-STREAM:[inode6]>) = 0
        exit_group(0)

Representative dispatch thread trace:

  22:58:42.556386 close(4<UNIX-STREAM:[92517884]>) = 0
  22:58:42.556594 close(4<UNIX-STREAM:[92517885]>) = 0
  22:58:42.556811 close(4<UNIX-STREAM:[92517886]>) = 0
  22:58:42.557013 close(4<UNIX-STREAM:[92517887]>) = 0
  22:58:42.642343 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, ...) = PID_B
  22:58:42.705608 +++ exited with 0 +++

Representative /sbin/mount.nfs trace:

  22:58:42.649905 execve("/sbin/mount.nfs", ["/sbin/mount.nfs", "nas-server:/exports/home/user1", "/home/user1", "-o", "rw"], ...) = 0
  22:58:42.654225 close(3<UNIX-STREAM:[92499747]>) = 0
  22:58:42.654348 close(3<UNIX-STREAM:[92499748]>) = 0
  22:58:42.659118 close(3<UNIX-STREAM:[92523559]>) = 0
  22:58:42.659341 close(3<UNIX-STREAM:[92523560]>) = 0
  22:58:42.703484 exit_group(0) = ?

The dispatch thread closes 4 inherited socket ends at startup — pairs
created by the parent before the fork. No corresponding close() for
those inodes appears in the parent trace. The socketpair() call occurs
in an automount pthread; pthread shared-PID tracing limits prevent
capturing it directly, but the inherited-and-closed pattern in the child
confirms the parent created the pairs pre-fork and retains its ends
post-exit.
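
The comparison between child and parent traces can be mechanised. A
sketch, assuming strace -yy output in the format shown above has been
saved to two files; the file arguments and function names are
hypothetical. It lists socket inodes that the child trace closes but the
parent trace never closes:

```shell
# Extract socket inodes from successful close() lines in strace -yy
# output, e.g. "close(4<UNIX-STREAM:[92517884]>) = 0" yields 92517884.
closed_unix_inodes() {
    sed -n 's/.*close([0-9]*<UNIX-STREAM:\[\([0-9]*\)[^]]*]>) *= 0.*/\1/p' "$1" |
        sort -u
}

# Inodes closed in the child trace with no matching close in the parent.
# Arguments: child trace file, parent trace file.
closed_only_in_child() {
    local child parent ino
    child="$1"
    parent="$2"
    closed_unix_inodes "$child" | while read -r ino; do
        closed_unix_inodes "$parent" | grep -qx "$ino" || echo "$ino"
    done
}
```

Any inode this prints that still shows up as socket:[inode] under the
parent's /proc/<pid>/fd is a candidate for a retained end.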

--- 4. Fleet-wide impact ---

Scan of 500+ Ubuntu 24.04 + SLES hosts:
  CRIT (>=90%): 0
  WARN (>=75%): 1  (hostB: 85%, 17510/20480 fds, 75-day uptime)
  Elevated:     6  (22%-47%)

7 hosts required manual autofs restart over a 2-day period.
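
The scan buckets can be expressed as a small classifier. A sketch only:
the CRIT and WARN cut-offs come from the scan above, while the 20%
cut-off for ELEVATED is an assumption, since the report gives only the
observed range. classify_fd_usage is a hypothetical name:

```shell
# Classify fd usage as a percentage of the limit (integer division).
# Arguments: fds in use, fd limit. Prints "<LEVEL> <percent>".
classify_fd_usage() {
    local pct
    pct=$(( $1 * 100 / $2 ))
    if   [ "$pct" -ge 90 ]; then echo "CRIT $pct"
    elif [ "$pct" -ge 75 ]; then echo "WARN $pct"
    elif [ "$pct" -ge 20 ]; then echo "ELEVATED $pct"   # assumed cut-off
    else                         echo "OK $pct"
    fi
}
```

With hostB's figures, classify_fd_usage 17510 20480 prints "WARN 85".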

== What was ruled out ==

Package 5.1.9-1ubuntu4.1: changelog (LP: #2074003) confirms this release
fixes only a Kerberos ticket renewal bug in modules/cyrus-sasl.c — no
fd-related changes. Both versions accumulate at identical rates (~1
fd/hr at equivalent workload). The apparent difference in fleet scans is
explained entirely by uptime and restart history.

Kerberos/LDAP reconnect storms: blocked LDAP ports 636/389 via iptables
on two hosts for 20 minutes while monitoring fds. Zero accumulation
observed. The SSSD/LDAP path is not involved.

== Workaround ==

  systemctl restart autofs

Resets to ~80 fds. Must be repeated periodically on active hosts.
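
Until a fix lands, the restart could be made conditional on actual fd
usage rather than run blindly. A minimal sketch, assuming Linux procfs;
the helper names and the 75% threshold are assumptions, and the limit is
read from /proc/<pid>/limits rather than hard-coded:

```shell
# Soft "Max open files" limit from a /proc/<pid>/limits-style file.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Succeeds (exit 0) when used/limit has reached threshold_pct.
# Arguments: fds in use, fd limit, threshold percentage.
needs_restart() {
    [ $(( $1 * 100 / $2 )) -ge "$3" ]
}

# Example timer or cron body (not run here):
#   pid=$(pidof automount)
#   used=$(ls "/proc/$pid/fd" | wc -l)
#   limit=$(soft_nofile_limit "/proc/$pid/limits")
#   needs_restart "$used" "$limit" 75 && systemctl restart autofs
```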

== Suggested fix area ==

The parent should close its end of the socketpair after the mount helper
exits (or after dispatching). Likely location: mount subprocess dispatch
path in daemon/spawn.c, daemon/direct.c, or daemon/indirect.c.

== Version ==
  autofs 5.1.9-1ubuntu4 (Ubuntu 24.04 Noble)
  Also reproduced: SUSE Linux Enterprise 15 SP6, autofs 5.1.9-150600.1.4
  Kernel: 6.14.0-33-generic

** Affects: autofs (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: autofs fd-leak nfs noble

https://bugs.launchpad.net/bugs/2152277

Title:
  automount leaks Unix socketpairs per mount helper invocation — fd
  exhaustion after days/weeks of uptime
