Reproduced on Ubuntu 24.04.4 LTS (Noble) with apt 2.8.3 — same signature
as Walter's report. Three separate hosts across different mirror
infrastructures, same hang: two with full captured evidence (gdb / pcap
/ /proc, below), and a third (Host C) that pins down a clean,
deterministic environmental trigger — an AAAA-resolving mirror with no
working IPv6 route. Backport to Noble would be very welcome.
=== TWO CAPTURED INSTANCES ===
Host A: active reproduction
  OS        Ubuntu 24.04.4 LTS (Noble)
  Kernel    6.8.0-110-generic
  apt       2.8.3
  Mirror    archive.ubuntu.com -> Cloudflare CDN (104.20.28.x)
  Trigger   tight loop with rm /var/lib/apt/lists/* between rounds,
            Acquire::http::Pipeline-Depth=0, Acquire::http::No-Cache=true
  Frozen    ~10 minutes when snapshot taken (then killed)
  Captured  gdb backtrace, pcap (full + SYN/FIN/RST trim), apt-debug.log,
            ps, lsof, /proc/<pid>/{wchan,stack,...}
Host B: production zombie
  OS        Ubuntu 24.04.4 LTS (Noble)
  Kernel    6.8.0-106-generic
  apt       2.8.3
  Mirror    archive.ubuntu.com -> direct Canonical mirror
            (ubuntu-mirror-{2,3}.ps6.canonical.com)
  Trigger   spontaneous, started by apt.systemd.daily update cron
  Frozen    4 days 15 hours at snapshot time, still alive, holding
            /var/lib/apt/lists/lock (no operator action yet)
  Captured  kernel /proc/<pid>/stack, lsof (incl. CLOSE_WAIT sockets), ps,
            apt-logs/, pipes.txt (gdb couldn't be installed because the
            lock was held by the very zombie we wanted to debug)
Both show the same kernel-side signature:
[<0>] do_select+0x6e6/0x890
[<0>] core_sys_select+0x3f6/0x5f0
[<0>] do_pselect.constprop.0+0xe9/0x190
[<0>] __x64_sys_pselect6+0x68/0xa0
(Syscall 270 is pselect6 on x86_64.) wchan = do_select for the parent
and every method process; CPU time is 00:00:00, so nothing is spinning.
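For triaging other hosts, the signature above can be checked mechanically. The following is a minimal sketch and not part of the captured evidence: the names `frame_names` and `looks_like_select_hang` are made up for illustration, and treating the two listed frames as sufficient is an assumption. Reading /proc/<pid>/stack normally requires root.

```python
import re

# Frames taken from the kernel stacks quoted above. Treating these two as
# sufficient evidence of the hang is an assumption of this sketch.
REQUIRED_FRAMES = ("do_select", "core_sys_select")


def frame_names(stack_text):
    """Extract symbol names from /proc/<pid>/stack style lines."""
    names = []
    for line in stack_text.splitlines():
        match = re.search(r"\]\s+([A-Za-z0-9_.]+)\+0x", line)
        if match:
            names.append(match.group(1))
    return names


def looks_like_select_hang(stack_text):
    """True when every required frame appears in the stack text."""
    names = frame_names(stack_text)
    return all(frame in names for frame in REQUIRED_FRAMES)


SAMPLE = """\
[<0>] do_select+0x6e6/0x890
[<0>] core_sys_select+0x3f6/0x5f0
[<0>] do_pselect.constprop.0+0xe9/0x190
[<0>] __x64_sys_pselect6+0x68/0xa0
"""

if __name__ == "__main__":
    print(looks_like_select_hang(SAMPLE))
```

A process sitting in select() is normal on its own, so a match only means something when combined with a long elapsed time and zero CPU time, as on Hosts A and B.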
=== HOST A — GDB USERSPACE BACKTRACE ===
Parent apt-get update (PID 2383996):
#0 __select (nfds=8, timeout={tv_sec=0, tv_nsec=381243623})
#1 pkgAcquire::Run(int) <- libapt-pkg.so.6.0
#2 AcquireUpdate(pkgAcquire&, int, bool, bool)
#3 ListUpdate(pkgAcquireStatus&, pkgSourceList&, int)
#4 DoUpdate() <- libapt-private.so.0.0
Child /usr/lib/apt/methods/http (PID 2384005):
#0 __select (nfds=1, timeout=NULL) <- blocked indefinitely
#1 WaitFd(int, bool, unsigned long) <- libapt-pkg.so.6.0
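The difference between the two frames is the timeout: the parent polls with a short bounded wait, while the child passes timeout=NULL and can only be woken by fd activity. A minimal Python sketch of that distinction follows. `wait_fd` here is a hypothetical stand-in written for illustration, not apt's WaitFd() implementation.

```python
import os
import select


def wait_fd(fd, timeout=None):
    """Block until fd is readable.

    timeout=None waits forever, like the select(..., timeout=NULL)
    seen in the child backtrace; a number bounds the wait in seconds.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


def demo():
    read_end, write_end = os.pipe()
    try:
        # Nothing has been written yet: the bounded wait gives up and
        # returns False. With timeout=None this call would never return.
        idle = wait_fd(read_end, timeout=0.2)
        os.write(write_end, b"x")
        ready = wait_fd(read_end, timeout=0.2)
        return idle, ready
    finally:
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(demo())
```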
=== HOST B — PIPE TOPOLOGY + CLOSE_WAIT TCP ===
lsof on the four-day zombie shows the parent <-> workers IPC pipes
intact, the parent still holding the apt list lock, and two HTTP workers
still attached to TCP sockets the mirror has already half-closed:
apt-get 315784 root 4uW REG /var/lib/apt/lists/lock   <- held since May 2 00:14 UTC
apt-get 315784 root 5r pipe:[41374374]
apt-get 315784 root 6r pipe:[41374391]
apt-get 315784 root 7r pipe:[41374419]
apt-get 315784 root 8w pipe:[41374375]
apt-get 315784 root 10w pipe:[41374392]
...
http 315793 _apt 3u TCP …:51594 -> ubuntu-mirror-2.ps6.canonical.com:http (CLOSE_WAIT)
http 315794 _apt 3u TCP …:35958 -> ubuntu-mirror-3.ps6.canonical.com:http (CLOSE_WAIT)
So the Canonical-side mirror sent FIN, the apt-method's read loop saw
select() wake up with readable=1, and read() returned 0. But instead of
closing/cleaning up and asking the parent for a new URI, the worker is
still in WaitFd(NULL), waiting forever for the next "URI Acquire" from
the parent. The parent's queue ordering means that next command never
comes. The lock stays held, cron retries fail, and the host needs an
operator with kill -9 to recover.
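The half-close sequence described above can be reproduced on loopback. This is a minimal sketch of the TCP behaviour only, not apt's code, and `observe_half_close` is a name invented for illustration. Once the peer closes, select() reports the socket readable and recv() returns an empty result. The local socket then stays in CLOSE_WAIT until the reader closes it.

```python
import select
import socket


def observe_half_close():
    """Return (readable, data) as seen by a client after its peer closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        # The peer closes first: it sends FIN and the client socket
        # enters CLOSE_WAIT.
        server_side.close()
        readable, _, _ = select.select([client], [], [], 5.0)
        data = client.recv(4096) if readable else None
        return bool(readable), data
    finally:
        # Closing is what takes the socket out of CLOSE_WAIT. A reader
        # that skips this step keeps the socket visible in lsof.
        client.close()
        listener.close()


if __name__ == "__main__":
    print(observe_half_close())
```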
=== HOST A — TCP-LEVEL EVIDENCE ===
During a 30-minute aggressive-loop capture (with Pipeline-Depth=0 and
No-Cache=true), tcpdump "tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0"
shows:
Flags  Meaning          Count
[S]    client SYN         603
[S.]   server SYN-ACK     591
[F.]   FIN               1243
[R]    RST from peer      424
So roughly 70% of established connections were torn down by the server
side. Under those conditions the queue-ordering bug (post-MR !500) hits
within 1–10 rounds.
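The "roughly 70%" figure follows from the counts above if each peer RST is taken to end one distinct established connection. That mapping is an assumption of this sketch. The flag summary alone does not prove it.

```python
# Counts copied from the flag-filtered capture above.
syn = 603
syn_ack = 591
rst_from_peer = 424


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Share of client SYNs that the server answered.
established_pct = percent(syn_ack, syn)
# Peer RSTs relative to answered connections (assumes one RST each).
reset_pct = percent(rst_from_peer, syn_ack)

if __name__ == "__main__":
    print(f"{established_pct:.1f}% answered, {reset_pct:.1f}% reset by peer")
```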
=== AT THE MOMENT OF EITHER SNAPSHOT ===
ss -tnp returned zero open sockets owned by apt-method processes on Host
A. Host B's lsof shows the sockets still attached to the worker
processes but in CLOSE_WAIT state: same root cause, just observed earlier
in the half-close lifecycle (a socket stays in CLOSE_WAIT until the
owning process closes the fd, and the apt-method still holds it).
=== HOST C — CLEAN ENVIRONMENTAL TRIGGER (NO IPv6 ROUTE + AAAA MIRROR) ===
A third production host (Ubuntu 24.04.4 LTS Noble, kernel
6.8.0-106-generic, apt 2.8.3) hit the same hang, again spontaneously via
apt.systemd.daily update, and sat as a 4-day-6-hour zombie holding
/var/lib/apt/lists/lock (apt-get -qq -y update, CPU time 00:00:00,
parent apt.systemd.daily lock_is_held update, child http/gpgv/store
methods all blocked) until an operator killed it. Same signature as
Hosts A and B.
What makes this instance useful is an unusually clean trigger: the host
has no IPv6 default route and no global IPv6 address, but its mirror
(archive.ubuntu.com / security.ubuntu.com, Cloudflare-fronted) returns
AAAA records (2606:4700:10::ac42:98b0). apt's method prefers IPv6, opens
a connection to an unroutable v6 address, and the same WaitFd() never
returns — so on a dual-stack-DNS / v4-only-routing host the bug fires
every single run, deterministically, with no need for server-side RST
churn.
Two things confirmed this as the proximate cause rather than a slow
mirror:
- A plain Acquire::http::Timeout alone let apt-get update finish, but only
  after timing out each source:
    W: Failed to fetch http://security.ubuntu.com/.../InRelease Connection timed out [IP: 104.20.28.246 80]
- Adding Acquire::ForceIPv4 "true" made apt-get update return rc=0 with no
  warnings at all; apt stopped attempting the unroutable v6 connect entirely.
So for the substantial population of hosts that have AAAA-resolving
mirrors but no working IPv6 path (common behind NAT/CGNAT gateways), the
hang is not intermittent — it is the steady state, and ForceIPv4 removes
the trigger outright.
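To find other hosts in the same state, the Host C condition (mirror resolves to AAAA, no usable IPv6 default route) can be tested roughly as follows. This is a hedged sketch. The function names are invented for illustration. The parser assumes the usual ten-column layout of /proc/net/ipv6_route and treats entries on lo or flagged as reject routes as unusable. It does not check for a global IPv6 address, which Host C also lacked.

```python
import socket

RTF_REJECT = 0x0200  # kernel route flag marking reject/unreachable entries


def has_ipv6_default_route(route_table_text):
    """Scan /proc/net/ipv6_route text for a usable ::/0 entry."""
    for line in route_table_text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue
        dest, prefix_len = fields[0], fields[1]
        flags, device = fields[8], fields[9]
        is_default = int(dest, 16) == 0 and int(prefix_len, 16) == 0
        rejected = int(flags, 16) & RTF_REJECT
        if is_default and not rejected and device != "lo":
            return True
    return False


def resolves_to_ipv6(hostname, port=80):
    """True when the resolver returns at least one IPv6 address."""
    try:
        return bool(socket.getaddrinfo(hostname, port, socket.AF_INET6,
                                       socket.SOCK_STREAM))
    except socket.gaierror:
        return False


def host_matches_trigger(hostname, route_table_text):
    """AAAA-resolving mirror combined with no usable IPv6 default route."""
    return (resolves_to_ipv6(hostname)
            and not has_ipv6_default_route(route_table_text))
```

On a live host this would be called as `host_matches_trigger("archive.ubuntu.com", open("/proc/net/ipv6_route").read())`. The file may be absent when IPv6 is disabled in the kernel, so that case needs separate handling.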
=== WORKAROUND IN PRODUCTION ===
While the proper fix is the queue-ordering patch from apt 3.1.3 — please
backport! — operators on Noble can avoid forever-hangs by giving the
apt-method workers a real timeout:
# /etc/apt/apt.conf.d/99-timeouts
Acquire::http::Timeout "30";
Acquire::https::Timeout "30";
Acquire::Retries "3";
This converts the eternal select(NULL) in WaitFd() into a bounded wait;
the failed round exits with a normal error and a cron retry usually
succeeds.
On hosts with AAAA-resolving mirrors but no working IPv6 route (Host C
above), also add:
Acquire::ForceIPv4 "true";
This removes the trigger entirely rather than merely bounding it — apt
never opens the unroutable v6 connection in the first place, so the
update succeeds cleanly instead of timing out each source.
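Where the configuration cannot be changed, a hard outer deadline is another stopgap. It does not come from the report above and does not fix the bug. `run_with_deadline` is a hypothetical wrapper, and the sketch assumes the apt method processes stay in the process group of the apt-get they were spawned from, so that one signal reaches them all.

```python
import os
import signal
import subprocess


def run_with_deadline(argv, seconds):
    """Run argv; return its exit code, or None if killed at the deadline."""
    # start_new_session puts the child in its own process group, so any
    # helper processes it spawns can be signalled together.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return None


if __name__ == "__main__":
    # Example deadline of 15 minutes; the value is arbitrary.
    result = run_with_deadline(["apt-get", "-qq", "-y", "update"], 900)
    print("timed out" if result is None else f"exit code {result}")
```

Killing the whole group releases /var/lib/apt/lists/lock, so the next scheduled run is not blocked by a leftover zombie.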
=== ATTACHED EVIDENCE ===
apt-hang-noble-2.8.3-evidence.tar.gz contains:
Host A (active reproduction, ~915 KB):
- info.txt, ps.txt, gdb-backtrace.txt (41 KB, full "thread apply all bt full"
for every PID — no debug symbols, but the libapt-pkg.so.6.0 symbol table is
sufficient)
- proc-<pid>/{wchan,stack,syscall,status,cmdline} per process
- lsof.txt, ss-all.txt
- apt-debug.log (187 KB — Debug::pkgAcquire(::Worker) +
Debug::Acquire::http(s))
- apt-hang-flags.pcap (239 KB — 30-min SYN/FIN/RST trim)
Host B (4-day production zombie, ~22 KB):
- info.txt, ps.txt, locks.txt, pipes.txt
- proc-<pid>/{wchan,stack,syscall,status,cmdline,fd} per process — kernel-side
stack identical to Host A
- lsof.txt (shows the CLOSE_WAIT sockets to Canonical mirrors)
- ss-all.txt
- apt-logs/ (last successful apt run was the upgrade that brought
kmod 31+20240202-2ubuntu7.2 on 2026-05-01 06:31:49 UTC; the zombie started
at 2026-05-02 00:14 UTC during the next apt.systemd.daily update)
Host C (4-day zombie, no-IPv6 trigger, ~5 KB):
- info.txt — host state proving the trigger: no IPv6 default route / no global
v6 address, mirror returns AAAA, connect() to the v6 address times out;
apt-get update with only Acquire::http::Timeout warns "Connection timed out"
per source, with Acquire::ForceIPv4 returns rc=0 clean
- zombie-incident.txt — frozen process tree (4 d 6 h, CPU 00:00:00), lsof of
the held lists/lock, the http methods' mirror endpoint. No gdb/kernel-stack:
the live zombie was killed by the operator before capture
Full unfiltered pcap from Host A (101 MB) available on request.
** Attachment added: "apt-hang-noble-2.8.3-evidence.tar.gz"
https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2003851/+attachment/5977615/+files/apt-hang-noble-2.8.3-evidence.tar.gz
--
https://bugs.launchpad.net/bugs/2003851
Title:
occasional hanging 'apt-get update' from daily cronjob since Jammy
22.04