Public bug reported:

We run multiple Ubuntu servers that have the full Internet route table in their kernel route table. We recently started upgrading them from 22.04 LTS to 26.04 LTS, and observed multiple instabilities that did not occur in 22.04. They are caused by resource exhaustion triggered by various system processes.

## Cause

We have two observations that cause the exhaustion. In both, a process accidentally enumerates the kernel route table to detect internet reachability, where route enumeration is not appropriate:

1. landscape-common's /etc/update-motd.d/50-landscape-sysinfo calls a python3-netifaces function that enumerates the full kernel route table.
   - This causes update-motd to stall, which effectively prevents shell login, and can even exhaust the server's CPU.
2. fwupd and unattended-upgrades call NetworkMonitor get_default, which is expensive, to determine internet reachability.
   - Periodic updates (fwupd-update.timer, unattended-upgrades) cause excessive CPU/RAM usage.

## Root cause

1. landscape-common uses python3-netifaces, which enumerates all route table entries to detect the default network interface. The call path is in /etc/update-motd.d/50-landscape-sysinfo.
   - https://github.com/canonical/landscape-client/blob/9cfa2458f1a2ef6b28fe4f7740031df5410a4f9b/landscape/lib/network.py#L127
   - https://salsa.debian.org/python-team/packages/netifaces/-/blob/ffd1f927a289e2bc2defa19f637a6d0e31cf57b8/netifaces.c#L1778
2. fwupd and unattended-upgrades call glib/gio's NetworkMonitor.get_default, which *immediately* pulls all routes from the kernel and subscribes to updates. Doing this frequently performs poorly and risks excessive CPU/RAM usage (or exhausts resources and never completes).
   - https://github.com/mvo5/unattended-upgrades/blob/26ae30dd42ee30ab4cc2e50f9d794f1fa8730f2e/unattended-upgrade#L907
   - https://github.com/fwupd/fwupd/pull/8275
   - https://gitlab.gnome.org/GNOME/glib/-/blob/7a314ecee2663d50dd776672a43e58d398b7dd50/gio/gnetworkmonitornetlink.c#L177

## Possible fix

- Don't enumerate the kernel route table (at the least, it is not appropriate in packages installed out of the box).
- Avoid NetworkMonitor.get_default and python3-netifaces (which is unmaintained).
- If there is a known destination address, make a netlink call to let the kernel resolve the route (equivalent to `ip route get 1.1.1.1`).
- Have a reasonable timeout, such as 200ms.

## Misc

- The NetworkMonitor library is confusing to its consumers, and end users get surprised because developers cannot tell that get_default() can be expensive. I am guessing this is by design: NetworkMonitor wants to *monitor*, so it pulls all routes and subscribes to event updates using netlink.
- Maybe there is no problem when NetworkManager is running instead of systemd-networkd, and that is the setup behind most previous NetworkMonitor usage?

** Affects: landscape-client
     Importance: Undecided
         Status: New

** Affects: fwupd (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: unattended-upgrades (Ubuntu)
     Importance: Undecided
         Status: New

--
https://bugs.launchpad.net/bugs/2157046

Title:
  CPU/RAM exhaustion triggered by system process on 26.04 with large kernel route table
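
To illustrate the "Possible fix" suggestion of letting the kernel resolve a route for a known destination, here is a minimal Python sketch. It is not a netlink RTM_GETROUTE call and not code from any of the affected packages; it uses a simpler stand-in, a connected UDP socket, which makes the kernel perform a single route lookup without sending a packet and without enumerating the route table. The function names, the port number, and the default destination are assumptions chosen for the example.

```python
import socket
from typing import Optional


def source_address_for(dest_ip: str, timeout: float = 0.2) -> Optional[str]:
    """Return the local IPv4 address the kernel would use to reach dest_ip.

    Returns None when the kernel has no route to dest_ip.
    dest_ip should be an IP literal so that no DNS lookup is involved.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket normally returns at once; the timeout
        # is only a defensive bound (200ms, as suggested in the report).
        sock.settimeout(timeout)
        try:
            # No packet is sent: the kernel only selects a route and a
            # source address for this destination. Port 53 is arbitrary.
            sock.connect((dest_ip, 53))
        except OSError:
            # For example "Network is unreachable" when no route matches.
            return None
        return sock.getsockname()[0]


def has_route_to(dest_ip: str = "1.1.1.1") -> bool:
    """Rough reachability hint: True if the kernel can route to dest_ip."""
    return source_address_for(dest_ip) is not None


if __name__ == "__main__":
    print("route to 1.1.1.1:", has_route_to())
```

The cost of this check is one route lookup regardless of table size. Like `ip route get`, it only shows that a route exists, not that the destination actually answers.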
