Just captured detailed diagnostics from a fresh crash (2025-12-19 18:41 UTC):
Setup: I deployed periodic monitoring scripts on all nodes: ethtool stats every 60s, link state every 30s.

Timeline of the crash:
- 18:41:53 - Last successful link state check: eth0 UP, carrier 1, operstate up
- 18:41:56 - First network timeouts in k3s agent logs (TLS handshake timeout to API server)
- 18:43:08 - "dial tcp 172.16.101.1:6443: connect: connection timed out"
- 18:44:37 - Last ethtool stats collection (system still running locally)
- Node required a power cycle to recover

Key findings:

1. PHY reported link UP throughout the entire incident:
   - ip link show eth0: "state UP mode DEFAULT"
   - /sys/class/net/eth0/carrier: 1
   - /sys/class/net/eth0/operstate: up

2. Zero errors in ethtool stats at 18:44:37 (last collection before the crash):
   - rx_resource_errors: 0
   - rx_overruns: 0
   - rx_frame_check_sequence_errors: 0
   - rx_symbol_errors: 0
   - rx_alignment_errors: 0
   - All checksum error counters: 0

3. No kernel messages about the network failure:
   - No "Link is Down"
   - No macb driver errors
   - No RP1-related messages
   - journalctl -k shows nothing between 18:40 and 18:45

4. System continued running locally:
   - systemd timers kept executing
   - journald kept writing logs
   - Only network traffic stopped flowing

This confirms the "silent death" pattern: the PHY/link layer believes everything is fine, but something in the macb/RP1 stack stops passing traffic without any error indication. There's no software-detectable symptom other than remote hosts not responding.

--
Title: Complete network hang on Raspberry Pi 5 with kernel 6.17 under load - possibly related to CPU frequency scaling
https://bugs.launchpad.net/bugs/2133877
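Since the passive link checks stayed green the whole time, a monitor for this failure mode probably needs an active probe as well. Below is a minimal Python sketch of the link-state side of the setup described above, extended with a TCP connect probe to the API server endpoint from the k3s log (172.16.101.1:6443). The function names (`read_sample`, `tcp_probe`, `classify`, `monitor`) and the state labels are hypothetical, not the scripts actually used in this report. The "silent-stall" rule also assumes that the RX packet counter stops advancing during the hang, which the diagnostics above do not confirm.

```python
import os
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkSample:
    """One snapshot of link state and traffic counters for an interface."""
    timestamp: float
    carrier: Optional[int]  # /sys/class/net/<iface>/carrier, None if unreadable
    operstate: str          # /sys/class/net/<iface>/operstate, e.g. "up"
    rx_packets: int
    tx_packets: int


def _read(path: str) -> Optional[str]:
    """Read a sysfs attribute; return None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # e.g. reading carrier can fail while the interface is down
        return None


def read_sample(iface: str = "eth0", sysfs_root: str = "/sys/class/net") -> LinkSample:
    base = os.path.join(sysfs_root, iface)
    stats = os.path.join(base, "statistics")
    carrier = _read(os.path.join(base, "carrier"))
    return LinkSample(
        timestamp=time.time(),
        carrier=int(carrier) if carrier is not None else None,
        operstate=_read(os.path.join(base, "operstate")) or "unknown",
        rx_packets=int(_read(os.path.join(stats, "rx_packets")) or 0),
        tx_packets=int(_read(os.path.join(stats, "tx_packets")) or 0),
    )


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Active check: can a TCP handshake with host:port complete?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(prev: LinkSample, cur: LinkSample, probe_ok: bool) -> str:
    """Hypothetical state labels combining passive and active checks."""
    if not (cur.carrier == 1 and cur.operstate == "up"):
        return "link-down"
    if probe_ok:
        return "ok"
    if cur.rx_packets == prev.rx_packets:
        # Link reports UP, the probe failed and nothing was received
        # (assumes RX counters freeze during the hang).
        return "silent-stall"
    return "probe-failed"


def monitor(iface: str, host: str, port: int, interval: float = 30.0) -> None:
    """Sample every `interval` seconds and print one status line per sample."""
    prev = read_sample(iface)
    while True:
        time.sleep(interval)
        cur = read_sample(iface)
        state = classify(prev, cur, tcp_probe(host, port))
        print(f"{time.strftime('%H:%M:%S')} {iface} carrier={cur.carrier} "
              f"operstate={cur.operstate} rx={cur.rx_packets} state={state}",
              flush=True)
        prev = cur
```

It would be started with something like `monitor("eth0", "172.16.101.1", 6443)`. The point of pairing the probe with the sysfs reads is that, in this incident, carrier and operstate alone would have reported "up" right up to the power cycle.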
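For the ethtool side, a check like finding 2 can be automated by diffing two `ethtool -S` snapshots and reporting only error-like counters that increased. This is a sketch under stated assumptions: the helper names are hypothetical, and treating any counter whose name contains "error" or "overrun" as an error counter is a heuristic chosen to cover the macb counters listed above, not something the driver defines.

```python
import re
import subprocess
from typing import Dict

# Matches lines like "     rx_overruns: 0"; the "NIC statistics:" header has no value.
_STAT_LINE = re.compile(r"^\s*([^:]+):\s*(-?\d+)\s*$")


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Parse `ethtool -S <iface>` output into {counter_name: value}."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        m = _STAT_LINE.match(line)
        if m:
            stats[m.group(1).strip()] = int(m.group(2))
    return stats


def snapshot(iface: str = "eth0") -> Dict[str, int]:
    """Run `ethtool -S` and parse its output (requires the ethtool binary)."""
    out = subprocess.run(["ethtool", "-S", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_ethtool_stats(out)


def error_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return error-like counters (heuristic: name contains 'error' or
    'overrun') that increased between two snapshots."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if ("error" in name or "overrun" in name) and value > before.get(name, 0)
    }
```

In this crash, an empty result from `error_deltas` at 18:44:37 would match what was observed: no error counter moved even though traffic had stopped.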
