Public bug reported:
==========================================
Ubuntu Kernel 6.8.0-87 Regression Bug Report
Intel 82599ES ixgbe Driver Tx Hang Issue
==========================================
SUMMARY:
--------
After upgrading from kernel 6.8.0-85 to 6.8.0-87, the Intel 82599ES 10GbE NIC
(using ixgbe driver) experiences repeated Tx Unit Hang errors after several
hours of operation, causing complete network failure and preventing SSH access.
SYSTEM INFORMATION:
-------------------
Distribution: Ubuntu 24.04.3 LTS (Noble)
Architecture: x86_64
Hostname: [REDACTED]
Current Kernel: 6.8.0-87-generic #88-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 09:28:41 UTC 2025
Problem Kernel: 6.8.0-87-generic (FAILS after several hours)
Working Kernel: 6.8.0-85-generic (NO ISSUES)
Kernel Upgrade Date: November 15, 2025
Problem First Occurred: November 15, 2025 around 06:51:18 UTC
HARDWARE INFORMATION:
---------------------
Affected NIC:
Device: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
PCI ID: 8086:10fb (rev 01)
Bus: 0000:01:00.0
Interface: enp1s0f0
MAC Address: [REDACTED]
Link: 10 Gbps Full Duplex
SFP+ Module: Type 5
Firmware: 0x80000208, 1.3429.0
Other Network Interfaces (NOT affected):
- Intel 82599ES #2: 0000:01:00.1 (enp1s0f1) - not in use
- Broadcom BCM5720: 02:00.0, 02:00.1
- Mellanox ConnectX-4: 41:00.0
DRIVER INFORMATION:
-------------------
ixgbe Driver Version: 6.8.0-87-generic
ixgbe Source Version: 6BF1A5A47043DC1DD3134D6
IMPORTANT: Both kernel versions have IDENTICAL ixgbe driver source:
- 6.8.0-85-generic: srcversion 6BF1A5A47043DC1DD3134D6 (WORKS)
- 6.8.0-87-generic: srcversion 6BF1A5A47043DC1DD3134D6 (FAILS)
This indicates the regression is NOT in the ixgbe driver itself, but in other
kernel subsystems (network stack, PCIe, interrupt handling, etc.) that interact
with the driver.
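The srcversion comparison above can be repeated on any host that has both
kernels installed. The following is a minimal Python sketch, assuming modinfo
(from kmod) is available and the module trees for both kernels are present.
The helper names srcversion() and same_source() are illustrative and are not
part of the original report. A matching srcversion only shows that the
driver's own source files match; it says nothing about code elsewhere in the
kernel.

```python
import subprocess


def srcversion(kernel, module="ixgbe"):
    """Ask modinfo for the srcversion of `module` as built for `kernel`."""
    result = subprocess.run(
        ["modinfo", "-k", kernel, "-F", "srcversion", module],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def same_source(versions):
    """True when every kernel reports the same, non-empty srcversion."""
    values = set(versions.values())
    return len(values) == 1 and "" not in values


# Values quoted in this report. On a live host, build this mapping with
# srcversion(kernel) for each installed kernel instead.
reported = {
    "6.8.0-85-generic": "6BF1A5A47043DC1DD3134D6",
    "6.8.0-87-generic": "6BF1A5A47043DC1DD3134D6",
}
print("identical driver source" if same_source(reported) else "driver source differs")
```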
NETWORK CONFIGURATION:
----------------------
Interface enp1s0f0 configuration:
- Primary IP: 203.0.113.10/29 [REDACTED - using RFC 5737 example IP]
- Secondary IPs: Entire /24 subnet (256 IPs assigned as /32)
- MTU: 1500
- Queue configuration: 63 Rx queues, 63 Tx queues
- Flow Control: RX/TX enabled
NOTE: The large number of IP addresses on this interface may be relevant to
the issue, as the Tx hangs occur on specific queues (36, 38).
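To reproduce the addressing layout described above on a test host, the /24 can
be expanded into individual /32 entries. This is a sketch only: the subnet
198.51.100.0/24 is an RFC 5737 documentation range used as a placeholder (the
real subnet is redacted), and the helper name secondary_addresses() is
hypothetical. The script prints "ip addr add" commands for review and does not
change any interface.

```python
import ipaddress


def secondary_addresses(subnet):
    """Expand a subnet into one /32 entry per address.

    Iterating the network object yields every address, including the
    network and broadcast addresses, which gives 256 entries for a /24.
    """
    network = ipaddress.ip_network(subnet)
    return [f"{address}/32" for address in network]


# Placeholder subnet (RFC 5737); substitute the real one on a test host.
entries = secondary_addresses("198.51.100.0/24")
print(f"{len(entries)} addresses")
for entry in entries[:3]:
    # Printed for review only; nothing is applied to the interface.
    print(f"ip addr add {entry} dev enp1s0f0")
```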
PROBLEM DESCRIPTION:
--------------------
After approximately 6 hours of runtime on kernel 6.8.0-87-generic, the system
experienced a catastrophic network failure characterized by:
1. Repeated "Tx Unit Hang" errors on queues 36 and 38
2. "Warning firmware error detected FWSM: 0x00000000" messages
3. Automatic adapter reset attempts that fail
4. Queue disable timeouts (RXDCTL.ENABLE and TXDCTL.ENABLE)
5. PCIe primary disable timeout
6. Continuous reset loop preventing network recovery
7. Complete loss of network connectivity (SSH inaccessible)
8. Required physical console access and reboot to recover
The issue did NOT occur on kernel 6.8.0-85-generic under identical workload
and network configuration for months of operation.
RELEVANT LOG EXCERPTS:
-----------------------
From /var/log/syslog on 2025-11-15 starting at 06:51:18:
2025-11-15T06:51:18.570514+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: Detected Tx Unit Hang
2025-11-15T06:51:18.570523+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: tx hang 4298 detected on queue 36, resetting adapter
2025-11-15T06:51:18.574748+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: tx hang 4298 detected on queue 38, resetting adapter
2025-11-15T06:51:18.574761+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: initiating reset due to tx timeout
2025-11-15T06:51:18.574763+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: initiating reset due to tx timeout
2025-11-15T06:51:18.574776+00:00 hostname kernel: ixgbe 0000:01:00.0: Warning firmware error detected FWSM: 0x00000000
2025-11-15T06:51:18.574777+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: Reset adapter
2025-11-15T06:51:18.625490+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: RXDCTL.ENABLE for one or more queues not cleared within the polling period
2025-11-15T06:51:18.666504+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: TXDCTL.ENABLE for one or more queues not cleared within the polling period
2025-11-15T06:51:18.845489+00:00 hostname kernel: ixgbe 0000:01:00.0: primary disable timed out
2025-11-15T06:51:19.340491+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: detected SFP+: 5
2025-11-15T06:51:19.382491+00:00 hostname kernel: ixgbe 0000:01:00.0: Warning firmware error detected FWSM: 0x00000000
2025-11-15T06:51:19.486493+00:00 hostname kernel: ixgbe 0000:01:00.0: Warning firmware error detected FWSM: 0x00000000
2025-11-15T06:51:19.486503+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
2025-11-15T06:51:19.590491+00:00 hostname kernel: ixgbe 0000:01:00.0: Warning firmware error detected FWSM: 0x00000000
2025-11-15T06:51:19.698366+00:00 hostname kernel: ixgbe 0000:01:00.0: Warning firmware error detected FWSM: 0x00000000
2025-11-15T06:51:19.698374+00:00 hostname kernel: ixgbe 0000:01:00.0 enp1s0f0: Detected Tx Unit Hang
[... Pattern repeats continuously ...]
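To check whether the hangs stay confined to queues 36 and 38 across the full
log, the "tx hang ... detected on queue N" messages can be tallied per queue.
A small Python sketch follows, assuming the message wording shown in the
excerpt above. The function name hangs_per_queue() is illustrative, and the
sample lines are taken from this report rather than from the complete syslog.

```python
import re
from collections import Counter

# Matches the ixgbe message text shown in the excerpt above.
HANG = re.compile(r"tx hang (\d+) detected on queue (\d+)")


def hangs_per_queue(lines):
    """Count 'tx hang ... detected on queue N' messages per Tx queue."""
    counts = Counter()
    for line in lines:
        match = HANG.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Sample lines from this report; pass an open syslog file object to
# hangs_per_queue() to scan a complete log instead.
sample = [
    "kernel: ixgbe 0000:01:00.0 enp1s0f0: Detected Tx Unit Hang",
    "kernel: ixgbe 0000:01:00.0 enp1s0f0: tx hang 4298 detected on queue 36, resetting adapter",
    "kernel: ixgbe 0000:01:00.0 enp1s0f0: tx hang 4298 detected on queue 38, resetting adapter",
]
for queue, count in sorted(hangs_per_queue(sample).items()):
    print(f"queue {queue}: {count} hang(s)")
```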
Error statistics from ethtool -S enp1s0f0:
rx_errors: 0
tx_errors: 0
rx_dropped: 374
tx_dropped: 0
rx_csum_offload_errors: 285
(All other error counters: 0)
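For comparing counters before and after a hang, the non-zero error and drop
counters can be filtered out of saved "ethtool -S" output. The sketch below
assumes the usual "name: value" layout of that output. The function name
nonzero_counters() and the keyword filter are illustrative choices, and the
sample text contains only the counters quoted in this report.

```python
def nonzero_counters(ethtool_output, keywords=("error", "drop")):
    """Return non-zero counters whose names contain one of `keywords`."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip headers such as "NIC statistics:" and any non-numeric lines.
        if not sep or not value.isdigit():
            continue
        if int(value) and any(keyword in name for keyword in keywords):
            counters[name] = int(value)
    return counters


# Counters quoted in this report; feed the saved output of
# "ethtool -S enp1s0f0" to scan the full statistics list.
sample = """NIC statistics:
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 374
     tx_dropped: 0
     rx_csum_offload_errors: 285
"""
for name, value in nonzero_counters(sample).items():
    print(f"{name}: {value}")
```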
STEPS TO REPRODUCE:
-------------------
1. Install kernel 6.8.0-87-generic on Ubuntu 24.04.3 LTS
2. Configure Intel 82599ES NIC with ixgbe driver
3. Assign multiple IP addresses (e.g., /24 subnet) to the interface
4. Run with normal network load for approximately 6+ hours
5. Observe Tx Unit Hang errors and network failure in kernel logs
REGRESSION ANALYSIS:
--------------------
This is a clear regression between kernel versions:
- 6.8.0-85-generic: STABLE (months of operation)
- 6.8.0-87-generic: FAILS (after ~6 hours)
The ixgbe driver code is IDENTICAL (same srcversion) in both kernels.
The regression therefore most likely lies in kernel subsystems that interact
with the driver:
- Network stack changes
- PCIe subsystem changes
- Interrupt handling changes
- Memory management changes
- Timer/scheduling changes
WORKAROUND:
-----------
Boot into kernel 6.8.0-85-generic, which is stable.
To prevent automatic upgrade to 6.8.0-87:
sudo apt-mark hold linux-image-generic linux-headers-generic linux-generic
IMPACT:
-------
HIGH - Complete network failure requiring physical console access and reboot.
This affects production servers and can cause extended downtime.
ADDITIONAL NOTES:
-----------------
- The FWSM: 0x00000000 error suggests the kernel is incorrectly reading the
firmware status word or there's a PCIe communication issue
- The issue affects specific Tx queues (36, 38) consistently
- The 6-hour delay before failure suggests a cumulative issue (memory leak,
queue exhaustion, timer overflow, etc.)
- Hardware is confirmed working (stable on previous kernel)
- SFP+ module and cable are confirmed working
EXPECTED BEHAVIOR:
------------------
The network interface should remain stable indefinitely, as it does under
kernel 6.8.0-85-generic.
REQUESTED ACTION:
-----------------
1. Investigate kernel changes between 6.8.0-85 and 6.8.0-87 that affect:
- PCIe device communication
- Network queue management
- Interrupt handling
- Firmware status word reading
2. Test with Intel 82599ES hardware under sustained load
3. Consider backporting the fix or reverting the problematic change in the
next kernel update
ATTACHMENTS:
------------
Full system information
REPORTER INFORMATION:
---------------------
This bug report was prepared with comprehensive testing and analysis.
System is available for additional testing if needed.
Date: November 15, 2025
Kernel: 6.8.0-87-generic
Package: linux-image-6.8.0-87-generic 6.8.0-87.88
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-87-generic 6.8.0-87.88
ProcVersionSignature: Ubuntu 6.8.0-87.88-generic 6.8.12
Uname: Linux 6.8.0-87-generic x86_64
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Nov 15 06:53 seq
crw-rw---- 1 root audio 116, 33 Nov 15 06:53 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.28.1-0ubuntu3.8
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
CurrentDmesg: Error: command ['dmesg'] failed with exit code 1: dmesg: read kernel buffer failed: Operation not permitted
Date: Sat Nov 15 07:37:05 2025
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Supermicro Super Server
PciMultimedia:
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=screen-256color
XDG_RUNTIME_DIR=<set>
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.8.0-87-generic root=/dev/mapper/vg0-root ro
RelatedPackageVersions:
linux-restricted-modules-6.8.0-87-generic N/A
linux-backports-modules-6.8.0-87-generic N/A
linux-firmware 20240318.git3b128b60-0ubuntu2.19
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:
acpidump:
dmi.bios.date: 07/29/2024
dmi.bios.release: 5.14
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3.0
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: H12DSI-N6
dmi.board.vendor: Supermicro
dmi.board.version: 1.02
dmi.chassis.asset.tag: To be filled by O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias:
dmi:bvnAmericanMegatrendsInc.:bvr3.0:bd07/29/2024:br5.14:svnSupermicro:pnSuperServer:pvr0123456789:rvnSupermicro:rnH12DSI-N6:rvr1.02:cvnSupermicro:ct17:cvr0123456789:skuTobefilledbyO.E.M.:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: Super Server
dmi.product.sku: To be filled by O.E.M.
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Tags: 6.8.0-87 82599es amd64 apport-bug driver intel ixgbe kernel networking noble regression tx unit
--
https://bugs.launchpad.net/bugs/2131586
Title:
Kernel 6.8.0-87 regression: Intel 82599ES ixgbe driver Tx Unit Hang after 6 hours causes complete network failure