apport information
** Package changed: linux-hwe-5.4 (Ubuntu) => linux (Ubuntu)
** Tags added: apport-collected bionic
** Description changed:
After upgrading 3 servers from linux-image-5.3.0-40-generic to
linux-image-5.4.0-48-generic, I have started seeing the following queue
timeouts from IP-over-Infiniband (ipoib) devices. The devices in
question are (with the newest available firmware, 2.42.5000):
# lspci -nnk -s 83:00.0
83:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family
[ConnectX-3] [15b3:1003]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
[15b3:0027]
Kernel driver in use: mlx4_core
Below is the WARN from one machine's syslog. The others are practically
identical. When the WARN happens on any one of the machines, the other 2
_also_ exhibit queue timeouts. Additionally, other (unrelated) machines
connected to the same infiniband fabric exhibit a 12-second
transmission delay. This could conceivably be caused by these 3 servers
also being Subnet Managers (opensm package).
The infiniband fabric is partitioned, with the affected partition (8011)
seeing most of the traffic.
--------------------
kernel: [52642.480066] ------------[ cut here ]------------
kernel: [52642.480092] NETDEV WATCHDOG: ib0.8011 (): transmit queue 0 timed
out
kernel: [52642.480120] WARNING: CPU: 13 PID: 0 at
/build/linux-hwe-5.4-8m2I8l/linux-hwe-5.4-5.4.0/net/sched/sch_generic.c:448
dev_watchdog+0x264/0x270
kernel: [52642.480121] Modules linked in: aufs overlay ip6table_raw
ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_tables
nfnetlink cfg80211 ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter bpfilter mst_pciconf(OE) mst_pci(OE) 8021q garp mrp stp llc
nls_iso8859_1 intel_rapl_msr lz4 lz4_compress intel_rapl_common ib_iser rdma_cm
sb_edac iw_cm iscsi_tcp libiscsi_tcp libiscsi x86_pkg_temp_thermal
scsi_transport_iscsi intel_powerclamp zram veth vhost_net tap coretemp vhost
kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm openvswitch nsh
nf_conncount nf_nat nf_conntrack rapl nf_defrag_ipv6 nf_defrag_ipv4 joydev
input_leds intel_cstate ib_ipoib mei_me mei ib_cm ioatdma ib_umad lpc_ich
acpi_pad acpi_power_meter mac_hid ipmi_si ipmi_ssif ipmi_devintf
ipmi_msghandler kyber_iosched sch_fq_codel tcp_highspeed ip_tables x_tables
autofs4 zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE)
spl(OE) zlua(POE) btrfs zstd_compress raid10 raid456
kernel: [52642.480182] async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid0 multipath linear dm_mirror dm_region_hash
dm_log mlx4_ib ib_uverbs ib_core hid_generic raid1 ses enclosure usbhid hid ast
drm_vram_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect
sysimgblt aesni_intel fb_sys_fops ixgbe glue_helper mpt3sas nvme xfrm_algo
crypto_simd ahci raid_class dca cryptd mlx4_core drm megaraid_sas nvme_core
libahci scsi_transport_sas mdio wmi
kernel: [52642.480221] CPU: 13 PID: 0 Comm: swapper/13 Tainted: P
OE 5.4.0-48-generic #52~18.04.1-Ubuntu
kernel: [52642.480223] Hardware name: Supermicro Super Server/X10DRW-iT, BIOS
2.0b 04/13/2017
kernel: [52642.480226] RIP: 0010:dev_watchdog+0x264/0x270
kernel: [52642.480229] Code: 48 85 c0 75 e6 eb a0 4c 89 ef c6 05 42 c1 e7 00
01 e8 30 b8 fa ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 50 05 63 ae e8 4c 31 71 ff
<0f> 0b eb 82 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
kernel: [52642.480230] RSP: 0018:ffffb4998c970e48 EFLAGS: 00010282
kernel: [52642.480233] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
000000000000083f
kernel: [52642.480234] RDX: 0000000000000000 RSI: 00000000000000f6 RDI:
000000000000083f
kernel: [52642.480235] RBP: ffffb4998c970e78 R08: 00000000000008fd R09:
0000000000000003
kernel: [52642.480237] R10: ffffb4998c970ee8 R11: 0000000000000001 R12:
0000000000000001
kernel: [52642.480238] R13: ffff93d302465000 R14: ffff93d302465480 R15:
ffff93f2d293c880
kernel: [52642.480240] FS: 0000000000000000(0000) GS:ffff93f33f640000(0000)
knlGS:0000000000000000
kernel: [52642.480242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [52642.480243] CR2: fffff90140127000 CR3: 0000002a26e0a004 CR4:
00000000001626e0
kernel: [52642.480245] Call Trace:
kernel: [52642.480247] <IRQ>
kernel: [52642.480252] ? pfifo_fast_reset+0x110/0x110
kernel: [52642.480255] call_timer_fn+0x32/0x130
kernel: [52642.480258] run_timer_softirq+0x443/0x480
kernel: [52642.480262] ? ktime_get+0x43/0xa0
kernel: [52642.480268] ? lapic_next_deadline+0x26/0x30
kernel: [52642.480273] __do_softirq+0xe4/0x2da
kernel: [52642.480278] irq_exit+0xae/0xb0
kernel: [52642.480282] smp_apic_timer_interrupt+0x79/0x130
kernel: [52642.480285] apic_timer_interrupt+0xf/0x20
kernel: [52642.480286] </IRQ>
kernel: [52642.480292] RIP: 0010:cpuidle_enter_state+0xbc/0x440
kernel: [52642.480294] Code: ff e8 98 f6 80 ff 80 7d d3 00 74 17 9c 58 0f 1f
44 00 00 f6 c4 02 0f 85 54 03 00 00 31 ff e8 2b 74 87 ff fb 66 0f 1f 44 00 00
<45> 85 ed 0f 88 1a 03 00 00 4c 2b 7d c8 48 ba cf f7 53 e3 a5 9b c4
kernel: [52642.480295] RSP: 0018:ffffb4998c5afe48 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
kernel: [52642.480298] RAX: ffff93f33f66ad00 RBX: ffffffffae9579e0 RCX:
000000000000001f
kernel: [52642.480299] RDX: 00002fe0c852cf32 RSI: 0000000037a6f80a RDI:
0000000000000000
kernel: [52642.480300] RBP: ffffb4998c5afe88 R08: 0000000000000002 R09:
000000000002a580
kernel: [52642.480302] R10: ffffb4998c5afe18 R11: 0000000000002171 R12:
ffffd4997f844500
kernel: [52642.480303] R13: 0000000000000004 R14: ffffffffae957b78 R15:
00002fe0c852cf32
kernel: [52642.480307] ? cpuidle_enter_state+0x98/0x440
kernel: [52642.480311] cpuidle_enter+0x2e/0x40
kernel: [52642.480316] call_cpuidle+0x23/0x40
kernel: [52642.480319] do_idle+0x1f6/0x270
kernel: [52642.480323] cpu_startup_entry+0x1d/0x20
kernel: [52642.480326] start_secondary+0x166/0x1c0
kernel: [52642.480331] secondary_startup_64+0xa4/0xb0
kernel: [52642.480334] ---[ end trace 770c6aafc2e53202 ]---
kernel: [52642.480339] ib0.8011: transmit timeout: latency 1420 msecs
kernel: [52642.480343] ib0.8011: queue stopped 1, tx_head 30, tx_tail 30,
global_tx_head 50968261, global_tx_tail 50968133
kernel: [52642.608061] ib0: transmit timeout: latency 1512 msecs
kernel: [52642.608081] ib0: queue stopped 1, tx_head 1293, tx_tail 1293,
global_tx_head 38440435, global_tx_tail 38440307
--------------------
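The "queue stopped" lines at the end of the trace can be checked
mechanically. The following is a minimal sketch, not part of the original
report, that extracts the counters and computes global_tx_head minus
global_tx_tail for each device. The function name outstanding_sends and
the SAMPLE string are hypothetical helpers introduced here; the sample
text is copied from the log above. For both ib0.8011 and ib0 the
difference is 128, which would be consistent with a completely full send
queue if the ipoib send queue size on these hosts is 128 (an assumption,
the report does not state the configured size).

```python
import re

# Matches the ipoib "queue stopped" lines as they appear in the quoted log.
# \s+ before global_tx_head tolerates the line wrap introduced by the mail.
QUEUE_RE = re.compile(
    r"(?P<dev>\S+): queue stopped (?P<stopped>\d+), "
    r"tx_head (?P<tx_head>\d+), tx_tail (?P<tx_tail>\d+),\s+"
    r"global_tx_head (?P<g_head>\d+), global_tx_tail (?P<g_tail>\d+)"
)


def outstanding_sends(log_text):
    """Return {device: global_tx_head - global_tx_tail} for each matching line."""
    result = {}
    for m in QUEUE_RE.finditer(log_text):
        result[m.group("dev")] = int(m.group("g_head")) - int(m.group("g_tail"))
    return result


# Sample input copied from the syslog excerpt in this report.
SAMPLE = (
    "kernel: [52642.480343] ib0.8011: queue stopped 1, tx_head 30, tx_tail 30,\n"
    "global_tx_head 50968261, global_tx_tail 50968133\n"
    "kernel: [52642.608081] ib0: queue stopped 1, tx_head 1293, tx_tail 1293,\n"
    "global_tx_head 38440435, global_tx_tail 38440307\n"
)

if __name__ == "__main__":
    for dev, pending in outstanding_sends(SAMPLE).items():
        print(dev, pending)
```

Running the same function over the syslog of the other two servers would
show whether they report the same difference at the time of the WARN.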
# apt-cache policy linux-image-5.4.0-48-generic
linux-image-5.4.0-48-generic:
Installed: 5.4.0-48.52~18.04.1
# lsb_release -rd
Description: Ubuntu 18.04.5 LTS
Release: 18.04
+ ---
+ ProblemType: Bug
+ AlsaDevices:
+ total 0
+ crw-rw---- 1 root audio 116, 1 Oct 1 01:28 seq
+ crw-rw---- 1 root audio 116, 33 Oct 1 01:28 timer
+ AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
+ ApportVersion: 2.20.9-0ubuntu7.17
+ Architecture: amd64
+ ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord':
'arecord'
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
+ DistroRelease: Ubuntu 18.04
+ HibernationDevice: RESUME=none
+ InstallationDate: Installed on 2017-05-26 (1224 days ago)
+ InstallationMedia: Ubuntu-Server 16.04.2 LTS "Xenial Xerus" - Release amd64
(20170215.8)
+ IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
+ MachineType: Supermicro Super Server
+ NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
+ Package: linux (not installed)
+ PciMultimedia:
+
+ ProcFB: 0 astdrmfb
+ ProcKernelCmdLine: BOOT_IMAGE=/ROOT/ubuntu@/boot/vmlinuz-5.4.0-48-generic
root=ZFS=rpool/ROOT/ubuntu ro iommu=pt intel_iommu=on scsi_mod.use_blk_mq=1
biosdevname=0 net.ifnames=0 rootdelay=5 pti=off spectre_v2=off l1tf=off
spec_store_bypass_disable=prctl
+ ProcVersionSignature: Ubuntu 5.4.0-48.52~18.04.1-generic 5.4.60
+ RelatedPackageVersions:
+ linux-restricted-modules-5.4.0-48-generic N/A
+ linux-backports-modules-5.4.0-48-generic N/A
+ linux-firmware 1.173.19
+ RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
+ Tags: bionic
+ Uname: Linux 5.4.0-48-generic x86_64
+ UnreportableReason: This report is about a package that is not installed.
+ UpgradeStatus: Upgraded to bionic on 2018-12-11 (660 days ago)
+ UserGroups:
+
+ _MarkForUpload: False
+ dmi.bios.date: 04/13/2017
+ dmi.bios.vendor: American Megatrends Inc.
+ dmi.bios.version: 2.0b
+ dmi.board.asset.tag: Default string
+ dmi.board.name: X10DRW-iT
+ dmi.board.vendor: Supermicro
+ dmi.board.version: 1.02
+ dmi.chassis.asset.tag: Default string
+ dmi.chassis.type: 17
+ dmi.chassis.vendor: Supermicro
+ dmi.chassis.version: 0123456789
+ dmi.modalias:
dmi:bvnAmericanMegatrendsInc.:bvr2.0b:bd04/13/2017:svnSupermicro:pnSuperServer:pvr0123456789:rvnSupermicro:rnX10DRW-iT:rvr1.02:cvnSupermicro:ct17:cvr0123456789:
+ dmi.product.family: Default string
+ dmi.product.name: Super Server
+ dmi.product.sku: Default string
+ dmi.product.version: 0123456789
+ dmi.sys.vendor: Supermicro
** Attachment added: "CRDA.txt"
https://bugs.launchpad.net/bugs/1898057/+attachment/5416150/+files/CRDA.txt
https://bugs.launchpad.net/bugs/1898057
Title:
Infiniband transmit queue timeouts after upgrading to linux-hwe-5.4