Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
On Wed, 29 Jun 2022 at 16:32, Ben Hutchings wrote:
> On Thu, 9 Jun 2022 15:34:17 Андрій Василишин wrote:
> > Because it is the latest kernel which supports aufs.
> > The problem went away when I changed the NIC (Mellanox Technologies
> > MT28908 Family [ConnectX-6]) to default parameters:
> > ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
> [...]
>
> So this seems to be a problem with the out-of-tree network driver you
> are using. You should ask Mellanox for support, as there's nothing we
> can do about that.
>
> Ben.
>
> --
> Ben Hutchings
> Reality is just a crutch for people who can't handle science fiction.

Yes and no. The problem reappeared. Disabling sendfile in nginx helped.
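The sendfile workaround mentioned above maps to a single nginx directive. A minimal sketch, assuming the setting is applied in the http context of nginx.conf (the placement is an assumption; the thread does not say where it was changed):

```nginx
# Hypothetical excerpt of nginx.conf; only the sendfile line reflects the thread.
http {
    # Serve files through ordinary read/write calls instead of sendfile().
    sendfile off;
}
```

The directive is also accepted in server and location blocks, so it can be limited to the paths that serve large files from the NVMe drives.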
Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
On Thu, 9 Jun 2022 15:34:17 Андрій Василишин wrote:
> Because it is the latest kernel which supports aufs.
> The problem went away when I changed the NIC (Mellanox Technologies
> MT28908 Family [ConnectX-6]) to default parameters:
> ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
[...]

So this seems to be a problem with the out-of-tree network driver you
are using. You should ask Mellanox for support, as there's nothing we
can do about that.

Ben.

--
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.
Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
The problem repeats:

Jun 17 23:28:06 nl100 kernel: [89832.101712] Modules linked in: binfmt_misc msr amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul efi_pstore crc32_pclmul ghash_clmulni_intel efivars pcspkr ipmi_ssif nls_ascii nls_cp437 vfat fat ast ttm joydev drm_kms_helper drm ccp i2c_algo_bit evdev rng_core sp5100_tco ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq acpi_cpufreq button tcp_bbr sch_fq aufs(OE) efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64 crypto_simd mlx5_core(OE) cryptd glue_helper mlxfw(OE) psample ahci mlxdevm(OE) auxiliary(OE) libahci xhci_pci xhci_hcd mlx_compat(OE) libata nvme usbcore devlink scsi_mod nvme_core i2c_piix4 usb_common
Jun 17 23:28:06 nl100 kernel: [89832.101756] CPU: 51 PID: 96472 Comm: nginx Tainted: GW OEL4.19.0-20-amd64 #1 Debian 4.19.235-1
Jun 17 23:28:06 nl100 kernel: [89832.101757] Hardware name: Supermicro AS -1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
Jun 17 23:28:06 nl100 kernel: [89832.101764] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Jun 17 23:28:06 nl100 kernel: [89832.101767] Code: d8 48 3d 90 d0 03 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07
Jun 17 23:28:06 nl100 kernel: [89832.101767] RSP: 0018:91564d8c3e88 EFLAGS: 0282 ORIG_RAX: ff13
Jun 17 23:28:06 nl100 kernel: [89832.101769] RAX: 0066 RBX: d27afe4c4020 RCX: 8040003b
Jun 17 23:28:06 nl100 kernel: [89832.101769] RDX: 8040003c RSI: 0282 RDI: 0282
Jun 17 23:28:06 nl100 kernel: [89832.101770] RBP: 005b R08: R09: b3ef6000
Jun 17 23:28:06 nl100 kernel: [89832.101770] R10: 910d717a7c00 R11: 0001 R12: 915637c5f858
Jun 17 23:28:06 nl100 kernel: [89832.101771] R13: 0282 R14: 915637c5f140 R15: d27afe4c6028
Jun 17 23:28:06 nl100 kernel: [89832.101772] FS: 77783b80() GS:91564d8c() knlGS:
Jun 17 23:28:06 nl100 kernel: [89832.101772] CS: 0010 DS: ES: CR0: 80050033
Jun 17 23:28:06 nl100 kernel: [89832.101773] CR2: 77f8f8c0 CR3: 0176fe4a8000 CR4: 00340ee0
Jun 17 23:28:06 nl100 kernel: [89832.101773] Call Trace:
Jun 17 23:28:06 nl100 kernel: [89832.101776]
Jun 17 23:28:06 nl100 kernel: [89832.101781] fq_flush_timeout+0x6a/0x90
Jun 17 23:28:06 nl100 kernel: [89832.101784] ? fq_ring_free+0xd0/0xd0
Jun 17 23:28:06 nl100 kernel: [89832.101788] call_timer_fn+0x2b/0x130
Jun 17 23:28:06 nl100 kernel: [89832.101790] run_timer_softirq+0x1c7/0x3e0
Jun 17 23:28:06 nl100 kernel: [89832.101794] ? recalibrate_cpu_khz+0x10/0x10
Jun 17 23:28:06 nl100 kernel: [89832.101795] ? ktime_get+0x3a/0xa0
Jun 17 23:28:06 nl100 kernel: [89832.101797] __do_softirq+0xde/0x2d8
Jun 17 23:28:06 nl100 kernel: [89832.101800] irq_exit+0xba/0xc0
Jun 17 23:28:06 nl100 kernel: [89832.101802] smp_apic_timer_interrupt+0x74/0x140
Jun 17 23:28:06 nl100 kernel: [89832.101804] apic_timer_interrupt+0xf/0x20
Jun 17 23:28:06 nl100 kernel: [89832.101805]
Jun 17 23:28:06 nl100 kernel: [89832.101806] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Jun 17 23:28:06 nl100 kernel: [89832.101807] Code: d8 48 3d 90 d0 03 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07
Jun 17 23:28:06 nl100 kernel: [89832.101807] RSP: 0018:b27b1e963638 EFLAGS: 0293 ORIG_RAX: ff13
Jun 17 23:28:06 nl100 kernel: [89832.101808] RAX: 9155ca566880 RBX: 915637c5f140 RCX: 9155ca566880
Jun 17 23:28:06 nl100 kernel: [89832.101809] RDX: 9155d0a7ce40 RSI: 0293 RDI: 0293
Jun 17 23:28:06 nl100 kernel: [89832.101809] RBP: 0078 R08: 0158 R09: 9155dbe9de80
Jun 17 23:28:06 nl100 kernel: [89832.101810] R10: R11: 914aa5ee6000 R12: 9155dbe9de80
Jun 17 23:28:06 nl100 kernel: [89832.101810] R13: 0080 R14: ff80 R15: ff80
Jun 17 23:28:06 nl100 kernel: [89832.101812] alloc_iova+0x11f/0x140
Jun 17 23:28:06 nl100 kernel: [89832.101813] alloc_iova_fast+0x56/0x250
Jun 17 23:28:06 nl100 kernel: [89832.101817] ? __kmalloc+0x180/0x220
Jun 17 23:28:06 nl100 kernel: [89832.101820] ? mempool_alloc+0x67/0x190
Jun 17 23:28:06 nl100 kernel: [89832.101821] dma_ops_alloc_iova.isra.28+0x4b/0x70
Jun 17 23:28:06 nl100 kernel: [89832.101822] map_sg+0x73/0x1f0
Jun 17 23:28:06 nl100 kernel: [89832.101827] nvme_queue_rq+0x1e7/0x9e0 [nvme]
Jun 17 23:28:06 nl100 kernel: [89832.101831] ? __sbitmap_queue_get+0x24/0x90
Jun 17 23:28:06 nl100 kernel: [89832.101834] ? blk_mq_get_tag+0x236/0x260
Jun 17 23:28:06 nl100 kernel: [89832.101835] ?
Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
Because it is the latest kernel which supports aufs.
The problem went away when I changed the NIC (Mellanox Technologies MT28908 Family [ConnectX-6]) to default parameters:
ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128

On Tue, 7 Jun 2022 at 18:35, Diederik de Haas wrote:
> Control: reassign -1 src:linux 4.19.235-1
> Control: tag -1 moreinfo
>
> On 23 Apr 2022 21:59:32 +0300 Андрій Василишин wrote:
> > Package: linux-image-4.19.0-20-amd64
> > Version: 4.19.235-1
> >
> > ...
> >
> > Hardware name: Supermicro AS-1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
>
> https://www.supermicro.com/en/Aplus/system/1U/1124/AS-1124US-TNRP.cfm
> specifications (and BIOS date) indicate this is quite a new board.
> Yet you're running it with a 4.19 kernel from *OldStable* !
>
> Why?
>
> Can you reproduce this issue with at least the 5.10 kernel from OldStable
> backports, but preferably with a recent kernel from Testing/Unstable.

--
WBR, Andrey Vasilishin
Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
Control: reassign -1 src:linux 4.19.235-1
Control: tag -1 moreinfo

On 23 Apr 2022 21:59:32 +0300 Андрій Василишин wrote:
> Package: linux-image-4.19.0-20-amd64
> Version: 4.19.235-1
>
> ...
>
> Hardware name: Supermicro AS-1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022

https://www.supermicro.com/en/Aplus/system/1U/1124/AS-1124US-TNRP.cfm
specifications (and BIOS date) indicate this is quite a new board.
Yet you're running it with a 4.19 kernel from *OldStable* !

Why?

Can you reproduce this issue with at least the 5.10 kernel from OldStable
backports, but preferably with a recent kernel from Testing/Unstable.