Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
On Wed, 29 Jun 2022 at 16:32, Ben Hutchings wrote:
> On Thu, 9 Jun 2022 15:34:17 Андрій Василишин wrote:
> > Because it is the latest kernel which supports aufs.
> > Problem gone when I change to default parameters NIC Mellanox Technologies
> > MT28908 Family [ConnectX-6]
> > ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
> [...]
>
> So this seems to be a problem with the out-of-tree network driver you
> are using. You should ask Mellanox for support, as there's nothing we
> can do about that.
>
> Ben.
>
> --
> Ben Hutchings
> Reality is just a crutch for people who can't handle science fiction.

Yes and no. The problem reappeared. Disabling sendfile in nginx helped.
Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
R13: 0080 R14: ff80 R15: ff80
Jun 17 23:28:34 nl100 kernel: [89860.345176] alloc_iova+0x11f/0x140
Jun 17 23:28:34 nl100 kernel: [89860.345178] alloc_iova_fast+0x56/0x250
Jun 17 23:28:34 nl100 kernel: [89860.345181] ? __kmalloc+0x180/0x220
Jun 17 23:28:34 nl100 kernel: [89860.345185] ? mempool_alloc+0x67/0x190
Jun 17 23:28:34 nl100 kernel: [89860.345186] dma_ops_alloc_iova.isra.28+0x4b/0x70
Jun 17 23:28:34 nl100 kernel: [89860.345188] map_sg+0x73/0x1f0
Jun 17 23:28:34 nl100 kernel: [89860.345193] nvme_queue_rq+0x1e7/0x9e0 [nvme]
Jun 17 23:28:34 nl100 kernel: [89860.345197] ? __sbitmap_queue_get+0x24/0x90
Jun 17 23:28:34 nl100 kernel: [89860.345199] ? blk_mq_get_tag+0x236/0x260
Jun 17 23:28:34 nl100 kernel: [89860.345201] ? nvme_queue_rq+0x4d2/0x9e0 [nvme]
Jun 17 23:28:34 nl100 kernel: [89860.345203] ? finish_wait+0x80/0x80
Jun 17 23:28:34 nl100 kernel: [89860.345205] blk_mq_dispatch_rq_list+0x392/0x590
Jun 17 23:28:34 nl100 kernel: [89860.345206] ? remove_wait_queue+0x60/0x60
Jun 17 23:28:34 nl100 kernel: [89860.345208] blk_mq_sched_dispatch_requests+0xf0/0x170
Jun 17 23:28:34 nl100 kernel: [89860.345210] __blk_mq_run_hw_queue+0x4e/0xe0
Jun 17 23:28:34 nl100 kernel: [89860.345212] __blk_mq_delay_run_hw_queue+0x143/0x160
Jun 17 23:28:34 nl100 kernel: [89860.345213] blk_mq_run_hw_queue+0x88/0x110
Jun 17 23:28:34 nl100 kernel: [89860.345214] __blk_mq_try_issue_directly+0x8e/0x1c0
Jun 17 23:28:34 nl100 kernel: [89860.345215] ? recalibrate_cpu_khz+0x10/0x10
Jun 17 23:28:34 nl100 kernel: [89860.345216] ? ktime_get+0x3a/0xa0
Jun 17 23:28:34 nl100 kernel: [89860.345217] blk_mq_try_issue_directly+0x30/0xb0
Jun 17 23:28:34 nl100 kernel: [89860.345218] blk_mq_make_request+0x332/0x530
Jun 17 23:28:34 nl100 kernel: [89860.345220] generic_make_request+0x1a4/0x400
Jun 17 23:28:34 nl100 kernel: [89860.345222] ? __add_to_page_cache_locked+0x1df/0x240
Jun 17 23:28:34 nl100 kernel: [89860.345223] submit_bio+0x45/0x130
Jun 17 23:28:34 nl100 kernel: [89860.345224] ? add_to_page_cache_lru+0x74/0xe0
Jun 17 23:28:34 nl100 kernel: [89860.345225] ? bio_add_page+0x48/0x60
Jun 17 23:28:34 nl100 kernel: [89860.345240] ext4_mpage_readpages+0x4c3/0x860 [ext4]
Jun 17 23:28:34 nl100 kernel: [89860.345243] read_pages+0x6b/0x190
Jun 17 23:28:34 nl100 kernel: [89860.345245] __do_page_cache_readahead+0x1c1/0x1e0
Jun 17 23:28:34 nl100 kernel: [89860.345246] ondemand_readahead+0x1f9/0x2c0
Jun 17 23:28:34 nl100 kernel: [89860.345248] generic_file_read_iter+0x742/0xbc0
Jun 17 23:28:34 nl100 kernel: [89860.345252] ? sock_write_iter+0x97/0x100
Jun 17 23:28:34 nl100 kernel: [89860.345255] new_sync_read+0xf8/0x160
Jun 17 23:28:34 nl100 kernel: [89860.345257] vfs_read+0x91/0x140
Jun 17 23:28:34 nl100 kernel: [89860.345259] ksys_pread64+0x61/0xa0
Jun 17 23:28:34 nl100 kernel: [89860.345262] do_syscall_64+0x53/0x110
Jun 17 23:28:34 nl100 kernel: [89860.345264] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 23:28:34 nl100 kernel: [89860.345265] RIP: 0033:0x77fb1df4
Jun 17 23:28:34 nl100 kernel: [89860.345266] Code: d8 64 89 02 b8 ff ff ff ff eb c5 66 2e 0f 1f 84 00 00 00 00 00 90 8b 05 5a e6 00 00 49 89 ca 85 c0 75 13 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5c c3 0f 1f 00 41 55 49 89 cd 41 54 49 89 d4
Jun 17 23:28:34 nl100 kernel: [89860.345267] RSP: 002b:7fffe5d8 EFLAGS: 0246 ORIG_RAX: 0011
Jun 17 23:28:34 nl100 kernel: [89860.345267] RAX: ffda RBX: 5a2ed388 RCX: 77fb1df4
Jun 17 23:28:34 nl100 kernel: [89860.345268] RDX: 0010 RSI: 5d16a8b0 RDI: 00d2
Jun 17 23:28:34 nl100 kernel: [89860.345268] RBP: 5a2ed050 R08: 5a2ed050 R09: 0001
Jun 17 23:28:34 nl100 kernel: [89860.345269] R10: 1e343d86 R11: 0246 R12: 0010
Jun 17 23:28:34 nl100 kernel: [89860.345269] R13: 7fffe630 R14: 0001 R15: 5a2ed000

On Thu, 9 Jun 2022 at 15:34, Андрій Василишин wrote:
> Because it is the latest kernel which supports aufs.
> Problem gone when I change to default parameters NIC Mellanox Technologies MT28908 Family [ConnectX-6]
> ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
>
>
> On Tue, 7 Jun 2022 at 18:35, Diederik de Haas wrote:
>
>> Control: reassign -1 src:linux 4.19.235-1
>> Control: tag -1 moreinfo
>>
>> On 23 Apr 2022 21:59:32 +0300 Андрій Василишин wrote:
>> > Package: linux-image-4.19.0-20-amd64
>> > Version: 4.19.235-1
>> >
>> > ...
>> >
>> > Hardware name: Supermicro AS-1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
>>
>> https://www.supermicro.com/en/Aplus/system/1U/1124/AS-1124US-TNRP.cfm
>> specifications (and BIOS date) indicate this is quite a new board.
>> Yet you're running it with a 4.19 kernel from *OldStable* !
>>
>> Why?
>>
Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs
Because it is the latest kernel which supports aufs.
The problem went away when I changed the NIC (Mellanox Technologies MT28908 Family [ConnectX-6]) to default parameters:
ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128

On Tue, 7 Jun 2022 at 18:35, Diederik de Haas wrote:
> Control: reassign -1 src:linux 4.19.235-1
> Control: tag -1 moreinfo
>
> On 23 Apr 2022 21:59:32 +0300 Андрій Василишин wrote:
> > Package: linux-image-4.19.0-20-amd64
> > Version: 4.19.235-1
> >
> > ...
> >
> > Hardware name: Supermicro AS-1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
>
> https://www.supermicro.com/en/Aplus/system/1U/1124/AS-1124US-TNRP.cfm
> specifications (and BIOS date) indicate this is quite a new board.
> Yet you're running it with a 4.19 kernel from *OldStable* !
>
> Why?
>
> Can you reproduce this issue with at least the 5.10 kernel from OldStable
> backports, but preferably with a recent kernel from Testing/Unstable.

--
WBR, Andrey Vasilishin
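
The call traces posted in this thread are easier to compare across hosts once the syslog prefix and offsets are stripped. The following is only a rough sketch, not something taken from the report: `trace_symbols` is a hypothetical helper name, and it assumes the log has one entry per line in the `... kernel: [timestamp] symbol+0xOFF/0xLEN` form seen here.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print only the
# call-trace symbols from kernel log lines read on stdin.
# Assumes one log entry per line, shaped like:
#   Jun 17 23:28:34 nl100 kernel: [89860.345176] alloc_iova+0x11f/0x140
# Frames the kernel marks as unreliable keep their leading "? ".
# Register dumps, "RIP:" and "Code:" lines do not match and are dropped.
trace_symbols() {
    sed -n -E 's#^.*kernel: \[ *[0-9.]+\] +(\? )?([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+.*$#\1\2#p'
}
```

With a saved copy of the log (the file name is an assumption), `trace_symbols < trace.txt` would print one symbol per line, for example `alloc_iova`, `alloc_iova_fast`, `? __kmalloc`, which can then be diffed between the nl100 and nl03 traces.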
Bug#1010073:
On another server with the same configuration:
Apr 23 22:30:43 nl03 kernel: [613296.099396] Modules linked in: binfmt_misc msr mst_pciconf(OE) amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul efi_pstore crc32_pclmul ghash_clmulni_intel efivars pcspkr ipmi_ssif nls_ascii nls_cp437 vfat fat ast ttm joydev drm_kms_helper drm ccp i2c_algo_bit rng_core evdev sp5100_tco ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq acpi_cpufreq button tcp_bbr sch_fq aufs(OE) efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb hid_generic usbhid hid mlx5_core(OE) mlxfw(OE) xhci_pci psample crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper mlxdevm(OE) xhci_hcd ahci auxiliary(OE) libahci libata mlx_compat(OE) nvme usbcore devlink scsi_mod nvme_core i2c_piix4 usb_common [last unloaded: mst_pci]
Apr 23 22:30:43 nl03 kernel: [613296.099438] CPU: 20 PID: 135069 Comm: nginx Tainted: GW OEL 4.19.0-20-amd64 #1 Debian 4.19.235-1
Apr 23 22:30:43 nl03 kernel: [613296.099439] Hardware name: Supermicro AS -1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
Apr 23 22:30:43 nl03 kernel: [613296.099444] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Apr 23 22:30:43 nl03 kernel: [613296.099445] Code: d8 48 3d 90 d0 03 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07
Apr 23 22:30:43 nl03 kernel: [613296.099446] RSP: 0018:9c930d103e88 EFLAGS: 0282 ORIG_RAX: ff13
Apr 23 22:30:43 nl03 kernel: [613296.099447] RAX: 0009 RBX: d9380721d658 RCX: 0082
Apr 23 22:30:43 nl03 kernel: [613296.099448] RDX: 9c92dfcdc000 RSI: 0282 RDI: 0282
Apr 23 22:30:43 nl03 kernel: [613296.099448] RBP: R08: R09: b40f6001
Apr 23 22:30:43 nl03 kernel: [613296.099449] R10: 9c164669d700 R11: 0001 R12: 9d928c6d0858
Apr 23 22:30:43 nl03 kernel: [613296.099449] R13: 0282 R14: 9d928c6d0140 R15: d9380721f660
Apr 23 22:30:43 nl03 kernel: [613296.099450] FS: 7fa7d9ef0b80() GS:9c930d10() knlGS:
Apr 23 22:30:43 nl03 kernel: [613296.099451] CS: 0010 DS: ES: CR0: 80050033
Apr 23 22:30:43 nl03 kernel: [613296.099451] CR2: 55cfa4118000 CR3: 01004a9f8000 CR4: 00340ee0
Apr 23 22:30:43 nl03 kernel: [613296.099452] Call Trace:
Apr 23 22:30:43 nl03 kernel: [613296.099453]
Apr 23 22:30:43 nl03 kernel: [613296.099456] fq_flush_timeout+0x6a/0x90
Apr 23 22:30:43 nl03 kernel: [613296.099459] ? fq_ring_free+0xd0/0xd0
Apr 23 22:30:43 nl03 kernel: [613296.099462] call_timer_fn+0x2b/0x130
Apr 23 22:30:43 nl03 kernel: [613296.099464] run_timer_softirq+0x1c7/0x3e0
Apr 23 22:30:43 nl03 kernel: [613296.099465] __do_softirq+0xde/0x2d8
Apr 23 22:30:43 nl03 kernel: [613296.099468] irq_exit+0xba/0xc0
Apr 23 22:30:43 nl03 kernel: [613296.099469] smp_apic_timer_interrupt+0x74/0x140
Apr 23 22:30:43 nl03 kernel: [613296.099471] apic_timer_interrupt+0xf/0x20
Apr 23 22:30:43 nl03 kernel: [613296.099472]
Apr 23 22:30:43 nl03 kernel: [613296.099473] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Apr 23 22:30:43 nl03 kernel: [613296.099474] Code: d8 48 3d 90 d0 03 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07
Apr 23 22:30:43 nl03 kernel: [613296.099474] RSP: 0018:ba37da943638 EFLAGS: 0297 ORIG_RAX: ff13
Apr 23 22:30:43 nl03 kernel: [613296.099475] RAX: 9c9276209c00 RBX: 9c92e05bc140 RCX: 9c926d1046c1
Apr 23 22:30:43 nl03 kernel: [613296.099476] RDX: 9d72ec4e0640 RSI: 0297 RDI: 0297
Apr 23 22:30:43 nl03 kernel: [613296.099476] RBP: 00fc R08: 008c R09: 9c15dcdc9e40
Apr 23 22:30:43 nl03 kernel: [613296.099476] R10: R11: 9c90f4ed2000 R12: 9c15dcdc9e40
Apr 23 22:30:43 nl03 kernel: [613296.099477] R13: 0100 R14: ff00 R15: ff00
Apr 23 22:30:43 nl03 kernel: [613296.099478] alloc_iova+0x11f/0x140
Apr 23 22:30:43 nl03 kernel: [613296.099480] alloc_iova_fast+0x56/0x250
Apr 23 22:30:43 nl03 kernel: [613296.099483] ? __kmalloc+0x180/0x220
Apr 23 22:30:43 nl03 kernel: [613296.099485] ? mempool_alloc+0x67/0x190
Apr 23 22:30:43 nl03 kernel: [613296.099486] dma_ops_alloc_iova.isra.28+0x4b/0x70
Apr 23 22:30:43 nl03 kernel: [613296.099488] map_sg+0x73/0x1f0
Apr 23 22:30:43 nl03 kernel: [613296.099492] nvme_queue_rq+0x1e7/0x9e0 [nvme]
Apr 23 22:30:43 nl03 kernel: [613296.099495] ? __sbitmap_queue_get+0x24/0x90
Apr 23 22:30:43 nl03 kernel: [613296.099497] ? blk_mq_get_tag+0x236/0x260
Apr 23 22:30:43 nl03 kernel: [613296.099499] ? finish_wait+0x80/0x80
Apr 23 22:30:43 nl03 kernel: [613296.099500]