Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs

2022-06-29 Thread Андрій Василишин
Wed, 29 Jun 2022 at 16:32, Ben Hutchings wrote:

> On Thu, 9 Jun 2022 15:34:17 Андрій Василишин wrote:
> > Because it is the latest kernel which supports aufs.
> > Problem gone when I change to default parameters NIC Mellanox
> > Technologies MT28908 Family [ConnectX-6]
> > ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
> [...]
>
> So this seems to be a problem with the out-of-tree network driver you
> are using.  You should ask Mellanox for support, as there's nothing we
> can do about that.
>
> Ben.
>
> --
> Ben Hutchings
> Reality is just a crutch for people who can't handle science fiction.
>

Yes and no.
The problem reappeared. Disabling sendfile in nginx helped.
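
For reference, a minimal sketch of that workaround. The thread does not say how sendfile was switched off, so this assumes the config contains an explicit "sendfile on;" directive. The helper name disable_sendfile and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print a copy of an nginx config with every "sendfile on;" directive
# changed to "sendfile off;". The input file itself is not modified.
disable_sendfile() {
    sed -E 's/^([[:space:]]*)sendfile[[:space:]]+on[[:space:]]*;/\1sendfile off;/' "$1"
}

# Hypothetical usage (paths are assumptions, adjust to the local layout):
#   disable_sendfile /etc/nginx/nginx.conf > /tmp/nginx.conf.new
#   nginx -t -c /tmp/nginx.conf.new    # syntax check before swapping it in
```

Writing to a separate file first keeps the running configuration untouched until the new one has passed the syntax check.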


Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs

2022-06-18 Thread Андрій Василишин
 R13: 0080 R14: ff80 R15: ff80
Jun 17 23:28:34 nl100 kernel: [89860.345176]  alloc_iova+0x11f/0x140
Jun 17 23:28:34 nl100 kernel: [89860.345178]  alloc_iova_fast+0x56/0x250
Jun 17 23:28:34 nl100 kernel: [89860.345181]  ? __kmalloc+0x180/0x220
Jun 17 23:28:34 nl100 kernel: [89860.345185]  ? mempool_alloc+0x67/0x190
Jun 17 23:28:34 nl100 kernel: [89860.345186]  dma_ops_alloc_iova.isra.28+0x4b/0x70
Jun 17 23:28:34 nl100 kernel: [89860.345188]  map_sg+0x73/0x1f0
Jun 17 23:28:34 nl100 kernel: [89860.345193]  nvme_queue_rq+0x1e7/0x9e0 [nvme]
Jun 17 23:28:34 nl100 kernel: [89860.345197]  ? __sbitmap_queue_get+0x24/0x90
Jun 17 23:28:34 nl100 kernel: [89860.345199]  ? blk_mq_get_tag+0x236/0x260
Jun 17 23:28:34 nl100 kernel: [89860.345201]  ? nvme_queue_rq+0x4d2/0x9e0 [nvme]
Jun 17 23:28:34 nl100 kernel: [89860.345203]  ? finish_wait+0x80/0x80
Jun 17 23:28:34 nl100 kernel: [89860.345205]  blk_mq_dispatch_rq_list+0x392/0x590
Jun 17 23:28:34 nl100 kernel: [89860.345206]  ? remove_wait_queue+0x60/0x60
Jun 17 23:28:34 nl100 kernel: [89860.345208]  blk_mq_sched_dispatch_requests+0xf0/0x170
Jun 17 23:28:34 nl100 kernel: [89860.345210]  __blk_mq_run_hw_queue+0x4e/0xe0
Jun 17 23:28:34 nl100 kernel: [89860.345212]  __blk_mq_delay_run_hw_queue+0x143/0x160
Jun 17 23:28:34 nl100 kernel: [89860.345213]  blk_mq_run_hw_queue+0x88/0x110
Jun 17 23:28:34 nl100 kernel: [89860.345214]  __blk_mq_try_issue_directly+0x8e/0x1c0
Jun 17 23:28:34 nl100 kernel: [89860.345215]  ? recalibrate_cpu_khz+0x10/0x10
Jun 17 23:28:34 nl100 kernel: [89860.345216]  ? ktime_get+0x3a/0xa0
Jun 17 23:28:34 nl100 kernel: [89860.345217]  blk_mq_try_issue_directly+0x30/0xb0
Jun 17 23:28:34 nl100 kernel: [89860.345218]  blk_mq_make_request+0x332/0x530
Jun 17 23:28:34 nl100 kernel: [89860.345220]  generic_make_request+0x1a4/0x400
Jun 17 23:28:34 nl100 kernel: [89860.345222]  ? __add_to_page_cache_locked+0x1df/0x240
Jun 17 23:28:34 nl100 kernel: [89860.345223]  submit_bio+0x45/0x130
Jun 17 23:28:34 nl100 kernel: [89860.345224]  ? add_to_page_cache_lru+0x74/0xe0
Jun 17 23:28:34 nl100 kernel: [89860.345225]  ? bio_add_page+0x48/0x60
Jun 17 23:28:34 nl100 kernel: [89860.345240]  ext4_mpage_readpages+0x4c3/0x860 [ext4]
Jun 17 23:28:34 nl100 kernel: [89860.345243]  read_pages+0x6b/0x190
Jun 17 23:28:34 nl100 kernel: [89860.345245]  __do_page_cache_readahead+0x1c1/0x1e0
Jun 17 23:28:34 nl100 kernel: [89860.345246]  ondemand_readahead+0x1f9/0x2c0
Jun 17 23:28:34 nl100 kernel: [89860.345248]  generic_file_read_iter+0x742/0xbc0
Jun 17 23:28:34 nl100 kernel: [89860.345252]  ? sock_write_iter+0x97/0x100
Jun 17 23:28:34 nl100 kernel: [89860.345255]  new_sync_read+0xf8/0x160
Jun 17 23:28:34 nl100 kernel: [89860.345257]  vfs_read+0x91/0x140
Jun 17 23:28:34 nl100 kernel: [89860.345259]  ksys_pread64+0x61/0xa0
Jun 17 23:28:34 nl100 kernel: [89860.345262]  do_syscall_64+0x53/0x110
Jun 17 23:28:34 nl100 kernel: [89860.345264]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 23:28:34 nl100 kernel: [89860.345265] RIP: 0033:0x77fb1df4
Jun 17 23:28:34 nl100 kernel: [89860.345266] Code: d8 64 89 02 b8 ff ff ff ff eb c5 66 2e 0f 1f 84 00 00 00 00 00 90 8b 05 5a e6 00 00 49 89 ca 85 c0 75 13 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5c c3 0f 1f 00 41 55 49 89 cd 41 54 49 89 d4
Jun 17 23:28:34 nl100 kernel: [89860.345267] RSP: 002b:7fffe5d8 EFLAGS: 0246 ORIG_RAX: 0011
Jun 17 23:28:34 nl100 kernel: [89860.345267] RAX: ffda RBX: 5a2ed388 RCX: 77fb1df4
Jun 17 23:28:34 nl100 kernel: [89860.345268] RDX: 0010 RSI: 5d16a8b0 RDI: 00d2
Jun 17 23:28:34 nl100 kernel: [89860.345268] RBP: 5a2ed050 R08: 5a2ed050 R09: 0001
Jun 17 23:28:34 nl100 kernel: [89860.345269] R10: 1e343d86 R11: 0246 R12: 0010
Jun 17 23:28:34 nl100 kernel: [89860.345269] R13: 7fffe630 R14: 0001 R15: 5a2ed000

Thu, 9 Jun 2022 at 15:34, Андрій Василишин wrote:

> Because it is the latest kernel which supports aufs.
> Problem gone when I change to default  parameters NIC Mellanox
> Technologies MT28908 Family [ConnectX-6]
> ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
>
>
> Tue, 7 Jun 2022 at 18:35, Diederik de Haas wrote:
>
>> Control: reassign -1 src:linux 4.19.235-1
>> Control: tag -1 moreinfo
>>
>> On 23 Apr 2022 21:59:32 +0300 Андрій Василишин wrote:
>> > Package: linux-image-4.19.0-20-amd64
>> > Version: 4.19.235-1
>> >
>> > ...
>> >
>> > Hardware name: Supermicro AS-1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
>>
>> https://www.supermicro.com/en/Aplus/system/1U/1124/AS-1124US-TNRP.cfm
>> specifications (and BIOS date) indicate this is quite a new board.
>> Yet you're running it with a 4.19 kernel from *OldStable* !
>>
>> Why?
>>

Bug#1010073: Bug 1010073: kernel 4.19: nvme read overhead sometimes, system hangs

2022-06-09 Thread Андрій Василишин
Because it is the latest kernel that supports aufs.
The problem went away when I changed the NIC (Mellanox Technologies
MT28908 Family [ConnectX-6]) to the default parameters:
ethtool -C enp161s0f0np0 rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
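
A cautious way to apply this, as a rough sketch: print the current coalescing settings with "ethtool -c" first, then set the values with "ethtool -C". The run wrapper and the DRY_RUN variable are hypothetical helpers added for illustration and are not part of the original report. By default the script only prints the commands.

```shell
#!/bin/sh
# Interface name taken from the command in this message.
IFACE=enp161s0f0np0

# With DRY_RUN=1 (the default) the command is only printed.
# Set DRY_RUN=0 and run as root to execute it for real.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current interrupt coalescing settings.
run ethtool -c "$IFACE"
# Apply the values quoted in this message.
run ethtool -C "$IFACE" rx-usecs 8 rx-frames 128 tx-usecs 8 tx-frames 128
```

Recording the output of the first command before changing anything makes it possible to restore the previous values later.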


Tue, 7 Jun 2022 at 18:35, Diederik de Haas wrote:

> Control: reassign -1 src:linux 4.19.235-1
> Control: tag -1 moreinfo
>
> On 23 Apr 2022 21:59:32 +0300 Андрій Василишин wrote:
> > Package: linux-image-4.19.0-20-amd64
> > Version: 4.19.235-1
> >
> > ...
> >
> > Hardware name: Supermicro AS-1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
>
> https://www.supermicro.com/en/Aplus/system/1U/1124/AS-1124US-TNRP.cfm
> specifications (and BIOS date) indicate this is quite a new board.
> Yet you're running it with a 4.19 kernel from *OldStable* !
>
> Why?
>
> Can you reproduce this issue with at least the 5.10 kernel from OldStable
> backports, but preferably with a recent kernel from Testing/Unstable.



-- 
WBR, Andrey Vasilishin


Bug#1010073:

2022-04-23 Thread Андрій Василишин
On another server with the same configuration:


Apr 23 22:30:43 nl03 kernel: [613296.099396] Modules linked in: binfmt_misc msr mst_pciconf(OE) amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul efi_pstore crc32_pclmul ghash_clmulni_intel efivars pcspkr ipmi_ssif nls_ascii nls_cp437 vfat fat ast ttm joydev drm_kms_helper drm ccp i2c_algo_bit rng_core evdev sp5100_tco ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq acpi_cpufreq button tcp_bbr sch_fq aufs(OE) efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb hid_generic usbhid hid mlx5_core(OE) mlxfw(OE) xhci_pci psample crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper mlxdevm(OE) xhci_hcd ahci auxiliary(OE) libahci libata mlx_compat(OE) nvme usbcore devlink scsi_mod nvme_core i2c_piix4 usb_common [last unloaded: mst_pci]
Apr 23 22:30:43 nl03 kernel: [613296.099438] CPU: 20 PID: 135069 Comm: nginx Tainted: G W OEL 4.19.0-20-amd64 #1 Debian 4.19.235-1
Apr 23 22:30:43 nl03 kernel: [613296.099439] Hardware name: Supermicro AS -1124US-TNRP/H12DSU-iN, BIOS 2.3a 03/03/2022
Apr 23 22:30:43 nl03 kernel: [613296.099444] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Apr 23 22:30:43 nl03 kernel: [613296.099445] Code: d8 48 3d 90 d0 03 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07
Apr 23 22:30:43 nl03 kernel: [613296.099446] RSP: 0018:9c930d103e88 EFLAGS: 0282 ORIG_RAX: ff13
Apr 23 22:30:43 nl03 kernel: [613296.099447] RAX: 0009 RBX: d9380721d658 RCX: 0082
Apr 23 22:30:43 nl03 kernel: [613296.099448] RDX: 9c92dfcdc000 RSI: 0282 RDI: 0282
Apr 23 22:30:43 nl03 kernel: [613296.099448] RBP:  R08:  R09: b40f6001
Apr 23 22:30:43 nl03 kernel: [613296.099449] R10: 9c164669d700 R11: 0001 R12: 9d928c6d0858
Apr 23 22:30:43 nl03 kernel: [613296.099449] R13: 0282 R14: 9d928c6d0140 R15: d9380721f660
Apr 23 22:30:43 nl03 kernel: [613296.099450] FS:  7fa7d9ef0b80() GS:9c930d10() knlGS:
Apr 23 22:30:43 nl03 kernel: [613296.099451] CS:  0010 DS:  ES:  CR0: 80050033
Apr 23 22:30:43 nl03 kernel: [613296.099451] CR2: 55cfa4118000 CR3: 01004a9f8000 CR4: 00340ee0
Apr 23 22:30:43 nl03 kernel: [613296.099452] Call Trace:
Apr 23 22:30:43 nl03 kernel: [613296.099453]  <IRQ>
Apr 23 22:30:43 nl03 kernel: [613296.099456]  fq_flush_timeout+0x6a/0x90
Apr 23 22:30:43 nl03 kernel: [613296.099459]  ? fq_ring_free+0xd0/0xd0
Apr 23 22:30:43 nl03 kernel: [613296.099462]  call_timer_fn+0x2b/0x130
Apr 23 22:30:43 nl03 kernel: [613296.099464]  run_timer_softirq+0x1c7/0x3e0
Apr 23 22:30:43 nl03 kernel: [613296.099465]  __do_softirq+0xde/0x2d8
Apr 23 22:30:43 nl03 kernel: [613296.099468]  irq_exit+0xba/0xc0
Apr 23 22:30:43 nl03 kernel: [613296.099469]  smp_apic_timer_interrupt+0x74/0x140
Apr 23 22:30:43 nl03 kernel: [613296.099471]  apic_timer_interrupt+0xf/0x20
Apr 23 22:30:43 nl03 kernel: [613296.099472]  </IRQ>
Apr 23 22:30:43 nl03 kernel: [613296.099473] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Apr 23 22:30:43 nl03 kernel: [613296.099474] Code: d8 48 3d 90 d0 03 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07
Apr 23 22:30:43 nl03 kernel: [613296.099474] RSP: 0018:ba37da943638 EFLAGS: 0297 ORIG_RAX: ff13
Apr 23 22:30:43 nl03 kernel: [613296.099475] RAX: 9c9276209c00 RBX: 9c92e05bc140 RCX: 9c926d1046c1
Apr 23 22:30:43 nl03 kernel: [613296.099476] RDX: 9d72ec4e0640 RSI: 0297 RDI: 0297
Apr 23 22:30:43 nl03 kernel: [613296.099476] RBP: 00fc R08: 008c R09: 9c15dcdc9e40
Apr 23 22:30:43 nl03 kernel: [613296.099476] R10:  R11: 9c90f4ed2000 R12: 9c15dcdc9e40
Apr 23 22:30:43 nl03 kernel: [613296.099477] R13: 0100 R14: ff00 R15: ff00
Apr 23 22:30:43 nl03 kernel: [613296.099478]  alloc_iova+0x11f/0x140
Apr 23 22:30:43 nl03 kernel: [613296.099480]  alloc_iova_fast+0x56/0x250
Apr 23 22:30:43 nl03 kernel: [613296.099483]  ? __kmalloc+0x180/0x220
Apr 23 22:30:43 nl03 kernel: [613296.099485]  ? mempool_alloc+0x67/0x190
Apr 23 22:30:43 nl03 kernel: [613296.099486]  dma_ops_alloc_iova.isra.28+0x4b/0x70
Apr 23 22:30:43 nl03 kernel: [613296.099488]  map_sg+0x73/0x1f0
Apr 23 22:30:43 nl03 kernel: [613296.099492]  nvme_queue_rq+0x1e7/0x9e0 [nvme]
Apr 23 22:30:43 nl03 kernel: [613296.099495]  ? __sbitmap_queue_get+0x24/0x90
Apr 23 22:30:43 nl03 kernel: [613296.099497]  ? blk_mq_get_tag+0x236/0x260
Apr 23 22:30:43 nl03 kernel: [613296.099499]  ? finish_wait+0x80/0x80
Apr 23 22:30:43 nl03 kernel: [613296.099500]